I am convinced that there is a bug in the CAM code that leads to I/O starvation.
I have already discussed this privately with a few people; I am now bringing
it up to a wider audience to get more feedback.
My setup is one RAID controller with two arrays connected to it, da0 and da1.
The controller supports 252 tags. After boot, camcontrol tags on da0 and da1
shows that each device has 252 openings. A process P0 writing to da0 is
dormant most of the time, but wakes up with bursts of I/O, 5000-6000 ops as
reported by gstat.
A process P1 writes to da1 at a fixed data rate, as reported by gstat.
The issue: when P0 generates such a burst of 5000-6000 ops, the write rate of
P1 on da1 drops to 0 MB/s for up to 8-9 seconds, vfs.hirunningspace starts
climbing, and processes end up on the waithirunning() or getblk() sleep
channels. BTW, raising hirunningspace has no effect on the 0 MB/s behavior.
The first problem I see here is that if the SIM's devq has 252 openings on
its send queue, the struct cam_ed instances representing da0 and da1 should
each have 126 openings, not 252.
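To make the idea concrete, here is a minimal sketch of what I mean by
splitting the openings. The struct and function names are simplified,
hypothetical stand-ins, not the actual CAM data structures:

/*
 * Hypothetical simplification: when devices share a SIM whose devq
 * advertises N openings, give each device an equal share of N rather
 * than the full N.
 */
struct devq {
	int	total_openings;		/* e.g. 252 for this controller */
	int	num_devices;		/* devices attached to the SIM */
};

static int
device_openings(const struct devq *devq)
{
	/* 252 / 2 = 126 openings each for da0 and da1. */
	if (devq->num_devices == 0)
		return (devq->total_openings);
	return (devq->total_openings / devq->num_devices);
}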
The second problem is that there is clearly no I/O fairness in CAM, as seen
in the gstat output: da0 takes exclusive hold of the SIM/controller until it
has processed all of its I/Os (8-9 seconds). The code that allows this is at
3030 && (devq->alloc_openings > 0)
3091 && (devq->send_openings > 0)
Once you have split the openings to 126 each, the tests above will always be
true.
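To illustrate the point (a sketch under simplified, hypothetical structs,
independent of the previous one and not the code from cam/cam_xpt.c): with
the split in place, the devq-level test alone no longer gates anything, and
it is a per-device check that actually stops da0 from monopolizing the
controller, leaving send openings available for da1:

/*
 * Sketch only: dev_openings/dev_active are simplified stand-ins for
 * the per-device bookkeeping.  With 126 openings per device, da0 is
 * stopped by its own cap, leaving send_openings for da1.
 */
struct devq {
	int	send_openings;		/* shared pool, 252 total */
};

struct device {
	int	dev_openings;		/* per-device cap, 126 after split */
	int	dev_active;		/* I/Os currently outstanding */
};

static int
may_dispatch(const struct devq *devq, const struct device *dev)
{
	/* Gate on both the shared pool and the device's own share. */
	return (devq->send_openings > 0 &&
	    dev->dev_active < dev->dev_openings);
}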
I have a patch that fixes both problems; I can share it with the list if
requested.
With the patch, da0 and da1 each automatically get 126 openings, and on top
of that, extra logic in cam/cam_xpt.c implements fairness (a sketch of one
possible shape for such logic follows). No more 0 MB/s on da1. This is on
FreeBSD 8.1-RELEASE.
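For illustration only, and not the actual patch: one way such fairness logic
could look is a round-robin pass over the devices sharing the devq, so a
burst on da0 cannot drain the whole shared pool. All names here are
hypothetical:

/*
 * Illustrative round-robin dispatch over the devices on one devq.
 * Instead of draining one device's queue completely, alternate among
 * devices that still have work, respecting both the shared pool and
 * each device's own cap.
 */
#define NDEVS	2

struct dev_state {
	int	pending;	/* I/Os queued for this device */
	int	active;		/* I/Os outstanding at the controller */
	int	openings;	/* per-device cap, e.g. 126 */
};

static void
dispatch_round_robin(struct dev_state devs[NDEVS], int *send_openings)
{
	int i, progress;

	/* Cycle while the shared pool has room and someone made progress. */
	do {
		progress = 0;
		for (i = 0; i < NDEVS && *send_openings > 0; i++) {
			struct dev_state *d = &devs[i];

			if (d->pending > 0 && d->active < d->openings) {
				d->pending--;	/* hand one I/O to the SIM */
				d->active++;
				(*send_openings)--;
				progress = 1;
			}
		}
	} while (progress && *send_openings > 0);
}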
Any comments welcome.