Arun Sharma writes:
> The daemons which are involved in freeing up pages during low memory
> conditions qualify as system daemons. Making sure that these daemons
> don't block avoids the deadlock.
The second solution involves a little more than that, such as
blessing "normal" jobs just enough to let them obtain sufficient
resources to avoid a deadlock.
One instance of the mmap lockup involves a case where you've got a
single process dirtying a memory mapped file which is larger than
physical memory. Assuming an otherwise idle system, nearly all
available memory in the system will belong to the file's object & it
will all be dirty.
At some point, the process will trigger a fault on a non-resident
page. vm_fault will call vnode_pager_getpages to read in the
faulting page. ffs_getpages (let's assume we're using ffs) will then
call ffs_read to read in the pages, and ffs_read will try to build a
cluster. The deadlock occurs when allocbuf cannot allocate one of the
pages in the cluster. Here's a stack trace (from a long, long time
ago, May 12th):
vm_page_alloc(caa0a074,d1d,0,c58f7ba0,1fc) at vm_page_alloc
allocbuf(c58f7ba0,2000,0,c58c4588,5) at allocbuf+0x3ae
getblk(caa0f8c0,68e,2000,0,0) at getblk+0x32e
cluster_rbuild(caa0f8c0,8000001,0,689,370b0) at cluster_rbuild+0x1df
cluster_read(caa0f8c0,8000001,0,689,2000) at cluster_read+0x2cc
ffs_read(caa12e28) at ffs_read+0x3ea
ffs_getpages(caa12e80) at ffs_getpages+0x22c
vnode_pager_getpages(caa0a074,caa12f14,1,0,c9fcdce0) at vnode_pager_getpages+0x4e
vm_fault(c9fd28c0,48df9000,3,8,c9fcdce0) at vm_fault+0x484
trap_pfault(caa12fb8,1,48df9000) at trap_pfault+0xaa
trap(2f,2f,2f,48df9000,48df9000) at trap+0x1aa
calltrap() at calltrap+0x1c
The real problem is that the pageout daemon cannot push any pages,
because (nearly) all the pages available to user processes are held
by the mmap'ed object. The killer is that they are all dirty &
because we're in the middle of a cluster read, the vnode is locked,
so the pageout daemon cannot touch them.
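To make the lockup concrete, here is a toy model in C of a pageout
pass that can only launder dirty pages whose vnode it can lock. This
is not FreeBSD code: struct sim_page and sim_pageout_scan are
hypothetical names, and write-back is faked as instantaneous. With
every page dirty and the vnode held by the in-progress cluster read,
a pass reclaims nothing, so the allocbuf in the stack trace never
gets its page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page: every page belongs to one
 * vnode-backed object (the mmap'ed file). */
struct sim_page {
    bool dirty;
    bool vnode_locked;  /* true while the vnode is locked, e.g. by a cluster read */
};

/*
 * Toy pageout pass: clean pages can be freed outright; dirty pages can
 * only be laundered (written back) if the vnode lock is obtainable.
 * Returns the number of pages made reclaimable.
 */
static size_t sim_pageout_scan(struct sim_page *pages, size_t n)
{
    size_t reclaimed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty) {
            reclaimed++;             /* clean: free it immediately */
        } else if (!pages[i].vnode_locked) {
            pages[i].dirty = false;  /* pretend the write-back completed */
            reclaimed++;
        }
        /* dirty and vnode locked: skipped, nothing can be done */
    }
    return reclaimed;
}
```

In the mmap case every entry is {dirty, vnode_locked}, so the scan
returns 0 and the faulting process sleeps forever in vm_page_alloc
while still holding the vnode lock.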
A solution would be to allow the faulting process to dip into the
system reserves just far enough for vm_page_alloc to succeed, which
will let the cluster read complete and so avoid the deadlock.
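A rough sketch of the reserve idea, again as a toy model with made-up
names: sim_page_alloc, the SIM_ALLOC_* classes and both thresholds
are hypothetical and are not the real vm_page_alloc interface. Normal
requests stop at a reserve floor, a "blessed" faulting process may go
below it but not all the way down, and system daemons may take the
last pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocation classes; names are illustrative only. */
enum sim_alloc_class {
    SIM_ALLOC_NORMAL,   /* ordinary user faults */
    SIM_ALLOC_BLESSED,  /* a "normal" job allowed to dip into the reserve */
    SIM_ALLOC_SYSTEM    /* pageout and other system daemons */
};

struct sim_vm {
    unsigned free_pages;
    unsigned reserved;    /* normal requests may not go at or below this */
    unsigned system_min;  /* only system daemons may go at or below this */
};

/* Consumes a page and returns true if the request class may have one;
 * returns false where the caller would have to sleep for pageout. */
static bool sim_page_alloc(struct sim_vm *vm, enum sim_alloc_class cls)
{
    unsigned min_free;
    switch (cls) {
    case SIM_ALLOC_NORMAL:  min_free = vm->reserved;   break;
    case SIM_ALLOC_BLESSED: min_free = vm->system_min; break;
    case SIM_ALLOC_SYSTEM:  min_free = 0;              break;
    default:                return false;
    }
    if (vm->free_pages <= min_free)
        return false;
    vm->free_pages--;
    return true;
}
```

With 10 free pages, a reserve of 10 and a system floor of 4, a normal
fault fails outright, while a blessed one can take 6 pages (enough for
a small cluster read to finish) before it too would have to wait. The
system class still has pages left for the pageout daemon itself.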
I personally think the first solution (always taking write faults)
would be far, far better. This would allow the system to avoid
getting anywhere near a deadlock situation & to remain responsive.
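One reading of "always taking write faults" is that pages are mapped
read-only so that every first write traps, letting the kernel count
dirty pages per object and push back on the writer before the object
owns most of memory. This toy model assumes that interpretation;
struct sim_object, dirty_limit and sim_flush_object are hypothetical,
and the flush is faked as instantaneous.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-object dirty-page accounting. */
struct sim_object {
    unsigned dirty_pages;
    unsigned dirty_limit;  /* hypothetical cap on dirty pages per object */
};

/* Hypothetical stand-in for writing back an object's dirty pages; it
 * just pretends the write-back finished (and that the cleaned pages
 * were write-protected again, so later writes fault once more). */
static void sim_flush_object(struct sim_object *obj)
{
    obj->dirty_pages = 0;
}

/*
 * Called on a write fault to a resident page mapped read-only. Because
 * each first write traps, the kernel sees every page become dirty.
 * Returns true if the writer was throttled (had to flush first).
 */
static bool sim_write_fault(struct sim_object *obj)
{
    bool throttled = false;
    if (obj->dirty_pages >= obj->dirty_limit) {
        sim_flush_object(obj);  /* the writer pays for its own dirtying */
        throttled = true;
    }
    obj->dirty_pages++;         /* page now mapped writable and counted dirty */
    return throttled;
}
```

Throttling at fault time means memory never fills with dirty pages
that the pageout daemon cannot launder, which is why this approach
keeps the system away from the deadlock rather than recovering from
it.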
I'm afraid that if we go with the second solution, the system would be
unresponsive until the cluster read completed & the pageout daemon was
able to begin flushing the dirty pages in the offending object.
Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
Duke University Email: email@example.com
Department of Computer Science Phone: (919) 660-6590