Well, I've found a bunch of problems. I'm not sure exactly which one
Kachun is hitting, but I think it's likely he's hitting one of them.
* I removed --page_shortage in the 'pageout daemon flushing dirty page'
case. I shouldn't have. This can cause the pageout daemon to
free up way too many clean pages.
* The pageout daemon skips pages for vnodes it can't lock. BIG mistake.
This results in very poor paging behavior.
It turns out that the pageout daemon is quite commonly woken up from a
vm_fault. When that happens, the vm_fault is very likely to be holding a
vnode lock and be in the middle of an I/O when the pageout daemon runs,
so the pageout daemon ignores the vnode the vm_fault is sitting on. If
you have a lot of processes doing I/O, a lot of vnodes get ignored.
The lock-skipping code was originally put in to prevent the pageout
daemon from deadlocking in a low-memory situation, and to prevent it
from locking up on dead NFS nodes. However, with the low-memory
deadlock fixes I recently committed, I think we may be able to
safely lock the vnode in the pageout daemon now.
* The pageout daemon reorders pages it had to 'skip'. The main culprit
is when it decides it can't lock the vnode. The reordering for this
case only occurs for dirty pages, which fragments the queue
ordering. Additionally, it gives dirty pages 'triple priority'...
they get moved to the end of the inactive queue, and they also get
moved to the end of the inactive queue when they are successfully
cleaned. This causes originally dirty pages to stick around much
longer than they should.
I'm not certain why Kachun isn't having a problem with 4.1.1, because most
of these problems are at least a year old. But I can see how recent
low-memory handling changes might have exacerbated the existing problems.
The --page_shortage issue in particular really hoses the inactive scan
when maxlaunder is not sufficient to clean the dirty pages.
To give you an idea of the difference in performance, running a program
on my test box to iterate through a huge (3x main memory) file via mmap,
alternately touching 8K and accessing 8K, resulted in long system stalls
and a piddly pageout rate of maybe 2 MB/sec. To disk.
When I made the pageout daemon block in the vnode lock rather than skip
the vnode, fixed --page_shortage, got rid of the inactive queue
reordering, and got rid of the artificial maxlaunder limitation,
the paging rate went up to 24 MB/sec and the system no longer stalled.
No stalling whatsoever. The system ran like clockwork despite having
no free memory.
I am going to make patch sets available for -current and -stable later
tonight for testing. The changes are straightforward but serious, so
it could be up to two weeks before they get into -stable. I intend to
commit them to -current later this week, maybe Thursday. Some serious
review is needed to ensure that the vnode locking change in the
pagedaemon does not screw it up when you have things like dead NFS nodes.
The patches actually remove a whole lot of code; the result is smaller
than the original :-). That's always nice!
I found another serious issue related to the update daemon's
synchronization when combined with pages dirtied via mmap() (when mmap
is used normally, without MAP_NOSYNC). We really need the incremental
syncing feature, but unless someone else wants to do it, it may be a while
before I can get to it.