On Tuesday 12 August 2008 01:23:47 pm Bruce Evans wrote:
> On Tue, 12 Aug 2008, John Baldwin wrote:
> > Of course bpf is
> > broken with revoke, but nobody uses revoke with bpf. What people do do in
> > the normal course of using bpf is lots of concurrent bpf accesses, and w/o
> > D_TRACKCLOSE, bpf devices don't get closed.
> Why not fix the actual bug?
> Your commit doesn't give enough details on the actual bug, so I will
> try to guess it: libpcap has to probe for for a free bpf unit, so it
> does lots of failing opens of bpfN (especially when N == 0) when bpfN
> is already in use. Failing opens break last closes with which they
> are concurrent, because the relevant reference count (si_usecount) is
> increased during the failing open (I think it is the vref() in _fgetvp()
> that does it). Then when the opens fail, si_usecount is decremented
> to 1, but devfs_close() is not called again because only 1 real last
> close is possible (I think -- at least without races with revoke()),
> so d_close() is never called twice for 1 real least close. Failing
> opens shouldn't take long, so it is surprising that the race is often
> lost. Apparently there is some synchronization.
Correct-ish. The actual extra reference is in devfs_lookup() rather than
_fgetvp(). Specifically, it is an open concurrent with a close. The opening
thread first does the lookup which bumps the reference count in
devfs_allocv() (IIRC). Eventually we get to devfs_open() which drops the
vnode lock while it invokes d_open(). If the closing thread is waiting for
the vnode lock for close, then it can acquire the lock and do devfs_close().
This sees count_dev() > 1 and doesn't call d_close(). Meanwhile, the opening
thread will fail bpfopen() and return, but the bpf device is permamently
open. When I talked with phk@ about it originally his reply to my various
suggestions was D_TRACKCLOSE. I'm not sure how you'd really fix this
otherwise: calling d_close() from devfs_open() if the use_count is 1 after
re-acquiring the vnode lock (you have to drop the vnode lock while you call
d_close() though, so you might still race with other concurrent opens for
example), having a count of pending opens and subtracting that from
count_dev() when checking for whether or not to call d_close() (but is that
dubious? Then you might call d_close() while d_open() is running, so the
driver would need to put locking to synchronize the two and handle the case
that the d_close() actually runs after a d_open() that was successfull, so
each driver now has to duplicate that work.), etc.