On 24 Sep, Tor Egge wrote:
>> It appears that there is some sort of interaction between soft
>> updates and background fsck that results in the link count of the
>> parent of one of these directories being double decremented,
>> resulting in the file system being put into an invalid state.
> If the snapshot for background fsck is taken on a file system which
> has pending softupdate dependencies then this can happen. For this
> particular case, the system had a pending dirrem dependency.
In this particular case, the dependency seems to remain pending forever.
I executed sync(8), waited for 60+ seconds, and manually typed in a
couple of commands before running fsck. Only the file system
modifications done by fsck -B or unmounting the file system seem to
flush the parent directory link count update to disk.
>> The following transcript demonstrates what happens if a background
>> fsck is run after the leaf directory is removed. What is interesting
>> is that after the leaf directory has been removed, the
>> effective link count of the parent directory (displayed by ls) has
>> been decremented from 3 to 2, whereas the on-disk link count shown by
>> fsdb is still 3. The background fsck appears to detect the link
>> count as 3, and executes the sysctl call to decrement the link count,
>> causing both the effective and actual link counts to be decremented
>> to 1.
>> My suspicion is that the physical update of the parent directory's
>> link count after the rmdir of the leaf directory has been deferred
>> until the leaf directory's inode is zeroed, which turns out to be an
>> indefinite wait because the inode doesn't get zeroed until fsck is
> ufs_rmdir() calls ufs_dirremove() after having lowered i_effnlink in
> memory for both leaf and parent directory.
> ufs_dirremove() calls softdep_setup_remove() which sets up the
> softupdates dependencies for reducing di_nlink on disk for leaf and
> parent directory when it's safe to do so (i.e. after the directory
> entry referencing the leaf directory has been cleared on disk). See
> code in reassignbuf() for various delays before the syncer process
> pushes the dirty buffers to disk.
Ok, now I see the code at the end of handle_workitem_remove() that
reuses the struct dirrem to decrement the link count of the parent
directory. For some reason the second handle_workitem_remove() call is
getting deferred indefinitely.
> The background fsck found the di_nlink value to be 3 on the parent
> directory and issued an FFS_ADJ_REFCNT sysctl to reduce it by one,
> having no knowledge about the pending dirrem dependency. See
> sysctl_ffs_fsck() for the handling of that sysctl.
> After background fsck has run and the dirrem dependency has been
> processed, the link counts for the parent directory are both 1.
Even without the indefinite deferral problem, it seems to me that
updating either file or directory link counts in background fsck is
hazardous unless the directory slot updates and link count updates can
be guaranteed to be consistent in the snapshot.
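
To make the double decrement concrete, here is a toy user-space model of
the sequence described above. It is only a sketch and not kernel code:
struct toy_inode and the toy_* functions are hypothetical names invented
for illustration, and the model assumes (as the transcript suggests) that
the fsck adjustment is applied to both the effective and the on-disk
link count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a directory inode; not the real struct inode. */
struct toy_inode {
	int  effnlink;        /* in-memory effective link count (ls) */
	int  nlink;           /* on-disk link count (fsdb) */
	int  real_refs;       /* references that actually exist on disk */
	bool dirrem_pending;  /* deferred on-disk decrement still queued */
};

/* rmdir of a leaf: effective count drops now, on-disk drop is deferred. */
static void
toy_rmdir_child(struct toy_inode *parent)
{
	assert(parent->effnlink > 0);
	parent->effnlink--;
	parent->real_refs--;
	parent->dirrem_pending = true;
}

/*
 * Background fsck: sees only the on-disk count in the snapshot, knows
 * nothing about the pending dirrem, and applies the difference to both
 * counts. Returns the adjustment it made.
 */
static int
toy_bgfsck(struct toy_inode *parent)
{
	int delta = parent->real_refs - parent->nlink;

	parent->nlink += delta;
	parent->effnlink += delta;
	return (delta);
}

/* The deferred dirrem is finally processed and decrements again. */
static void
toy_process_dirrem(struct toy_inode *parent)
{
	if (!parent->dirrem_pending)
		return;
	parent->nlink--;
	parent->dirrem_pending = false;
}
```

Starting from a parent with both counts at 3, calling toy_rmdir_child(),
toy_bgfsck() and toy_process_dirrem() in that order leaves both counts at
1 even though two references remain, which is the invalid state seen in
the transcript.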
> The latest panic shown on
> <http://people.freebsd.org/~pho/stress/log/index.html>, "panic:
> handle_written_inodeblock: live inodedep" was probably caused by this
> issue. If the snapshot was taken while a directory or file was being
> removed then it might contain an unreferenced inode with a nonzero
> link count. The background fsck would reduce the link count for the
> inode, triggering freeing of the inode (cf. ufs_inactive(),
> UFS_VFREE(), ffs_vfree() and softdep_freefile()). After writing the
> zeroed inode to disk the system would panic due to the still pending
> dirrem dependency.
My investigation of that particular problem led me to try this
experiment. I actually haven't been able to reproduce the
handle_written_inodeblock panic, but I've been able to reproduce the
deadlock problem a number of times.