Re: freebsd-update's install_verify routine excessive stating

[ Available lists | Index of freebsd-hackers | Month of Jan 2009 | Week of 23 Jan 2009 | Raw email | View thread | Wrap long lines | Reply | Tag ]
From
Yoshihiro Ota <ota@j.email.ne.jp>
Date
23 Jan 2009 01:57:30
Subject
Re: freebsd-update's install_verify routine excessive stating
Message-ID
20090122203819.585fb35f.ota@j.email.ne.jp

In reply to
Replies
Referenced by

[ Hide this part ]
Hi.

It's interesting.

On Thu, 22 Jan 2009 13:17:41 +0100 (CET)
Oliver Fromme <olli@lurza.secnetix.de> wrote:

> Hi,
>
>
> So I would suggest to replace the whole pipe with this:
>
> awk -F "|" '$2 ~ /^f/ {print $2}' "$@" |
> sort -u > filelist
>
> It would be much better to generate two lists:
> - The list of hashes, as already done ("filelist")
> - A list of gzipped files present, stripped to the hash:
>
> (cd files; echo *.gz) |
> tr ' ' '\n' |
> sed 's/\.gz$//' > filespresent
>
> Note we use "echo" instead of "ls", in order to avoid the
> kern.argmax limit. 64000 files would certainly exceed that
> limit. Also note that the output is already sorted because
> the shell sorts wildcard expansions.
>
> Now that we have those two files, we can use comm(1) to
> find out whether there are any hashes in filelist that are
> not in filespresent:
>
> if [ -n "$(comm -23 filelist filespresent)" ]; then
> echo -n "Update files missing -- "
> ...
> fi
>
> That solution scales much better because no shell loop is
> required at all.

This will probably be the fastest.

awk -F "|" '
$2 ~ /^f/{required[$7]=$7; count++}
END{FS="[/.]";
while("find files -name *.gz" | getline>0)
if($2 in required)
if(--count<=0)
exit(0);
exit(count)}' "$@"

It verifies entries using hashtable instead of sort.
Therefore, it is O(n+m) instead of O(n*log(n))+O(m*log(m)) in theory.
I am not well aware how good awk's associate array is, though.

Regards,
Hiro

Elapsed time: 0.178 seconds