Moving files via shell script

Carl Lowenstein carl.lowenstein at gmail.com
Tue Mar 7 17:43:11 PST 2006


On 3/7/06, Carl Lowenstein <carl.lowenstein at gmail.com> wrote:
> On 3/7/06, Michael O'Keefe <mokeefe at qualcomm.com> wrote:
> > > Does not xargs(1) accept the output from find(1) as it arrives, and
> > > ship it off to grep(1) in suitable buffersful without waiting for
> > > find(1) to finish?
> >
> > No, xargs takes from stdin a list and builds the argv[] for its command.
> > So worst case scenario it will wait for 1023 "lines" of input (without
> > the minus-ell flag) before doing its fork()
> > This can of course take a LONG time to accumulate if you are walking
> > many NFS mounts. But if you use a suitable minus-ell flag, it will still
> > be faster than -exec
>
> Are we not still saying the same thing -- xargs does not wait for
> stdin to finish (from the previous process) but does wait for some
> amount of input to accumulate.
> "worst case 1023 lines" must be implementation-dependent.  My memory
> of xargs dates back to when it was a lot less than that.
> In any case it can be changed by the --max-lines=<lines> or -l<lines>
> or -L<lines> switch.
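
A rough way to see this for oneself, as a sketch only: the log file and
the two item names below are made up, and -n 1 is chosen so that xargs
has "accumulated enough" after a single item.  If xargs waited for its
stdin to close, the "producer finished" entry would have to come first
in the log; on the systems I would expect, the "ran: first" entry comes
first instead.

```shell
# Hypothetical log file used only to record the order of events.
log=$(mktemp)
export log

# The producer emits one item, pauses, emits another, then records that
# it is done.  -n 1 makes xargs run its command after every single item.
{ echo first; sleep 2; echo second; echo "producer finished" >> "$log"; } |
    xargs -n 1 sh -c 'echo "ran: $1" >> "$log"' sh

cat "$log"
```

With a larger -n or -L the command for the first item would be delayed
until that many items had arrived, which is the accumulation effect
discussed above.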

Interesting experimental evidence, since my previous experiment was
still lying around in another window.
[cdl at delta include]$ time find . -type f -print | xargs -L477 grep -i \
 largefile64 /dev/null > /tmp/find_largefile64

real    0m0.229s
user    0m0.067s
sys     0m0.185s
[cdl at delta include]$ time find . -type f -print | xargs -L478 grep -i \
 largefile64 /dev/null > /tmp/find_largefile64
xargs: argument list too long

real    0m0.196s
user    0m0.054s
sys     0m0.107s
[cdl at delta include]$

So the default value is not hard-wired to 1023.  Presumably without a
specified number of lines, xargs maxes out on some other property of
its input.  Food for thought:

[cdl at delta include]$ find . -type f -print | head -477 | wc -c
13095
[cdl at delta include]$ find . -type f -print | head -478 | wc -c
13114
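
The numbers above hint that the limit is counted in bytes rather than in
lines.  A minimal sketch of size-based batching, assuming an xargs with
the -s (maximum command-line size) option, which GNU and BSD both have.
The 200-byte cap and the generated names are arbitrary, chosen only to
keep the output short; this does not reproduce the experiment above.

```shell
# Generate 100 made-up names of 9 characters each (file_0001 ... file_0100).
names=$(seq 1 100 | awk '{printf "file_%04d\n", $1}')

# Cap each command line at 200 bytes.  xargs starts a new echo whenever
# the next name would push the command line past the cap, so each output
# line corresponds to one batch.
batches=$(printf '%s\n' "$names" | xargs -s 200 echo)

# Report the length in characters and the argument count of each batch.
printf '%s\n' "$batches" | awk '{print length($0), NF}'
```

No line count is given here, yet the input is still split into several
batches, each of which fits under the cap.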

More food for thought.  What size of argument lists are produced by
xargs in this case?  A fair amount of fumbling around results in the
following command:

[cdl at delta include]$ find . -type f | xargs | \
 gawk '{printf ("%d ", length); gsub (/[^ ]/,""); print length}'
22286 938
22293 878
22294 1021
22298 749
22281 569
22277 517
22278 561
22307 581
22291 660
16836 597
[cdl at delta include]$

Explanation:  use xargs to split up a long list into acceptable pieces
(10 of them in this case).
Count number of characters in each piece, and number of spaces. 
Number of spaces is within 1 of the number of original find(1) output
lines that are concatenated by xargs.

Roughly speaking, xargs(1) seems to produce an argv[] list of just
over 22000 characters, aggregating 500 to 900 of its inputs to do
this.  Again YMMV.
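
Counting spaces only approximates the number of arguments.  A sketch of
a more direct count, letting the shell report "$#" for each command line
that xargs builds.  The function name count_args is made up for
illustration, and the seq input stands in for the find(1) output.

```shell
# count_args: read items on stdin and print, one per line, how many
# arguments xargs put into each command it ran.  Any options given to
# the function (for example -n or -s) are passed through to xargs.
count_args() {
    # The trailing "sh" fills $0, so every item from xargs lands in "$@".
    xargs "$@" sh -c 'echo "$#"' sh
}

# 5000 made-up items, capped at 2000 bytes per command line.
seq 1 5000 | count_args -s 2000
```

Run without the -s option, as in "find . -type f | count_args", this
should print one count per batch at whatever default size the local
xargs uses.  Note that xargs in its default mode still splits names
containing blanks, so such names would be counted more than once.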

    carl
--
    carl lowenstein         marine physical lab     u.c. san diego
                                                 clowenst at ucsd.edu


