Bash nested pipes and wildcards

Time:08-18

I have input files foo1.txt, foo2.txt, foo3.txt, etc. I have some command munge that processes the input files, but (for reasons that are not relevant here) the command can only process one input file at a time. I want to combine the output into a single out.txt.

I know I can do cat foo*.txt to concatenate all the input files, but as mentioned munge can only work on each separate file. That is, munge will not like it if I do cat foo*.txt | munge > out.txt. Instead I need to perform the processing on each file before the outputs are concatenated.

I'm sure I could loop over the input files using for, but then how could I combine the output?

Basically I'm looking for something like the equivalent of this, without enumerating all the input files beforehand.

cat foo1.txt | munge > out1.txt
cat foo2.txt | munge > out2.txt
cat foo2.txt | munge > out2.txt
cat out*.txt > out.txt

I'll bet there is some extremely simple command that can do this for me in a single line, perhaps with nested piping and wildcards. Any ideas?

CodePudding user response:

Use a loop and redirect the output of the whole loop to out.txt. There's also no need to pipe from cat; you can simply redirect munge's input from the file.

for file in foo*.txt; do
    munge < "$file"
done > out.txt
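
A slightly more defensive variant of the same loop, as a sketch only. The munge function here is a hypothetical stand-in that uppercases its input, and the temporary directory and sample files exist purely for illustration. The added guard covers the case where no file matches: the shell would otherwise pass the literal string foo*.txt to the loop.

```shell
# Hypothetical stand-in for the real munge: uppercases standard input.
munge() { tr '[:lower:]' '[:upper:]'; }

# Scratch directory with two sample inputs (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > foo1.txt
printf 'two\n' > foo2.txt

for file in foo*.txt; do
    # If nothing matched, $file is the literal pattern; skip it.
    [ -e "$file" ] || continue
    munge < "$file"
done > out.txt

cat out.txt
```

Note that the glob expands in sorted order, so foo10.txt would be processed before foo2.txt. If no file matches, out.txt is still created, but empty.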

CodePudding user response:

Firstly, you have useless uses of cat (UUoC):

cat foo1.txt | munge > out1.txt
cat foo2.txt | munge > out2.txt
cat foo3.txt | munge > out3.txt  # I assume you wanted 3 here
cat out*.txt > out.txt

is done more simply as:

< foo1.txt munge > out1.txt
< foo2.txt munge > out2.txt
< foo3.txt munge > out3.txt
cat out*.txt > out.txt

More usually, all redirections appear after the command, but this is not required:

munge < foo1.txt > out1.txt
munge < foo2.txt > out2.txt
munge < foo3.txt > out3.txt
cat out*.txt > out.txt

In Bash, process substitution could be used to combine these together, like this, but that would be overcomplicated when the final combinator is just catenation:

# useless use of process substitution plus cat (uuopspcat)
cat <(munge < foo1.txt) <(munge < foo2.txt) <(munge < foo3.txt) > out.txt

A simpler alternative is to run the commands in a subshell, using parentheses, and redirect the output of that subshell into the combined file.

(munge < foo1.txt ; munge < foo2.txt ; munge < foo3.txt) > out.txt
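
A brace group gives the same combined redirection without starting a subshell. The following is a minimal sketch, with munge again replaced by a hypothetical uppercasing function and the sample files made up for the demonstration. The space after the opening brace and the semicolon before the closing brace are both required by the shell grammar.

```shell
# Hypothetical stand-in for the real munge: uppercases standard input.
munge() { tr '[:lower:]' '[:upper:]'; }

# Scratch directory with three sample inputs (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n'   > foo1.txt
printf 'two\n'   > foo2.txt
printf 'three\n' > foo3.txt

# All three commands run in the current shell and share one redirection.
{ munge < foo1.txt; munge < foo2.txt; munge < foo3.txt; } > out.txt

cat out.txt
```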

If munge is a "linear operator under catenation", loosely speaking, then it should be possible to do this:

cat foo1.txt foo2.txt foo3.txt | munge > out.txt

For instance if munge is something like grep foo, then this transformation is valid. Catenating the grep outputs is the same as grepping the catenated inputs.
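
The grep case can be checked directly. This sketch uses made-up sample files and assumes every input ends with a newline; otherwise cat would join the last line of one file to the first line of the next, and the two results could differ.

```shell
# Scratch directory with two sample inputs (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'foo a\nbar b\n' > foo1.txt
printf 'bar c\nfoo d\n' > foo2.txt

# Filter each file separately, then catenate the results.
for file in foo1.txt foo2.txt; do
    grep foo < "$file"
done > separate.txt

# Catenate first, then filter once.
cat foo1.txt foo2.txt | grep foo > combined.txt

# cmp exits with status 0 when the two files are byte-identical.
cmp separate.txt combined.txt && echo "same output"
```

A command such as sort does not have this property: sorting the catenated inputs generally differs from catenating the individually sorted outputs.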

If munge can be extended to take multiple filename arguments and iterate over them, then it can just be:

munge foo1.txt foo2.txt foo3.txt > out.txt

CodePudding user response:

Assuming munge can take a file name argument:

printf '%s\0' foo*.txt | xargs -0 -n 1 munge > out.txt
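
If munge only reads standard input, as in the question, xargs can still be used by having it start a small sh wrapper that performs the redirection. This is a sketch under assumptions: tr plays the role of munge here, and the sample files are made up. The -0 option is available in both GNU and BSD xargs.

```shell
# Scratch directory with two sample inputs (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > foo1.txt
printf 'two\n' > foo2.txt

# xargs starts one sh per file name. Inside the wrapper, "$1" is that
# name and is opened as standard input. The trailing "sh" fills $0.
# The tr call is a stand-in; replace it with munge in real use.
printf '%s\0' foo*.txt |
    xargs -0 -n 1 sh -c 'tr "[:lower:]" "[:upper:]" < "$1"' sh > out.txt

cat out.txt
```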