bash - Concatenate files in different subfolders into a single file and have each file name in the f

Time:12-13

I am trying to concatenate a few thousand files that are in different subfolders into a single file and also have the name of each concatenated file inserted as the first column so that I know which file each data row came from. Essentially starting with something like this:

Folder1
file1.txt
123 010 ...
456 020 ...
789 030 ...

Folder2
file2.txt 
abc 100 ...
efg 200 ...
hij 300 ...

and outputting this:

CombinedFile.txt
file1  123  010 ...
file1  456  020 ...
file1  789  030 ...
file2  abc  100 ...
file2  efg  200 ...
file2  hij  300 ...

After reading this post, I have tried the following code, but end up with a syntax error (apologies, I'm super new to awk!)

shopt -s globstar
for filename in path/**/*.txt; do
    awk '{print FILENAME "\t" $0}' *.txt > CombinedFile.txt
done

Thanks for your help!

CodePudding user response:

This single awk should be able to do it without any looping:

shopt -s globstar
awk 'FNR == 1 {f = FILENAME; gsub(/^.*\/|\.[^.]+$/, "", f)}
   {print f, $0}' path/**/*.txt > CombinedFile.txt

cat CombinedFile.txt
file1 123 010
file1 456 020
file1 789 030
file2 abc 100
file2 efg 200
file2 hij 300
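
To try this without touching real data, here is a minimal, self-contained sketch. The temporary directory and the Folder1/Folder2 sample files are hypothetical and only mirror the layout in the question. The `-v OFS='\t'` option is an addition, not part of the command above: it gives the tab-separated first column the question asked for. The comments spell out what each half of the regex removes.

```shell
#!/usr/bin/env bash
# Hypothetical sample data mirroring the question's layout.
tmp=$(mktemp -d)
mkdir -p "$tmp/path/Folder1" "$tmp/path/Folder2"
printf '123 010\n456 020\n' > "$tmp/path/Folder1/file1.txt"
printf 'abc 100\nefg 200\n' > "$tmp/path/Folder2/file2.txt"

shopt -s globstar   # bash 4+: lets ** match nested directories

# On the first line of each input file, derive the label from FILENAME:
#   ^.*\/      removes the directory part (greedy, up to the last slash)
#   \.[^.]+$   removes the final extension
# OFS is set to a tab so the label becomes a tab-separated first column.
awk -v OFS='\t' '
  FNR == 1 { f = FILENAME; gsub(/^.*\/|\.[^.]+$/, "", f) }
  { print f, $0 }
' "$tmp"/path/**/*.txt > "$tmp/CombinedFile.txt"

cat "$tmp/CombinedFile.txt"
```

The output file is written outside `path/` on purpose, so the glob cannot pick it up as an input on a later run.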

CodePudding user response:

Let's build the command step by step.

awk works with pattern-action pairs of the form pattern { action }, which executes action on the current record/line if pattern is true. If pattern is omitted, it is assumed to be true, and if action is omitted, it is equivalent to printing the current record.
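
The three cases can be checked quickly on throwaway input. This is only an illustrative sketch; the variable names and the a/b/c input are made up for the demonstration.

```shell
# Pattern only: the action defaults to printing the record.
pattern_only=$(printf 'a\nb\nc\n' | awk 'NR == 2')

# Action only: the pattern defaults to true, so it runs on every line.
action_only=$(printf 'a\nb\nc\n' | awk '{ print NR ": " $0 }')

# Both: the action runs only on lines where the pattern holds.
both=$(printf 'a\nb\nc\n' | awk 'NR > 1 { print "kept " $0 }')

echo "$pattern_only"   # b
echo "$action_only"    # 1: a / 2: b / 3: c (one per line)
echo "$both"           # kept b / kept c (one per line)
```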

As the OP wants to print the name of the file at the beginning of each file, we can use the built-in variables FILENAME and FNR. FILENAME holds the name of the file currently being read, and FNR holds the record/line number within that file. So when FNR == 1 we want to print the filename; in awk, you write this as (FNR == 1){print FILENAME}. After this condition has been checked, we just need to print the line itself. This is done by 1 { print $0 }, which can be shortened to 1.

So the following line prints what is expected for a single file:

$ awk '(FNR==1){print FILENAME}1' file

But we want to do this for multiple files, so we can do:

$ awk '(FNR==1){print FILENAME}1' file1 file2 file3 ... filen

or using a pattern/glob:

$ awk '(FNR==1){print FILENAME}1' *.txt

If you want to match all files in the subdirectories as well, it can easily be done using find:

$ find . -type f -iname '*.txt' -exec awk '(FNR==1){print FILENAME}1' {} \;

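The output of any of these commands can be redirected to a file of your choice. Just make sure the target file is not itself matched by the glob or by find, or it will be read as input.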
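
If the filename is wanted on every row, as in the question's sample output, rather than once per file, the same building blocks can be rearranged. The following is a sketch under assumptions: the sample tree and the output name combined.tsv are hypothetical, and `-exec ... {} +` is used instead of `\;` so that awk is started once for many files rather than once per file. find does not guarantee any particular file order, so the order of the blocks in the output may vary.

```shell
#!/usr/bin/env bash
# Hypothetical sample tree, standing in for the real data.
tmp=$(mktemp -d)
mkdir -p "$tmp/data/Folder1" "$tmp/data/Folder2"
printf '123 010\n456 020\n' > "$tmp/data/Folder1/file1.txt"
printf 'abc 100\nefg 200\n' > "$tmp/data/Folder2/file2.txt"

# Prefix every line with its FILENAME (the full path as find reports it).
# The output file lives outside the searched tree so that find
# cannot pick it up as an input.
find "$tmp/data" -type f -name '*.txt' \
  -exec awk '{ print FILENAME "\t" $0 }' {} + > "$tmp/combined.tsv"

cat "$tmp/combined.tsv"
```

Here the first column is the full path. To get just the base name without its extension, FILENAME can be trimmed in a variable on FNR == 1 before printing.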
