I am trying to concatenate a few thousand files, spread across different subfolders, into a single file, with the name of each source file inserted as the first column so that I know which file each data row came from. Essentially, starting with something like this:
Folder1
  file1.txt
    123 010 ...
    456 020 ...
    789 030 ...
Folder2
  file2.txt
    abc 100 ...
    efg 200 ...
    hij 300 ...
and outputting this:
CombinedFile.txt
  file1 123 010 ...
  file1 456 020 ...
  file1 789 030 ...
  file2 abc 100 ...
  file2 efg 200 ...
  file2 hij 300 ...
After reading this post, I tried the following code, but I end up with a syntax error (apologies, I'm super new to awk!):
shopt -s globstar
for filename in path/**/*.txt; do
awk '{print FILENAME "\t" $0}' *.txt > CombinedFile.txt
done
Thanks for your help!
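Whatever caused the syntax error, the loop above also has three logic problems: it never uses $filename, it passes *.txt (only the files in the current directory) to awk on every iteration, and the > inside the loop truncates CombinedFile.txt on each pass. Below is a minimal sketch of a repaired loop, assuming bash 4 or later (globstar was added in bash 4) and the path/Folder1, path/Folder2 layout from the example; the sample files are created only so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Sketch only: builds a throwaway copy of the example layout so it runs standalone.
shopt -s globstar

tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p path/Folder1 path/Folder2
printf '123 010\n456 020\n789 030\n' > path/Folder1/file1.txt
printf 'abc 100\nefg 200\nhij 300\n' > path/Folder2/file2.txt

# Redirect once, after "done", so the output file is not truncated on every pass.
for filename in path/**/*.txt; do
    name=${filename##*/}   # strip the directory part: file1.txt
    name=${name%.txt}      # strip the extension: file1
    # Hand the name to awk with -v and read only the current file, not *.txt.
    awk -v name="$name" '{print name "\t" $0}' "$filename"
done > CombinedFile.txt

cat CombinedFile.txt
```

This still starts one awk process per file; the single-awk answers avoid that, which matters with a few thousand files.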
Answer:
This single awk command should be able to do it without any shell loop:
shopt -s globstar
awk 'FNR == 1 {f = FILENAME; gsub(/^.*\/|\.[^.]+$/, "", f)}
{print f, $0}' path/**/*.txt > CombinedFile.txt
cat CombinedFile.txt
file1 123 010
file1 456 020
file1 789 030
file2 abc 100
file2 efg 200
file2 hij 300
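To see what the gsub() call does, here is a self-contained run of the same command against a throwaway copy of the question's layout (the directory names are taken from the example). The regex has two alternatives: ^.*\/ deletes everything up to the last slash (the directory part), and \.[^.]+$ deletes the final extension, so path/Folder1/file1.txt becomes file1. Because FNR == 1 is true only on the first line of each file, the name is computed once per file rather than once per line.

```shell
#!/usr/bin/env bash
shopt -s globstar

# Throwaway sample tree mirroring the example in the question.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p path/Folder1 path/Folder2
printf '123 010\n456 020\n789 030\n' > path/Folder1/file1.txt
printf 'abc 100\nefg 200\nhij 300\n' > path/Folder2/file2.txt

# On the first line of each file, derive the bare name once:
#   ^.*\/       removes the directory part (greedy, so up to the last "/")
#   \.[^.]+$    removes the final extension
awk 'FNR == 1 {f = FILENAME; gsub(/^.*\/|\.[^.]+$/, "", f)}
     {print f, $0}' path/**/*.txt > CombinedFile.txt

cat CombinedFile.txt
```

For the tab-separated output used in the question, add -v OFS='\t' to the awk command: print with a comma between its arguments separates them with OFS, which defaults to a single space.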
Answer:
Let's build the command step by step.
awk works with pattern-action pairs of the form pattern { action }, each of which executes action on the current record (line) if pattern is true. If pattern is omitted, it is assumed to be true, and if action is omitted, the default action is to print the current record.
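A quick way to see these rules is to run each combination on a few lines of made-up input (alpha, beta and gamma are arbitrary sample values):

```shell
#!/bin/sh
# Three lines of sample input, made up for illustration.
input=$(printf 'alpha\nbeta\ngamma\n')

# Pattern with no action: the default action prints each matching record.
only_second=$(printf '%s\n' "$input" | awk 'NR == 2')

# Action with no pattern: the action runs for every record.
all_upper=$(printf '%s\n' "$input" | awk '{ print toupper($0) }')

# Both present: the action runs only where the pattern is true.
long_words=$(printf '%s\n' "$input" | awk 'length($0) > 4 { print "long: " $0 }')

printf '%s\n---\n%s\n---\n%s\n' "$only_second" "$all_upper" "$long_words"
```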
To print the name of each file before its contents, we can use the built-in variables FILENAME and FNR. FILENAME contains the name of the file currently being read, and FNR contains the current record (line) number within that file. So when FNR == 1 we want to print the filename. In awk, you write this as (FNR == 1){print FILENAME}
Apart from that, every line must be printed as is. This is done by 1 { print $0 }, which can be shortened to just 1: the pattern 1 is always true, and the omitted action defaults to printing the current record.
So the following line prints what is expected for a single file:
$ awk '(FNR==1){print FILENAME}1' file
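A minimal self-contained run of that command, using a hypothetical two-line file1.txt, also confirms that the bare 1 behaves exactly like the spelled-out 1 { print $0 }:

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '123 010\n456 020\n' > file1.txt   # hypothetical sample file

# Header line on the first record, then every record unchanged.
with_one=$(awk '(FNR==1){print FILENAME}1' file1.txt)

# Same program with the bare "1" written out as a full pattern-action pair.
spelled_out=$(awk '(FNR==1){print FILENAME} 1 { print $0 }' file1.txt)

printf '%s\n' "$with_one"
```

Both variables hold the file name followed by the two data lines.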
But we want to do this for multiple files, so we can do:
$ awk '(FNR==1){print FILENAME}1' file1 file2 file3 ... filen
or, using a glob:
$ awk '(FNR==1){print FILENAME}1' *.txt
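This works across several files because FNR restarts at 1 for each new input file, while NR keeps counting across all of them. A small sketch with two hypothetical files, f1.txt and f2.txt, shows both counters next to each other:

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a\nb\n' > f1.txt   # hypothetical two-line files
printf 'c\nd\n' > f2.txt

# NR keeps counting across files; FNR restarts at 1 for each new file,
# which is why (FNR==1) fires once per file.
counts=$(awk '{print FILENAME, NR, FNR}' *.txt)
headers=$(awk '(FNR==1){print FILENAME}1' *.txt)

printf '%s\n---\n%s\n' "$counts" "$headers"
```

A test on NR == 1 instead would print only the first file's name.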
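To include the files in all subdirectories as well, you can use find:
$ find . -type f -iname '*.txt' -exec awk '(FNR==1){print FILENAME}1' {} +
The pattern needs the dot ('*txt' would also match a name such as notatxt), and ending -exec with {} + instead of {} \; passes many files to each awk run rather than starting one awk per file, which matters with a few thousand files.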
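The output of any of these commands can then be redirected to a file. With find, write that file outside the directory being searched (or give it a name that does not match the pattern): the shell creates it before find starts, so find would otherwise pick it up as one of its inputs.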
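If you want the layout from the question, with the name as the first column of every line instead of a header line, the find approach can be combined with the name-stripping idea from the first answer. A minimal sketch, assuming a hypothetical data/ tree built on the fly; it uses two sub() calls instead of one gsub(), which does the same stripping in two readable steps. find does not guarantee any particular file order, so sort the result if order matters.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p data/Folder1 data/Folder2   # hypothetical tree
printf '123 010\n456 020\n' > data/Folder1/file1.txt
printf 'abc 100\n' > data/Folder2/file2.txt

# "-exec ... {} +" passes many files to one awk run; FNR still resets per file.
# The output goes next to data/, not inside it, so find never reads it as input.
find data -type f -name '*.txt' -exec awk '
    FNR == 1 { f = FILENAME; sub(/^.*\//, "", f); sub(/\.[^.]*$/, "", f) }
    { print f, $0 }' {} + > CombinedFile.txt

LC_ALL=C sort CombinedFile.txt
```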