I have a directory with files that look like this:
CCG02-215-WGS.format.flt.txt
CCG05-707-WGS.format.flt.txt
CCG06-203-WGS.format.flt.txt
CCG04-967-WGS.format.flt.txt
CCG05-710-WGS.format.flt.txt
CCG06-215-WGS.format.flt.txt
The contents of each file look like this:
1 9061390 14 93246140
1 58631131 2 31823410
1 108952511 3 110694548
1 168056494 19 23850376
etc...
The ideal output would be a file, let's call it all-samples.format.flt.txt, that contains the concatenation of all files, plus an additional column showing which sample/file each row came from (with some minor formatting to remove the .format.flt.txt suffix):
1 9061390 14 93246140 CCG02-215-WGS
...
1 58631131 2 31823410 CCG05-707-WGS
...
1 108952511 3 110694548 CCG06-203-WGS
...
1 168056494 19 23850376 CCG04-967-WGS
Currently, I have the following code which works for individual files.
awk 'BEGIN{OFS="\t"; split(ARGV[1],f,".")}{print $1,$2,$3,$4,f[1]}' CCG05-707-WGS.format.flt.txt
#OUTPUT
1 58631131 2 31823410 CCG05-707-WGS
...
However, when I try to apply it to all files using the star, it adds the first filename it finds to the rows of every file as the 5th column.
awk 'BEGIN{OFS="\t"; split(ARGV[1],f,".")}{print $1,$2,$3,$4,f[1]}' *
#OUTPUT, 5th column should be as seen in previous code block
1 9061390 14 93246140 CCG02-215-WGS
...
1 58631131 2 31823410 CCG02-215-WGS
...
1 108952511 3 110694548 CCG02-215-WGS
...
1 168056494 19 23850376 CCG02-215-WGS
I feel like the solution may just lie in adding an additional parameter to awk... but I'm not sure where to start.
Thanks!
UPDATE
Using awk's built-in FILENAME variable solved the issue, along with some elegant logic for trimming the file names.
Thanks, @RavinderSingh13!
awk 'BEGIN{OFS="\t"} FNR==1{file=FILENAME;sub(/\..*/,"",file)} {print $0,file}' *.txt
CodePudding user response:
With your shown samples, please try the following awk code. We need to use awk's built-in FILENAME variable here. On the first line of each txt file passed to the program (FNR==1), the file name is copied into file and everything from the first . to the end of the value is removed. The main block then prints the current line followed by file (the trimmed file name, as required).
awk '
BEGIN { OFS="\t" }
FNR==1{
file=FILENAME
sub(/\..*/,"",file)
}
{
print $0,file
}
' *.txt
Or, in one-liner form, try the following awk code:
awk 'BEGIN{OFS="\t"} FNR==1{file=FILENAME;sub(/\..*/,"",file)} {print $0,file}' *.txt
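For anyone who wants to try this without touching real data, here is a small reproducible sketch of the one-liner above. The scratch directory, the two single-line sample files, and the variable names workdir and outfile are assumptions made for illustration; only the awk program itself comes from the answer. It also shows why the original attempt failed: there the name was computed once in BEGIN from ARGV[1], whereas here it is recomputed whenever FNR resets to 1 at the start of each file.

```shell
# Build two tiny sample files in a scratch directory (names taken from the question).
workdir=$(mktemp -d)
printf '1 9061390 14 93246140\n' > "$workdir/CCG02-215-WGS.format.flt.txt"
printf '1 58631131 2 31823410\n' > "$workdir/CCG05-707-WGS.format.flt.txt"

# Write the combined file outside the input directory so that a later run
# of the *.txt glob does not pick it up as one of the inputs.
outfile=$(mktemp)

# Run inside a subshell after cd, so FILENAME holds bare file names
# without a directory prefix.
(
  cd "$workdir" &&
  awk 'BEGIN{OFS="\t"} FNR==1{file=FILENAME; sub(/\..*/,"",file)} {print $0,file}' *.txt
) > "$outfile"

cat "$outfile"
```

Each output row should end with a tab followed by the sample name of the file it came from.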
CodePudding user response:
You may use the following, which works in any version of awk:
awk -v OFS='\t' 'FNR==1{split(FILENAME, a, /\./)} {print $0, a[1]}' *.txt
Or in gnu-awk:
awk -v OFS='\t' 'BEGINFILE{split(FILENAME, a, /\./)} {print $0, a[1]}' *.txt
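One caveat with all of the commands above: they cut the name at the first dot, so they rely on the files being in the current directory and on sample names containing no dots. A hedged variant, in case the files are passed with a path or the names may contain dots, is sketched below. The suffix and sample variable names and the scratch directory are assumptions for illustration, not part of the answers above.

```shell
# Scratch directory with one sample file (name taken from the question).
workdir=$(mktemp -d)
printf '1 108952511 3 110694548\n' > "$workdir/CCG06-203-WGS.format.flt.txt"

result=$(awk -v OFS='\t' -v suffix='.format.flt.txt' '
FNR==1 {
    sample = FILENAME
    sub(/.*\//, "", sample)                 # drop any leading directory part
    n = length(sample) - length(suffix)
    if (n > 0 && substr(sample, n + 1) == suffix)
        sample = substr(sample, 1, n)       # strip the suffix only if it is really there
}
{ print $0, sample }
' "$workdir"/*.format.flt.txt)

printf '%s\n' "$result"
```

Here the files are passed with their full path, and the directory part is removed before the known suffix is stripped, so a dot elsewhere in the path or sample name does not truncate the label.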