How can I concatenate files, keeping track of the name of the file in a column?

Time:03-08

I have several tab-delimited files that look like this (I am showing two examples):

sim9_more_stuff.out:

0       31492.4941084098        599.505895270519                
1       32091.999959    4.1727117e-05
...

sim999_more_stuff.out:

0       23455        5.05895270519                
1       3959    477
...

I need to concatenate all of these files while keeping track of each file's identifier in an extra column, like this:

0       31492.4941084098        599.505895270519   sim9             
1       32091.999959    4.1727117e-05              sim9    
...
0       23455        5.05895270519        sim999              
1       3959    477                       sim999

I thought I could use a loop like the following and then run cat on the result:

for f in file1 file2 file3; do sed -i "s/$/\t$f/" $f; done

But that appends the complete filename, and I only want the identifier (e.g. sim9).

Could you suggest a more accurate and automated way of doing this? Thanks a lot for your time, and sorry for the naive question.

Answer:

You can use awk, taking the part of the filename before the first _ as the identifier:

awk '{split(FILENAME, a, /_/); print $0 "\t" a[1]}' *.out

Output:

0       23455        5.05895270519  sim999
1       3959    477 sim999
0       31492.4941084098        599.505895270519    sim9
1       32091.999959    4.1727117e-05   sim9
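
A possible variant of the awk one-liner, sketched under a few assumptions: the combined result should go to a new file (the name combined.tsv is hypothetical), the files may be passed with a directory prefix, and the sample data in a scratch directory stands in for the real *.out files. It computes the identifier once per file (when FNR == 1) instead of on every line, and strips any leading path before splitting on _.

```shell
set -eu

# Hypothetical sample data in a scratch directory (stands in for the real *.out files).
dir=$(mktemp -d)
printf '0\t31492.4941084098\t599.505895270519\n1\t32091.999959\t4.1727117e-05\n' > "$dir/sim9_more_stuff.out"
printf '0\t23455\t5.05895270519\n1\t3959\t477\n' > "$dir/sim999_more_stuff.out"

# FNR == 1 is true on the first line of each input file, so the identifier
# is computed once per file rather than once per line.
awk 'BEGIN { OFS = "\t" }
     FNR == 1 {
       name = FILENAME
       sub(/.*\//, "", name)     # drop any leading directory path
       split(name, parts, "_")   # identifier = text before the first "_"
       id = parts[1]
     }
     { print $0, id }' "$dir"/*.out > "$dir/combined.tsv"

cat "$dir/combined.tsv"
```

Because the path is removed first, calling it with something like data/*.out still yields sim9 and sim999 rather than data/sim9.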

Answer:

Assumptions:

  • by identifier, the OP means everything that comes before the first _
  • all filenames have at least one _

Use parameter expansion to obtain the identifier, e.g.:

$ f='sim999_more_stuff.out'
$ echo "${f%%_*}"
sim999
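
To see why %% (and not %) is the right operator here, the sketch below compares a few POSIX parameter expansions on the example filename. It also shows what happens when the second assumption fails: a name with no _ (the hypothetical sim5.out) comes back unchanged. The variable names are illustrative only.

```shell
f='sim999_more_stuff.out'

# ${f%%_*}: remove the LONGEST suffix matching "_*" (cut at the FIRST "_").
id_first=${f%%_*}
# ${f%_*}: remove the SHORTEST suffix matching "_*" (cut at the LAST "_").
id_last=${f%_*}
# ${f#*_}: remove the shortest prefix matching "*_" (everything up to the first "_").
rest=${f#*_}

# If a name has no "_", nothing matches and the name is returned unchanged,
# which is why the answer assumes every filename contains at least one "_".
g='sim5.out'
no_underscore=${g%%_*}

printf '%s\n' "$id_first" "$id_last" "$rest" "$no_underscore"
# sim999
# sim999_more
# more_stuff.out
# sim5.out
```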

Tweaking the OP's current code:

for f in *stuff.out; do sed -i "s/$/\t${f%%_*}/" "$f"; done
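
Note that sed -i rewrites the original files in place, and \t in the replacement is a GNU sed extension. If you would rather leave the inputs untouched, here is a sketch of the same loop that streams everything into one new file. It assumes the *stuff.out naming from the example and uses awk to append the tab so it does not rely on GNU sed; the sample files and the output name combined.tsv are hypothetical.

```shell
set -eu

# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
cd "$dir"
printf '0\t31492.4941084098\t599.505895270519\n1\t32091.999959\t4.1727117e-05\n' > sim9_more_stuff.out
printf '0\t23455\t5.05895270519\n1\t3959\t477\n' > sim999_more_stuff.out

# Build the combined file without modifying the inputs.
for f in *stuff.out; do
  id=${f%%_*}                                   # text before the first "_"
  awk -v id="$id" '{ print $0 "\t" id }' "$f"   # append a tab and the identifier
done > combined.tsv

cat combined.tsv
```

Since the inputs are never modified, this can be rerun safely, whereas rerunning the sed -i loop would append a second identifier column to every line.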

Tags: bash