I have several tab-delimited files that look like this (I am presenting two examples):
sim9_more_stuff.out:
0 31492.4941084098 599.505895270519
1 32091.999959 4.1727117e-05
...
sim999_more_stuff.out:
0 23455 5.05895270519
1 3959 477
...
I need to concatenate all these files while keeping track of each file's identifier in an extra column, like this:
0 31492.4941084098 599.505895270519 sim9
1 32091.999959 4.1727117e-05 sim9
...
0 23455 5.05895270519 sim999
1 3959 477 sim999
I thought that I could use something like the following loop and, after that, the cat command:
for f in file1 file2 file3; do sed -i "s/$/\t$f/" $f; done
But doing that would write the complete filenames, and I only want the identifier.
Could you propose a more accurate and automated way of doing this? Thanks a lot for your time, and sorry for the naive question.
Answer:
You can use an awk solution like this:
awk '{split(FILENAME, a, /_/); print $0 "\t" a[1]}' *.out
Output:
0 23455 5.05895270519 sim999
1 3959 477 sim999
0 31492.4941084098 599.505895270519 sim9
1 32091.999959 4.1727117e-05 sim9
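If you would rather compute the identifier once per file instead of splitting the filename on every line, a variant along these lines should work. It is a sketch assuming a POSIX awk and that the identifier is everything before the first _ in the file's base name; the function name add_ids is hypothetical. FNR == 1 is true on the first line of each input file, and the directory part is stripped first so paths like dir/sim9_x.out also work.

```shell
#!/bin/sh
# add_ids (hypothetical name): print every line of the given files with a tab
# and the file's identifier (the part of the base name before the first "_").
add_ids() {
    awk -v OFS='\t' '
        FNR == 1 {               # first line of each new input file
            id = FILENAME
            sub(/.*\//, "", id)  # drop any leading directory part
            sub(/_.*/, "", id)   # keep only what precedes the first "_"
        }
        { print $0, id }         # original line, OFS (a tab), identifier
    ' "$@"
}

# Usage: add_ids *.out > combined.tsv
```

Writing the result to a name that does not end in .out keeps it from being picked up by the *.out glob on a later run.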
Answer:
Assumptions:
- by "identifier" OP means everything that comes before the first _
- all filenames contain at least one _
Use parameter expansion to obtain the identifier, e.g.:
$ f='sim999_more_stuff.out'
$ echo "${f%%_*}"
sim999
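For reference, %% removes the longest matching suffix while % removes the shortest, and ##*/ strips a leading directory. The path results/sim999_more_stuff.out below is a made-up example to show all three:

```shell
# Hypothetical path with a directory part
f='results/sim999_more_stuff.out'
base=${f##*/}        # remove longest prefix matching "*/"  -> sim999_more_stuff.out
echo "${base%%_*}"   # remove longest suffix matching "_*"  -> sim999
echo "${base%_*}"    # remove shortest suffix matching "_*" -> sim999_more
```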
Tweaking OP's current code:
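for f in *stuff.out; do sed -i "s/$/\t${f%%_*}/" "$f"; done
Then concatenate the modified files with cat, as originally planned. Note that sed -i overwrites the files in place, and this form relies on GNU sed (-i without a backup suffix, and \t in the replacement).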
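If you would rather leave the original files untouched, the same parameter expansion can drive a loop that writes the combined result to stdout instead. This is a sketch in plain POSIX sh; the function name concat_with_ids is hypothetical. It reads line by line, so for very large files the awk approach will be noticeably faster.

```shell
# concat_with_ids (hypothetical name): print each line of every file given as
# an argument, followed by a tab and the identifier taken from the file name.
# The input files are only read; nothing is modified in place.
concat_with_ids() {
    for f in "$@"; do
        base=${f##*/}   # drop any directory part of the path
        id=${base%%_*}  # keep everything before the first "_"
        # the -n test also processes a final line that lacks a trailing newline
        while IFS= read -r line || [ -n "$line" ]; do
            printf '%s\t%s\n' "$line" "$id"
        done < "$f"
    done
}

# Usage: concat_with_ids *stuff.out > combined.tsv
```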