Adding line number and file name to 3 column files using awk

Time:06-22

I have a couple of data sets containing x, y, z coordinates in plain text format (no column headers, whitespace delimited, CRLF line breaks). The data looks like this:

10168522 21059480 -86
10169988 21058886 -86
10171457 21058291 -86
10172926 21057706 -86
10174428 21057114 -85
10175927 21056531 -85
10177434 21055952 -85
10178966 21055370 -84
10180473 21054773 -85
10181992 21054164 -85
10183517 21053557 -85

In this example, the filename is "fileA.xyz". I'd like to add the line number and the filename to the file and print it out using awk (actually gawk 4.0.2 with no option of upgrading / installing additional tools). I have come up with the following

awk -F ' ' -v OFS=' ' -v ORS=' ' '{$1; $2; $3; $(NF+1)=++i FS "fileA.xyz"}1' better_fileA.xyz

which kind of works, but leaves the first line untouched:

10168522 21059480 -86
 1 fileA.xyz 10169988 21058886 -86
 2 fileA.xyz 10171457 21058291 -86
 3 fileA.xyz 10172926 21057706 -86
 4 fileA.xyz 10174428 21057114 -85
 5 fileA.xyz 10175927 21056531 -85
 6 fileA.xyz 10177434 21055952 -85
 7 fileA.xyz 10178966 21055370 -84
 8 fileA.xyz 10180473 21054773 -85
 9 fileA.xyz 10181992 21054164 -85
 10 fileA.xyz 10183517 21053557 -85

I've also noticed an extra white space in front of the line number (first column). I do understand awk is very complex, but I am a bit lost amidst the syntax options. For starters, I'm wondering why the order of columns I provided is apparently not passed on to the output?

Since all files are very large (couple of gigs), I'd like to use awk / sed. Also note that files need to be consistent, hence solutions involving cat -n are not really an option (due to the padding of line numbers in the file, which is not reasonable in this scenario where the range of line numbers is not known a priori and also because this would not take care of the filename).

Any suggestions or pointers towards a solution would be very welcome!

CodePudding user response:

If the filename is hard-coded, this short one-liner should help:

$ awk '$0=NR FS "fileA.xyz" FS $0' YourFile

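Basically, it prepends the line number and the filename to each record and prints the result to stdout. The assignment to $0 is used as the pattern: its value is a non-empty string, which awk treats as true, so the default action (print) runs for every line.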
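If the filename should not be hard-coded, awk's built-in FILENAME and FNR variables can supply it. A minimal sketch, assuming the carriage returns from the CRLF line endings should be dropped from the output; the helper name number_lines and the two-line sample file are made up for illustration:

```shell
# Work in a scratch directory so the demo file does not clobber real data.
workdir=$(mktemp -d) && cd "$workdir"

# Two sample lines from the question, written with CRLF line endings.
printf '10168522 21059480 -86\r\n10169988 21058886 -86\r\n' > fileA.xyz

# number_lines is a hypothetical helper, not part of awk.
# FNR is the line number within the current file, FILENAME is its name.
# sub(/\r$/, "") removes the trailing carriage return from each record.
number_lines() {
    awk '{ sub(/\r$/, ""); print FNR, FILENAME, $0 }' "$@"
}

number_lines fileA.xyz
```

To keep the CRLF endings instead, one option is to leave out the sub() call, since the \r then simply stays at the end of each printed line.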

CodePudding user response:

$ awk '{printf "%3d %s %s\n", NR, "fileA.xyz", $0}' better_fileA.xyz
  1 fileA.xyz 10168522 21059480 -86
  2 fileA.xyz 10169988 21058886 -86
  3 fileA.xyz 10171457 21058291 -86
  4 fileA.xyz 10172926 21057706 -86
  5 fileA.xyz 10174428 21057114 -85
  6 fileA.xyz 10175927 21056531 -85
  7 fileA.xyz 10177434 21055952 -85
  8 fileA.xyz 10178966 21055370 -84
  9 fileA.xyz 10180473 21054773 -85
 10 fileA.xyz 10181992 21054164 -85
 11 fileA.xyz 10183517 21053557 -85
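As for why the first line looked untouched in the question: a likely cause is the CRLF line endings combined with ORS=' '. The carriage return stays attached to the third field, the new field is appended after it (preceded by OFS, hence the leading space), and a viewer that treats \r as a line break shows the number at the start of the next line. The sketch below reproduces this on two sample lines, assuming the attempt in the question uses $(NF+1)=++i FS "fileA.xyz"; the function name reproduce is made up, and tr makes each carriage return visible as @:

```shell
# reproduce is a hypothetical helper that replays the question's command
# on two CRLF-terminated sample lines.
reproduce() {
    printf '10168522 21059480 -86\r\n10169988 21058886 -86\r\n' |
        awk -v ORS=' ' '{ $(NF+1) = ++i FS "fileA.xyz" } 1' |
        tr '\r' '@'    # show each carriage return as @
}

reproduce
# prints (no trailing newline, because ORS is a space):
# 10168522 21059480 -86@ 1 fileA.xyz 10169988 21058886 -86@ 2 fileA.xyz 
```

Each @ marks where the original line ended, so the counter and filename are in fact appended to the end of each record rather than prepended to the next one.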