Sum multiple files into 1 output file

Time:07-21

Hoping someone will be able to point me in the right direction. I am new to bash scripting, and I believe awk should be able to solve this problem.

I have multiple files to process. The keys in $1 are always the same across the files, the field separator is a single space, and the numbers in $2 vary.

I want to sum $2 across all the files and write the result to a new file. Example:

File1.txt

DATA:TEST0 20
DATA:TEST1 4
DATA:TEST2 39
DATA:TEST3 11

File2.txt

DATA:TEST0 2
DATA:TEST1 0
DATA:TEST2 26
DATA:TEST3 9

File3.txt

DATA:TEST0 44
DATA:TEST1 16
DATA:TEST2 21
DATA:TEST3 7

Output.txt is the output I want to produce from the files above:

DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27

I have tried the following, but it does not work:

paste file* | awk '{$2=$1 $2}1' | tee output.txt
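A sketch that reproduces the attempt on the sample data, to show what goes wrong. It assumes the files are named File1.txt to File3.txt as in the question; bash globs are case-sensitive by default, so file* would not match them and paste would just report a missing file. Even with the right glob, awk treats "$1 $2" as string concatenation, not addition, so nothing gets summed.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Recreate the three sample files from the question.
printf 'DATA:TEST0 20\nDATA:TEST1 4\nDATA:TEST2 39\nDATA:TEST3 11\n' > File1.txt
printf 'DATA:TEST0 2\nDATA:TEST1 0\nDATA:TEST2 26\nDATA:TEST3 9\n' > File2.txt
printf 'DATA:TEST0 44\nDATA:TEST1 16\nDATA:TEST2 21\nDATA:TEST3 7\n' > File3.txt

# paste joins line N of each file with TAB characters.
paste File*.txt > pasted.txt

# "$1 $2" concatenates the two strings, so $2 becomes "DATA:TEST020";
# assigning to a field also rebuilds the whole line with single spaces.
paste File*.txt | awk '{$2=$1 $2}1' > attempt.txt
cat attempt.txt
```

The first line of attempt.txt comes out as "DATA:TEST0 DATA:TEST020 DATA:TEST0 2 DATA:TEST0 44": the key and the first value glued together, with the other columns untouched.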

Any advice would be appreciated. Thanks in advance.

Answer:

paste puts the files side by side; you don't need that. Just give all the filenames as arguments to awk, and it will process them one after another.

Use an associative array for the sums for each keyword in column 1.

awk '{sum[$1] += $2} END {for (i in sum) print i, sum[i]}' file* | tee output.txt
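A minimal self-contained run of the associative-array approach, assuming the files are named File1.txt to File3.txt as in the question (hence File*.txt instead of file*). Note that for (i in sum) visits keys in an unspecified order, so the lines of output.txt are not guaranteed to follow the input order.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Recreate the three sample files from the question.
printf 'DATA:TEST0 20\nDATA:TEST1 4\nDATA:TEST2 39\nDATA:TEST3 11\n' > File1.txt
printf 'DATA:TEST0 2\nDATA:TEST1 0\nDATA:TEST2 26\nDATA:TEST3 9\n' > File2.txt
printf 'DATA:TEST0 44\nDATA:TEST1 16\nDATA:TEST2 21\nDATA:TEST3 7\n' > File3.txt

# One pass over all files: sum[key] accumulates column 2 for each column-1 key.
awk '{sum[$1] += $2} END {for (i in sum) print i, sum[i]}' File*.txt > output.txt
cat output.txt
```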

To keep the lines in their original order, you can go back to using paste. Then you loop over every other column, adding each value to a sum variable.

paste file* | awk '{sum = 0; for (i = 2; i <= NF; i += 2) sum += $i; print $1, sum}' | tee output.txt
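A sketch of the paste variant on the sample data, again assuming the File1.txt to File3.txt names. paste separates the files with tabs, and awk's default field splitting treats tabs and spaces alike, so the values land in fields 2, 4, 6 and so on. This only works if every file lists the same keys in the same order, since lines are matched by position, not by key.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Recreate the three sample files from the question.
printf 'DATA:TEST0 20\nDATA:TEST1 4\nDATA:TEST2 39\nDATA:TEST3 11\n' > File1.txt
printf 'DATA:TEST0 2\nDATA:TEST1 0\nDATA:TEST2 26\nDATA:TEST3 9\n' > File2.txt
printf 'DATA:TEST0 44\nDATA:TEST1 16\nDATA:TEST2 21\nDATA:TEST3 7\n' > File3.txt

# Each pasted line looks like: key v1 key v2 key v3
# Sum the even-numbered fields and print the key from field 1.
paste File*.txt |
    awk '{sum = 0; for (i = 2; i <= NF; i += 2) sum += $i; print $1, sum}' > output.txt
cat output.txt
```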

Answer:

Using GNU awk (gawk):

awk '{sums[$1] += $2} END {PROCINFO["sorted_in"] = "@ind_num_asc";
for (i in sums) print i, sums[i]}' File{1..3}.txt

DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27

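Note that PROCINFO["sorted_in"] = "@ind_num_asc" is used only to sort the output by array index, i.e. by $1. Since these keys are strings rather than numbers (gawk positions non-numeric indices as if they were zero under "@ind_num_asc"), "@ind_str_asc" expresses the intended ordering more directly. If sorting is not needed, the following works in any awk, although the output order is then unspecified: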

awk '{sums[$1] += $2} END {for (i in sums) print i, sums[i]}' File{1..3}.txt
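If sorted output is wanted without relying on gawk-only features such as PROCINFO["sorted_in"], one option is to let sort(1) do the ordering. A minimal sketch, assuming the sample file names from the question; LC_ALL=C is set so the sort order is plain byte order regardless of the user's locale.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Recreate the three sample files from the question.
printf 'DATA:TEST0 20\nDATA:TEST1 4\nDATA:TEST2 39\nDATA:TEST3 11\n' > File1.txt
printf 'DATA:TEST0 2\nDATA:TEST1 0\nDATA:TEST2 26\nDATA:TEST3 9\n' > File2.txt
printf 'DATA:TEST0 44\nDATA:TEST1 16\nDATA:TEST2 21\nDATA:TEST3 7\n' > File3.txt

# Plain POSIX awk sums per key; sort(1) puts the keys in a stable order.
awk '{sums[$1] += $2} END {for (i in sums) print i, sums[i]}' File*.txt |
    LC_ALL=C sort > output.txt
cat output.txt
```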

To preserve the original order of the keys, use this awk solution:

awk '!($1 in sums) {seq[++n] = $1} {sums[$1] += $2}
END {for (i = 1; i <= n; i++) print seq[i], sums[seq[i]]}' File{1..3}.txt

DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27
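A sketch of the order-preserving version. Because it matches lines by key rather than by position, it also copes with a key that appears in only some files, which the paste approach cannot. To show that, File3.txt below gets one extra, hypothetical line (DATA:TEST4 5) that is not in the question's data.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Sample files from the question; the last line of File3.txt is hypothetical.
printf 'DATA:TEST0 20\nDATA:TEST1 4\nDATA:TEST2 39\nDATA:TEST3 11\n' > File1.txt
printf 'DATA:TEST0 2\nDATA:TEST1 0\nDATA:TEST2 26\nDATA:TEST3 9\n' > File2.txt
printf 'DATA:TEST0 44\nDATA:TEST1 16\nDATA:TEST2 21\nDATA:TEST3 7\nDATA:TEST4 5\n' > File3.txt

# seq[1..n] records each key the first time it is seen; sums[] holds the totals.
awk '!($1 in sums) {seq[++n] = $1}
     {sums[$1] += $2}
     END {for (i = 1; i <= n; i++) print seq[i], sums[seq[i]]}' File*.txt > output.txt
cat output.txt
```

The keys come out in first-seen order, with DATA:TEST4 last because it first appears in File3.txt.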

Answer:

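 awk '
     {a[$1] += $2}
     END {
        # asorti() (GNU awk only) stores the sorted indices of a in b[1..n];
        # loop by number, because for (i in b) does not guarantee any order.
        n = asorti(a, b)
        for (i = 1; i <= n; i++) print b[i], a[b[i]]
     }
 ' File[123].txt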

DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27