Hoping someone will be able to point me in the right direction. I am new to bash scripting and I believe awk should be able to solve this problem.
I have multiple files that I want to process. The data in $1 always stays the same, the separator is a single space, and the numbers in $2 change.
I want to sum $2 across all the files and write the result to a new file. Example below:
File1.txt
DATA:TEST0 20
DATA:TEST1 4
DATA:TEST2 39
DATA:TEST3 11
File2.txt
DATA:TEST0 2
DATA:TEST1 0
DATA:TEST2 26
DATA:TEST3 9
File3.txt
DATA:TEST0 44
DATA:TEST1 16
DATA:TEST2 21
DATA:TEST3 7
Output.txt is the output I want to produce from the files above:
DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27
I have tried the following, but it does not work:
paste file* | awk '{$2=$1 $2}1' | tee output.txt
Any advice would be appreciated. Thanks in advance.
Answer:
paste puts the files side by side; you don't need that. Just give all the filenames as arguments to awk and it will process them sequentially.
Use an associative array for the sums for each keyword in column 1.
awk '{sum[$1] += $2} END {for (i in sum) print i, sum[i]}' file* | tee output.txt
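The associative-array approach above can be tried end to end with a small script like the one below. It is a sketch under a few assumptions not in the original: it recreates the question's sample files in a scratch directory made with `mktemp -d`, and, because `for (k in sum)` visits keys in an unspecified order, it pipes the result through `LC_ALL=C sort` to make the output deterministic.

```shell
#!/usr/bin/env bash
# Sketch: sum column 2 per key across several files with one associative
# array. The scratch directory and sample files are illustrative only.
set -euo pipefail
workdir=$(mktemp -d)
cd "$workdir"
printf 'DATA:TEST0 20\nDATA:TEST1 4\nDATA:TEST2 39\nDATA:TEST3 11\n' > File1.txt
printf 'DATA:TEST0 2\nDATA:TEST1 0\nDATA:TEST2 26\nDATA:TEST3 9\n'   > File2.txt
printf 'DATA:TEST0 44\nDATA:TEST1 16\nDATA:TEST2 21\nDATA:TEST3 7\n' > File3.txt

# awk reads the files one after another; sum[$1] is uninitialized (treated
# as 0) the first time a key is seen, and += adds every later value to it.
# for (k in sum) has no guaranteed order, so the result is sorted afterwards.
awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' File*.txt |
  LC_ALL=C sort > output.txt

cat output.txt
```

With the sample data this prints the four totals from Output.txt (66, 20, 86, 27).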
To keep the original order of the lines, you can go back to using paste. Then you loop over every other column, adding each value to a sum variable.
paste file* | awk '{sum=0; for (i = 2; i <= NF; i += 2) sum += $i; print $1, sum}' | tee output.txt
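The paste variant only works if every file lists the same keys on the same line numbers, as the question's sample does. The sketch below makes that assumption explicit: it shows the layout paste produces and adds a hypothetical guard that stops with an error if the key in a pasted column does not match the key in column 1. The scratch directory and the guard are additions, not part of the answer.

```shell
#!/usr/bin/env bash
# Sketch of the paste-based approach, assuming every file lists the same
# keys on the same line numbers (true for the question's sample data).
set -euo pipefail
workdir=$(mktemp -d)   # scratch directory; not part of the original answer
cd "$workdir"
printf 'DATA:TEST0 20\nDATA:TEST1 4\nDATA:TEST2 39\nDATA:TEST3 11\n' > File1.txt
printf 'DATA:TEST0 2\nDATA:TEST1 0\nDATA:TEST2 26\nDATA:TEST3 9\n'   > File2.txt
printf 'DATA:TEST0 44\nDATA:TEST1 16\nDATA:TEST2 21\nDATA:TEST3 7\n' > File3.txt

# paste joins line N of each file with tabs, e.g.
#   DATA:TEST0 20<TAB>DATA:TEST0 2<TAB>DATA:TEST0 44
# awk's default field splitting treats spaces and tabs alike, so keys land
# in fields 1, 3, 5, ... and values in fields 2, 4, 6, ...
paste File1.txt File2.txt File3.txt |
  awk '{
         sum = 0
         for (i = 2; i <= NF; i += 2) {
           # hypothetical guard against files whose lines are not aligned
           if ($(i - 1) != $1) {
             print "key mismatch on line " NR ": " $(i - 1) " vs " $1 > "/dev/stderr"
             exit 1
           }
           sum += $i
         }
         print $1, sum
       }' > output.txt

cat output.txt
```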
Answer:
Using GNU awk:
awk '{sums[$1] += $2} END {PROCINFO["sorted_in"] = "@ind_num_asc";
for (i in sums) print i, sums[i]}' File{1..3}.txt
DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27
Note that PROCINFO["sorted_in"] = "@ind_num_asc" is used only to sort the output by the array index, i.e. $1. If that's not needed (without it, the order of a for-in loop is unspecified), the following awk works with any awk version:
awk '{sums[$1] += $2} END {for (i in sums) print i, sums[i]}' File{1..3}.txt
To keep the original order, use this awk solution:
awk '!($1 in sums) {seq[++n] = $1} {sums[$1] += $2}
END {for (i=1; i<=n; i++) print seq[i], sums[seq[i]]}' File{1..3}.txt
DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27
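Because the question's keys are already in sorted order, the output above looks the same whether or not order is preserved. The sketch below runs the same first-appearance bookkeeping (`seq[++n]`) on hypothetical files whose keys are deliberately not sorted, so the difference is visible. The file names `f1.txt`/`f2.txt`, the keys, and the scratch directory are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the first-appearance bookkeeping from the answer above, run on
# hypothetical files whose keys are deliberately not in sorted order.
set -euo pipefail
workdir=$(mktemp -d)
cd "$workdir"
printf 'DATA:B 1\nDATA:A 2\nDATA:C 3\n'    > f1.txt
printf 'DATA:B 10\nDATA:A 20\nDATA:C 30\n' > f2.txt

# The first rule fires only when $1 has not been seen yet and appends it
# to seq[1..n]; the second rule accumulates. END replays keys from seq.
awk '!($1 in sums) { seq[++n] = $1 }
     { sums[$1] += $2 }
     END { for (i = 1; i <= n; i++) print seq[i], sums[seq[i]] }' f1.txt f2.txt > output.txt

cat output.txt
```

The keys come out in the order they first appeared (B, A, C), not alphabetically.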
Answer:
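awk '
{a[$1] += $2}
END{
  # asorti() is GNU awk only; it copies the sorted indices of a into b[1..n]
  n = asorti(a, b)
  # loop by number: a for-in over b would not guarantee sorted order
  for (i = 1; i <= n; i++) print b[i], a[b[i]]
}
' File[123].txt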
DATA:TEST0 66
DATA:TEST1 20
DATA:TEST2 86
DATA:TEST3 27