I have this file (space delimited) :
bc1 no 12
bc1 no 15
bc1 yes 4
bc2 no 8
bc3 yes 14
bc3 yes 12
bc4 no 2
I would like to get this output :
bc1 3 no;no;yes 31
bc2 1 no 8
bc3 2 yes;yes 26
bc4 1 no 2
1st column : one occurence of the first column in the input file
2nd : number of this occurence in the input file
3rd : 3rd column translated in row with ";" delimiter
4th : sum of the last column
I can do what I want with the "no/yes" column :
awk -F' ' 'NF>2{a[$1] = a[$1]";"$2}END{for(i in a){print i" "a[i]}}' test.txt | sort -k1,1n
CodePudding user response:
One alternative awk
idea:
awk '
function print_row() {
if (series)
print key,c,series,sum
c=sum=0
series=sep=""
}
{ if ($1 != key) # if 1st column has changed then print previous data set
print_row()
key=$1
c
series=series sep $2
sep=";"
sum =$3
}
END { print_row() } # flush last data set to stdout
' input
This generates:
bc1 3 no;no;yes 31
bc2 1 no 8
bc3 2 yes;yes 26
bc4 1 no 2
CodePudding user response:
With your shown samples, please try following awk
code. Since $1(first column) is always sorted we need not to sort it so coming up with this solution here.
awk '
prev!=$1 && prev{
print prev OFS count,value,sum
count=sum=0
prev=value=""
}
{
prev=$1
value=(value?value ";":"") $2
count
sum =$NF
}
END{
if(prev){
print prev OFS count,value,sum
}
}
' Input_file
CodePudding user response:
Here's a solution with newer versions of datamash (thanks glenn jackman for the tips about count
and --collapse-delimiter
):
datamash -t' ' -g1 --collapse-delimiter=';' count 2 collapse 2 sum 3 <ip.txt
With older versions (mine is 1.4
) and awk
:
$ datamash -t' ' -g1 count 2 collapse 2 sum 3 <ip.txt
bc1 3 no,no,yes 31
bc2 1 no 8
bc3 2 yes,yes 26
bc4 1 no 2
$ <ip.txt datamash -t' ' -g1 count 2 collapse 2 sum 3 | awk '{gsub(/,/, ";", $3)} 1'
bc1 3 no;no;yes 31
bc2 1 no 8
bc3 2 yes;yes 26
bc4 1 no 2
-t
helps to set space as field separator. Column 2 is collapsed by using column 1 as the key. sum 3
helps to find the total of the numbers. count 2
helps to count the number of collapsed rows.
Then, awk
is used to change the ,
in the third column to ;