Translate specific columns to rows-CodePudding

I have this file (space delimited) :

bc1 no 12
bc1 no 15
bc1 yes 4
bc2 no 8  
bc3 yes 14 
bc3 yes 12
bc4 no 2

I would like to get this output :

bc1 3 no;no;yes 31
bc2 1 no 8
bc3 2 yes;yes 26
bc4 1 no 2

1st column : one occurence of the first column in the input file

2nd : number of this occurence in the input file

3rd : 3rd column translated in row with ";" delimiter

4th : sum of the last column

I can do what I want with the "no/yes" column :

awk -F' ' 'NF>2{a[$1] = a[$1]";"$2}END{for(i in a){print i" "a[i]}}' test.txt | sort -k1,1n

CodePudding user response：

One alternative awk idea:

awk '
function print_row() {
    if (series)
       print key,c,series,sum
    c=sum=0
    series=sep=""
}
    { if ($1 != key)                # if 1st column has changed then print previous data set
         print_row()
      key=$1
      c  
      series=series sep $2
      sep=";"
      sum =$3
    }
END { print_row() }                 # flush last data set to stdout
' input

This generates:

bc1 3 no;no;yes 31
bc2 1 no 8
bc3 2 yes;yes 26
bc4 1 no 2

CodePudding user response：

With your shown samples, please try following awk code. Since $1(first column) is always sorted we need not to sort it so coming up with this solution here.

awk '
prev!=$1 && prev{
  print prev OFS count,value,sum
  count=sum=0
  prev=value=""
}
{
  prev=$1
  value=(value?value ";":"") $2
  count  
  sum =$NF
}
END{
  if(prev){
     print prev OFS count,value,sum
  }
}
'  Input_file

CodePudding user response：

Here's a solution with newer versions of datamash (thanks glenn jackman for the tips about count and --collapse-delimiter):

datamash -t' ' -g1 --collapse-delimiter=';' count 2 collapse 2 sum 3 <ip.txt

With older versions (mine is 1.4) and awk:

$ datamash -t' ' -g1 count 2 collapse 2 sum 3 <ip.txt
bc1 3 no,no,yes 31
bc2 1 no 8
bc3 2 yes,yes 26
bc4 1 no 2

$ <ip.txt datamash -t' ' -g1 count 2 collapse 2 sum 3 | awk '{gsub(/,/, ";", $3)} 1'
bc1 3 no;no;yes 31
bc2 1 no 8
bc3 2 yes;yes 26
bc4 1 no 2

-t helps to set space as field separator. Column 2 is collapsed by using column 1 as the key. sum 3 helps to find the total of the numbers. count 2 helps to count the number of collapsed rows.

Then, awk is used to change the , in the third column to ;