A way to transpose some columns of a file in Bash

I have a huge comma-separated file formatted like this:

name,account,1/2022,2/2022,3/2022
row1,1234,0,1,2
row2,5678,3,4,5
row3,4321,6,7,8
row4,8765,9,10,11

I would like to transpose it efficiently using only Bash commands. I have already tried melt in Python, and I have also loaded the data into a database and used its unpivot function, but I think both are slower than a native Bash or awk solution.

So the output should look like:

name,account,date,value
row1,1234,1/2022,0
row1,1234,2/2022,1
row1,1234,3/2022,2
row2,5678,1/2022,3
row2,5678,2/2022,4
row2,5678,3/2022,5
row3,4321,1/2022,6
row3,4321,2/2022,7
row3,4321,3/2022,8
row4,8765,1/2022,9
row4,8765,2/2022,10
row4,8765,3/2022,11

The expected result has millions of rows, and in Python I would have to chunk the dataframe and loop over it. Most SQL databases can do it, but the unpivot function seems to be single-threaded and thus slow.

Looking for creative solutions in AWK or something native in Ubuntu.

CodePudding user response：

You can try this:

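awk 'BEGIN{OFS=FS=","}
    # Header line: print the new header and remember the date labels.
    NR==1{print $1,$2,"date","value";
          for(i=0;i<NF-2;++i){date[i]=$(i+3);}
         }
    # Data lines: print one output row per date column.
    NR>1{for(i=0;i<NF-2;++i){print $1,$2,date[i],$(i+3)}}
' inputfile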

You get:

name,account,date,value
row1,1234,1/2022,0
row1,1234,2/2022,1
row1,1234,3/2022,2
row2,5678,1/2022,3
row2,5678,2/2022,4
row2,5678,3/2022,5
row3,4321,1/2022,6
row3,4321,2/2022,7
row3,4321,3/2022,8
row4,8765,1/2022,9
row4,8765,2/2022,10
row4,8765,3/2022,11

CodePudding user response：

Another variation on Jose's approach which simply adjusts the indexes a bit to be consistent with iterating from 1 to NF could be:

awk -F, -v OFS="," '
  FNR == 1 {
    for (i = 3; i <= NF; i++)   # date labels start at field 3
      dates[i-2] = $i
    print $1, $2, "date,value"
    next
  }
  {
    for (i = 3; i <= NF; i++)   # one output row per date field
      print $1, $2, dates[i-2], $i
  }
' file

This assumes every record has the same number of fields, but it handles any number of date fields from field 3 on.
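To make that assumption visible, here is a minimal sketch that skips any row whose field count differs from the header and reports it on stderr. The function name `unpivot_checked` and the sample data are made up for illustration; writing to "/dev/stderr" is assumed to work as it does with the awk implementations commonly found on Linux.

```shell
# unpivot_checked is a hypothetical helper name, not part of the answer above.
# It reads the files given as arguments, or stdin when there are none.
unpivot_checked() {
  awk -F, -v OFS=, '
    FNR == 1 {
      nf = NF                      # remember the header field count
      for (i = 3; i <= NF; i++)
        dates[i] = $i              # header label of each value column
      print $1, $2, "date", "value"
      next
    }
    NF != nf {                     # row does not match the header layout
      printf "line %d: expected %d fields, got %d\n", FNR, nf, NF > "/dev/stderr"
      next
    }
    {
      for (i = 3; i <= NF; i++)
        print $1, $2, dates[i], $i
    }
  ' "$@"
}

# Sample run with four date columns and one short row (made-up data).
printf '%s\n' \
  'name,account,1/2022,2/2022,3/2022,4/2022' \
  'row1,1234,0,1,2,3' \
  'row2,5678,4,5' |
  unpivot_checked
```

With this sample, row1 yields four output rows and row2 is skipped with a message on stderr. Whether to skip, pad, or abort on a malformed row is a design choice; skipping is used here only to keep the sketch short.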

Example Use/Output

Copying the above and middle-mouse pasting it into an xterm in the directory where your input file is located gives:

$ awk -F, -v OFS="," '
>   FNR == 1 {
>     for (i = 3; i <= NF; i++)
>       dates[i-2] = $i
>     print $1, $2, "date,value"
>     next
>   }
>   {
>     for (i = 3; i <= NF; i++)
>       print $1, $2, dates[i-2], $i
>   }
> ' file
name,account,date,value
row1,1234,1/2022,0
row1,1234,2/2022,1
row1,1234,3/2022,2
row2,5678,1/2022,3
row2,5678,2/2022,4
row2,5678,3/2022,5
row3,4321,1/2022,6
row3,4321,2/2022,7
row3,4321,3/2022,8
row4,8765,1/2022,9
row4,8765,2/2022,10
row4,8765,3/2022,11
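
Both answers hard-code two leading identifier columns (name and account). If that number ever differs, the same loop can be driven by a variable. The following is only a sketch under that assumption; the function name `unpivot_keys` and the awk variable `keys` are hypothetical names chosen for illustration.

```shell
# unpivot_keys N [file...]: treat the first N columns as identifiers and
# unpivot the remaining ones. Hypothetical helper, not from the answers above.
unpivot_keys() {
  local keys=$1
  shift
  awk -F, -v OFS=, -v keys="$keys" '
    {
      prefix = $1                  # join the identifier columns once per line
      for (i = 2; i <= keys; i++)
        prefix = prefix OFS $i
    }
    FNR == 1 {
      for (i = keys + 1; i <= NF; i++)
        dates[i] = $i              # header label of each value column
      print prefix, "date", "value"
      next
    }
    {
      for (i = keys + 1; i <= NF; i++)
        print prefix, dates[i], $i
    }
  ' "$@"
}
```

Calling `unpivot_keys 2 file` should behave like the commands above for the sample input.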