Bash group csv data by values-CodePudding

I have a CSV that looks like this:

created,id,value
2022-12-1611:55,58,10
2022-12-1611:55,59,2
2022-12-1611:50,58,11
2022-12-1611:50,59,3
2022-12-1611:50,60,7

I want to parse it so I have the following result, setting ids as columns and grouping by date:

created,58,59,60
2022-12-1611:55,10,2,nan
2022-12-1611:50,11,3,7

missing values are set to nan, each id appears at most once per date

How can I do it? I also have the first CSV in a JSON equivalent if this is easier to do with jq

The JSON is composed of similar elements:

{ 
  "created": "2022-12-16 09:15", 
  "value": "10.4", 
  "id": "60" 
}

CodePudding user response：

try it script:

    cat xtmp.txt | sort | sed -e 's\,\\g' | awk '{if(!a[$1" "$2]){a[$1" "$2]=$4;b[$1" "$2]=c[$1" "$2]="nan"};z[$1" "$2] =1;if(z[$1" "$2]==2){b[$1" "$2]=$4}else if (z[$1" "$2]==3){c[$1" "$2]=$4}}END{for(i in a) printf  "%s, %s, %s, %s \n", i, a[i], b[i], c[i]}'

CodePudding user response：

Using the great Miller (version >= 6), running

mlr --csv reshape -s id,value then unsparsify then fill-empty -v "nan" input.csv

you get

created,58,59,60
2022-12-1611:55,10,2,nan
2022-12-1611:50,11,3,7

The core command here is reshape -s id,value, to transform your input from long to wide structure.