I have a CSV that looks like this:
created,id,value
2022-12-1611:55,58,10
2022-12-1611:55,59,2
2022-12-1611:50,58,11
2022-12-1611:50,59,3
2022-12-1611:50,60,7
I want to parse it so I have the following result, setting ids as columns and grouping by date:
created,58,59,60
2022-12-1611:55,10,2,nan
2022-12-1611:50,11,3,7
missing values are set to nan, each id appears at most once per date
How can I do it? I also have the first CSV in a JSON equivalent if this is easier to do with jq
The JSON is composed of similar elements:
{
"created": "2022-12-16 09:15",
"value": "10.4",
"id": "60"
}
CodePudding user response:
try it script:
cat xtmp.txt | sort | sed -e 's\,\\g' | awk '{if(!a[$1" "$2]){a[$1" "$2]=$4;b[$1" "$2]=c[$1" "$2]="nan"};z[$1" "$2] =1;if(z[$1" "$2]==2){b[$1" "$2]=$4}else if (z[$1" "$2]==3){c[$1" "$2]=$4}}END{for(i in a) printf "%s, %s, %s, %s \n", i, a[i], b[i], c[i]}'
CodePudding user response:
Using the great Miller (version >= 6), running
mlr --csv reshape -s id,value then unsparsify then fill-empty -v "nan" input.csv
you get
created,58,59,60
2022-12-1611:55,10,2,nan
2022-12-1611:50,11,3,7
The core command here is reshape -s id,value
, to transform your input from long to wide structure.