I need to parse the content of a CSV file, that changes on its content, without any advise about. Therefore, sometimes I'll receive a file with this content:
"No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"
and some others with this..
"No,Latitude,Longitude,Name,Altitude,Date,Time"
or maybe other (always with the same content.. but sometimes 5 columns.. some others 6 columns... etc. I mean.. sometimes I miss Name.. or Satellites... or both)
Therefore.. I need to read and understand what is inside of each column for elaborating a fixed CSV with (always) the same content:
"No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"
In order to do that I was thinking on:
- Read the header and assign some values to each column:
IFS=",";read -r a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11]
This is for reading the header
Same process for the header I need to create (the one I know):
IFS=",";read -r b[1] b[2] b[3] b[4] b[5] b[6] b[7] <<< "No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"
- Run a loop for each non empty values of the A array.. trying to match any of the headers of B array
do
for x in [1..7]
do
if [[ ${a[i]} = ${b[x]} ]]
then
# The column i from my source file belongs to the column x of my output file
output[$x]=$i
fi
done
done
- Store this on my new file.. with something like this.
IFS=","; while read -r a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11]
do
echo "${a[output[1]]}, ${a[output[2]]},${a[output[3]]},${a[output[4]]},${a[output[5]]},${a[output[6]]},${a[output[7]]}" >> new.csv
done < source.csv
I think this should work since it will record the appropriated source column into the desired final column....
But.. I'm sure that there are many ways easier and more elegant for doing this..
Any help?
Thanks
CodePudding user response:
Miller possesses the template
verb for exactly this kind of task:
mlr template [options]
Places input-record fields in the order specified by list of column names.
If the input record is missing a specified field, it will be filled with the fill-with.
If the input record possesses an unspecified field, it will be discarded.
Options:
-f {a,b,c}
Comma-separated field names for template.
-t {filename}
CSV file whose header line will be used for template.
--fill-with {filler string}
What to fill absent fields with. Defaults to the empty string.
So you could do:
mlr --csv template \
-f 'No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time' \
--fill-with '0' \
file.csv
Or use the header of an other file as reference:
mlr --csv template -t header.csv --fill-with '0' file.csv