Home > database >  bash - mapping a variable header when reading CSV
bash - mapping a variable header when reading CSV

Time:06-15

I need to parse the content of a CSV file, that changes on its content, without any advise about. Therefore, sometimes I'll receive a file with this content:

"No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"

and some others with this..

"No,Latitude,Longitude,Name,Altitude,Date,Time"

or maybe other (always with the same content.. but sometimes 5 columns.. some others 6 columns... etc. I mean.. sometimes I miss Name.. or Satellites... or both)

Therefore.. I need to read and understand what is inside of each column for elaborating a fixed CSV with (always) the same content:

"No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"

In order to do that I was thinking on:

  1. Read the header and assign some values to each column:

IFS=",";read -r a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11]

This is for reading the header

Same process for the header I need to create (the one I know):

IFS=",";read -r b[1] b[2] b[3] b[4] b[5] b[6] b[7] <<< "No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"

  1. Run a loop for each non empty values of the A array.. trying to match any of the headers of B array
do

for x in [1..7]

do

if [[ ${a[i]} = ${b[x]} ]] 

then

# The column i from my source file belongs to the column x of my output file 

output[$x]=$i

fi

done

done
  1. Store this on my new file.. with something like this.
IFS=","; while read -r a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11]

do
 
echo "${a[output[1]]}, ${a[output[2]]},${a[output[3]]},${a[output[4]]},${a[output[5]]},${a[output[6]]},${a[output[7]]}" >> new.csv

done < source.csv

I think this should work since it will record the appropriated source column into the desired final column....

But.. I'm sure that there are many ways easier and more elegant for doing this..

Any help?

Thanks

CodePudding user response:

Miller possesses the template verb for exactly this kind of task:

mlr template [options]
     Places input-record fields in the order specified by list of column names.
     If the input record is missing a specified field, it will be filled with the fill-with.
     If the input record possesses an unspecified field, it will be discarded.
Options:
     -f {a,b,c} Comma-separated field names for template.
     -t {filename} CSV file whose header line will be used for template.
     --fill-with {filler string} What to fill absent fields with. Defaults to the empty string.

So you could do:

mlr --csv template \
    -f 'No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time' \
    --fill-with '0' \
    file.csv

Or use the header of an other file as reference:

mlr --csv template -t header.csv --fill-with '0' file.csv
  • Related