Bash - Read a csv file and print values by using the header name instead of positional values-CodePudding

I'm using this script to read a csv and print specific values which works just fine. However I'm looking for a way that when my input file changes, (e.g. columns are added at random places or 2 existing columns are switched from position), my script would still work. A solution would be if I could use the header column names, not in a positional way as I'm doing now. However I can't find exactly how to do that. Anyone got an idea?

Input file:

db_name;db_country;db_email;db_phone;db_address
Hedwig Guthrie;Vietnam;[email protected];1-749-430-8866;"8087 Eget, Ave"
Mary Taylor;Vietnam;[email protected];1-221-754-0377;"146-561 Proin Rd."

#!/bin/bash

exec < input.csv
IFS=';'
read header

while read name country email phone address 
do
    echo "Name: " ${name}
    echo "country: " ${country}
    echo "E-mail: " ${email}
    echo "Telephone: " ${phone}
    echo "Address: " ${address}
    echo "========================"
done

Output:

Name:  Hedwig Guthrie
country:  Vietnam
E-mail:  [email protected]
Telephone:  1-749-430-8866
Address:  "8087 Eget, Ave"
========================
Name:  Mary Taylor
country:  Vietnam
E-mail:  [email protected]
Telephone:  1-221-754-0377
Address:  "146-561 Proin Rd."
========================

CodePudding user response：

Using any awk and with lots of intermediate variables and meaningful variable names to hopefully make the code easy to understand (I'm using the term "tag" to mean the names of the input columns/output rows):

$ cat tst.sh
#!/usr/bin/env bash

awk -F ';' -v OFS=': ' '
    BEGIN {
        numOutTags = split("Name;country;E-mail;Telephone;Address",outNrs2outTags)
        split("db_name;db_country;db_email;db_phone;db_address",outNrs2inTags)
    }
    NR==1 {
        for ( inTagNr=1; inTagNr<=NF; inTagNr   ) {
            inTag = $inTagNr
            inTags2inNrs[inTag] = inTagNr
        }
        next
    }
    {
        for ( outTagNr=1; outTagNr<=numOutTags; outTagNr   ) {
            outTag  = outNrs2outTags[outTagNr]
            inTag   = outNrs2inTags[outTagNr]
            inTagNr = inTags2inNrs[inTag]
            print outTag, $inTagNr
        }
        print "========================"
    }
' "${@:--}"

$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================

Now let's shuffle the order of the input columns and add a couple of additional columns you don't care about:

$ cat file
db_email;garbage;db_address;db_name;more_garbage;db_phone;db_country
[email protected];foo;"8087 Eget, Ave";Hedwig Guthrie;stuff;1-749-430-8866;Vietnam
[email protected];bar;"146-561 Proin Rd.";Mary Taylor;nonsense;1-221-754-0377;Vietnam

$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================

and as you can see the script is agnostic to those input changes.

Just list the input and output tags (names) you care about in the same order in the 2 split() commands to define columns you want output, what order you want them output in, what their output tags should be, and map between the output tags and the input tags.

The above will run orders of magnitude faster than and be more robust and portable than a shell loop, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

CodePudding user response：

For your example just

awk -F ";" 'NR>1 {print "| Name: " $1, "\n| country: " $2, "\n| email: " $3, "\n| phone: " $4, "\n| address: " $5 "\n"}' input.csv

but as you mentioned for different order and reading header, put this code in a file read_headers.awk

BEGIN {
    FS=";"
    OFS=" | "
    for (i=1; i<=NF; i  ) {
        transientLength[i] = 0
    }
}

{
    if(NR==1) {
    # read headers
        for (i=0; i<NF; i  ) {
            headers[i] = $(i 1)
            transientLength[i] = (length($(i 1))>=transientLength[i] ? length($(i 1)) : transientLength[i])
        }
    } else {
        for (i=0; i<NF; i  ) {
            fields[NR][i] = $(i 1)
            transientLength[i] = (length($(i 1))>=transientLength[i] ? length($(i 1)) : transientLength[i])
        }
    }
}

END {
    # print header
    for (j in headers) {
        spaceLength = transientLength[j]-length(headers[j])
        for (s=1;s<=spaceLength;s  ) {
            spaces = spaces" "
        }
        if (!printable) printable = headers[j] spaces
        else printable = printable OFS headers[j] spaces
        spaces = ""     # garbage collection
    }
    printable = "| "printable" |"
    print printable
    printable = ""      # garbage collection
    # print alignments
    for (j in transientLength) {
        for (i=1;i<=transientLength[j];i  ) {
            sep = sep"-"
        }
        if (!printable) printable = sep
        else printable = printable OFS sep
        sep = ""        # garbage collection
    }
    printable = "| "printable" |"
    print printable
    printable = ""      # garbage collection
    # print all rows
    for (f in fields) {
        for (j in fields[f]) {
            spaceLength = transientLength[j]-length(fields[f][j])
            for (s=1;s<=spaceLength;s  ) {
                spaces = spaces" "
            }
            if (!printable) printable = fields[f][j] spaces
            else printable = printable OFS fields[f][j] spaces
            spaces = ""     # garbage collection
        }
        printable = "| "printable" |"
        print printable
        printable = ""      # garbage collection
    }
}

then execute

awk -f read_headers.awk input.csv

credits for https://stackoverflow.com/users/2125110/daemon-painter from this question table creation from CSV file with headers using awk