I'm using this script to read a csv and print specific values which works just fine. However I'm looking for a way that when my input file changes, (e.g. columns are added at random places or 2 existing columns are switched from position), my script would still work. A solution would be if I could use the header column names, not in a positional way as I'm doing now. However I can't find exactly how to do that. Anyone got an idea?
Input file:
db_name;db_country;db_email;db_phone;db_address
Hedwig Guthrie;Vietnam;[email protected];1-749-430-8866;"8087 Eget, Ave"
Mary Taylor;Vietnam;[email protected];1-221-754-0377;"146-561 Proin Rd."
#!/bin/bash
exec < input.csv
IFS=';'
read header
while read name country email phone address
do
echo "Name: " ${name}
echo "country: " ${country}
echo "E-mail: " ${email}
echo "Telephone: " ${phone}
echo "Address: " ${address}
echo "========================"
done
Output:
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================
CodePudding user response:
Using any awk and with lots of intermediate variables and meaningful variable names to hopefully make the code easy to understand (I'm using the term "tag" to mean the names of the input columns/output rows):
$ cat tst.sh
#!/usr/bin/env bash
awk -F ';' -v OFS=': ' '
BEGIN {
numOutTags = split("Name;country;E-mail;Telephone;Address",outNrs2outTags)
split("db_name;db_country;db_email;db_phone;db_address",outNrs2inTags)
}
NR==1 {
for ( inTagNr=1; inTagNr<=NF; inTagNr ) {
inTag = $inTagNr
inTags2inNrs[inTag] = inTagNr
}
next
}
{
for ( outTagNr=1; outTagNr<=numOutTags; outTagNr ) {
outTag = outNrs2outTags[outTagNr]
inTag = outNrs2inTags[outTagNr]
inTagNr = inTags2inNrs[inTag]
print outTag, $inTagNr
}
print "========================"
}
' "${@:--}"
$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================
Now let's shuffle the order of the input columns and add a couple of additional columns you don't care about:
$ cat file
db_email;garbage;db_address;db_name;more_garbage;db_phone;db_country
[email protected];foo;"8087 Eget, Ave";Hedwig Guthrie;stuff;1-749-430-8866;Vietnam
[email protected];bar;"146-561 Proin Rd.";Mary Taylor;nonsense;1-221-754-0377;Vietnam
$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================
and as you can see the script is agnostic to those input changes.
Just list the input and output tags (names) you care about in the same order in the 2 split()
commands to define columns you want output, what order you want them output in, what their output tags should be, and map between the output tags and the input tags.
The above will run orders of magnitude faster than and be more robust and portable than a shell loop, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
CodePudding user response:
For your example just
awk -F ";" 'NR>1 {print "| Name: " $1, "\n| country: " $2, "\n| email: " $3, "\n| phone: " $4, "\n| address: " $5 "\n"}' input.csv
but as you mentioned for different order and reading header, put this code in a file read_headers.awk
BEGIN {
FS=";"
OFS=" | "
for (i=1; i<=NF; i ) {
transientLength[i] = 0
}
}
{
if(NR==1) {
# read headers
for (i=0; i<NF; i ) {
headers[i] = $(i 1)
transientLength[i] = (length($(i 1))>=transientLength[i] ? length($(i 1)) : transientLength[i])
}
} else {
for (i=0; i<NF; i ) {
fields[NR][i] = $(i 1)
transientLength[i] = (length($(i 1))>=transientLength[i] ? length($(i 1)) : transientLength[i])
}
}
}
END {
# print header
for (j in headers) {
spaceLength = transientLength[j]-length(headers[j])
for (s=1;s<=spaceLength;s ) {
spaces = spaces" "
}
if (!printable) printable = headers[j] spaces
else printable = printable OFS headers[j] spaces
spaces = "" # garbage collection
}
printable = "| "printable" |"
print printable
printable = "" # garbage collection
# print alignments
for (j in transientLength) {
for (i=1;i<=transientLength[j];i ) {
sep = sep"-"
}
if (!printable) printable = sep
else printable = printable OFS sep
sep = "" # garbage collection
}
printable = "| "printable" |"
print printable
printable = "" # garbage collection
# print all rows
for (f in fields) {
for (j in fields[f]) {
spaceLength = transientLength[j]-length(fields[f][j])
for (s=1;s<=spaceLength;s ) {
spaces = spaces" "
}
if (!printable) printable = fields[f][j] spaces
else printable = printable OFS fields[f][j] spaces
spaces = "" # garbage collection
}
printable = "| "printable" |"
print printable
printable = "" # garbage collection
}
}
then execute
awk -f read_headers.awk input.csv
credits for https://stackoverflow.com/users/2125110/daemon-painter from this question table creation from CSV file with headers using awk