I have an output file in the following format. How can I convert this into csv with first column as header row and second column as data row?
Input:
"Node name","SLAVE_DB" "Node ID","2" "PostgreSQL version","10.17" "Total data size","1 GB" "Conninfo","host=192.168.0.1 port=5432 user=user dbname=postgres" "Role","standby" "WAL archiving","disabled (on standbys "archive_mode" must be set to "always" to be effective)"
Desired Output:
"Node name","Node ID","PostgreSQL version","Total data size","Conninfo","Role","WAL archiving" "SLAVE_DB","2","10.17","1 GB","host=192.168.0.1 port=5432 user=user dbname=postgres","standby","disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"
[Edit]
I read through similar questions and I tried awk -F "," '{ for (i=1;i<=NF;i++) printf $i ", " }'
but it does not produce the desired output. I also want to preserve the quoted fields within the data.
CodePudding user response:
If we assume that there are never quotes in your first field (the tag/name/header), then this works using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
tags[NR] = $1
vals[NR] = $2
}
END {
for ( colNr=1; colNr<=NR; colNr++ ) {
tag = tags[colNr]
printf "%s%s", tag, (colNr<NR ? OFS : ORS)
}
for ( colNr=1; colNr<=NR; colNr++ ) {
val = vals[colNr]
gsub(/^"|"$/,"",val)
gsub(/"/,"\"\"",val)
printf "\"%s\"%s", val, (colNr<NR ? OFS : ORS)
}
}
$ awk -f tst.awk file
"Node name","Node ID","PostgreSQL version","Total data size","Conninfo","Role","WAL archiving"
"SLAVE_DB","2","10.17","1 GB","host=192.168.0.1 port=5432 user=user dbname=postgres","standby","disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"
If you need more than that then see What's the most robust way to efficiently parse CSV using awk?.
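The doubled quotes in that output ("" for an embedded ") are the standard CSV escaping convention, so any CSV-aware reader will restore the original value. As a quick sanity check, here is an illustrative sketch using Python's csv module (not part of the awk solution itself):

```python
import csv
import io

# A field with doubled quotes, as produced by the awk script above
field = '"disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"'

# A CSV-aware reader collapses the doubled quotes back to single quotes
row = next(csv.reader(io.StringIO(field)))
print(row[0])
# → disabled (on standbys "archive_mode" must be set to "always" to be effective)
```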
CodePudding user response:
Would you please try this?
awk '
BEGIN { FS = OFS = "," }
{ k[NR] = $1; v[NR] = $2 }
END {
$0 = "";
for (i = 1; i <= NR; i++) { $i = k[i] } print;
for (i = 1; i <= NR; i++) { $i = v[i] } print;
}
' file.csv
Remark: this code will break if your input contains commas or newlines inside the double-quoted values.
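For inputs that do contain commas or newlines inside quoted fields, a CSV-aware parser is safer than splitting on commas. A minimal sketch using Python's csv module (the function name is just illustrative):

```python
import csv
import io

def transpose_two_columns(text):
    # Parse with a real CSV reader, so commas and newlines
    # inside quoted values are handled correctly.
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([r[0] for r in rows])  # header row: first column
    writer.writerow([r[1] for r in rows])  # data row: second column
    return out.getvalue()

print(transpose_two_columns('"Node name","SLAVE_DB"\n"Node ID","2"\n'), end="")
# "Node name","Node ID"
# "SLAVE_DB","2"
```

The csv writer also re-doubles any embedded quotes on output, matching the quoting in the desired output above.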
Update:
If you don't mind losing the unnecessary quotes in the output, then you could use Miller:
mlr --csv --implicit-csv-header --lazy-quotes reshape -s '1,2' file.csv
Node name,Node ID,PostgreSQL version,Total data size,Conninfo,Role,WAL archiving
SLAVE_DB,2,10.17,1 GB,host=192.168.0.1 port=5432 user=user dbname=postgres,standby,"disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"
The --lazy-quotes option makes mlr accept quotes appearing in unquoted fields, and non-doubled quotes appearing in quoted fields.