I have an output file in the following format. How can I convert this into csv with first column as header row and second column as data row?
Input:
"Node name","SLAVE_DB" "Node ID","2" "PostgreSQL version","10.17" "Total data size","1 GB" "Conninfo","host=192.168.0.1 port=5432 user=user dbname=postgres" "Role","standby" "WAL archiving","disabled (on standbys "archive_mode" must be set to "always" to be effective)"
Desired Output:
"Node name","Node ID","PostgreSQL version","Total data size","Conninfo","Role","WAL archiving" "SLAVE_DB","2","10.17","1 GB","host=192.168.0.1 port=5432 user=user dbname=postgres","standby","disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"
[Edit]
I read through similar questions and I tried awk -F "," '{ for (i=1;i<=NF;i++) printf $i ", " }'
but it does not produce the desired output. I also want to preserve the quoted fields within the data.
CodePudding user response:
If we assume that there are never quotes in your first field (the tag/name/header), then this works using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
tags[NR] = $1
vals[NR] = $2
}
END {
for ( colNr=1; colNr<=NR; colNr++ ) {
tag = tags[colNr]
printf "%s%s", tag, (colNr<NR ? OFS : ORS)
}
for ( colNr=1; colNr<=NR; colNr++ ) {
val = vals[colNr]
gsub(/^"|"$/,"",val)
gsub(/"/,"\"\"",val)
printf "\"%s\"%s", val, (colNr<NR ? OFS : ORS)
}
}
$ awk -f tst.awk file
"Node name","Node ID","PostgreSQL version","Total data size","Conninfo","Role","WAL archiving"
"SLAVE_DB","2","10.17","1 GB","host=192.168.0.1 port=5432 user=user dbname=postgres","standby","disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"
If you need more than that then see What's the most robust way to efficiently parse CSV using awk?.
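The doubled quotes in that output ("" for an embedded ") are the standard CSV escaping convention, so any CSV-aware reader will restore the original value. As a quick sanity check, here is an illustrative sketch using Python's csv module (not part of the awk solution itself):

```python
import csv
import io

# A field with doubled quotes, as produced by the awk script above
field = '"disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"'

# A CSV-aware reader collapses the doubled quotes back to single quotes
row = next(csv.reader(io.StringIO(field)))
print(row[0])
# → disabled (on standbys "archive_mode" must be set to "always" to be effective)
```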
CodePudding user response:
Would you please try this?
awk '
BEGIN { FS = OFS = "," }
{ k[NR] = $1; v[NR] = $2 }
END {
$0 = "";
for (i = 1; i <= NR; i++) { $i = k[i] } print;
for (i = 1; i <= NR; i++) { $i = v[i] } print;
}
' file.csv
Remark: this code will break if your input contains commas or newlines inside the double-quoted values.
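For inputs that do contain commas or newlines inside quoted fields, a CSV-aware parser is safer than splitting on commas. A minimal sketch using Python's csv module (the function name is just illustrative):

```python
import csv
import io

def transpose_two_columns(text):
    # Parse with a real CSV reader, so commas and newlines
    # inside quoted values are handled correctly.
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([r[0] for r in rows])  # header row: first column
    writer.writerow([r[1] for r in rows])  # data row: second column
    return out.getvalue()

print(transpose_two_columns('"Node name","SLAVE_DB"\n"Node ID","2"\n'), end="")
# "Node name","Node ID"
# "SLAVE_DB","2"
```

The csv writer also re-doubles any embedded quotes on output, matching the quoting in the desired output above.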
Update:
If you don't mind losing the unnecessary quotes in the output, then you could use Miller:
mlr --csv --implicit-csv-header --lazy-quotes reshape -s '1,2' file.csv
Node name,Node ID,PostgreSQL version,Total data size,Conninfo,Role,WAL archiving
SLAVE_DB,2,10.17,1 GB,host=192.168.0.1 port=5432 user=user dbname=postgres,standby,"disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"
The --lazy-quotes option makes mlr accept quotes appearing in unquoted fields, and non-doubled quotes appearing in quoted fields.