Convert columns to rows with awk

Time: 02-08

I have an output file in the following format. How can I convert this into csv with first column as header row and second column as data row?

Input:

"Node name","SLAVE_DB"
"Node ID","2"
"PostgreSQL version","10.17"
"Total data size","1 GB"
"Conninfo","host=192.168.0.1 port=5432 user=user dbname=postgres"
"Role","standby"
"WAL archiving","disabled (on standbys "archive_mode" must be set to "always" to be effective)"

Desired Output:

"Node name","Node ID","PostgreSQL version","Total data size","Conninfo","Role","WAL archiving"
"SLAVE_DB","2","10.17","1 GB","host=192.168.0.1 port=5432 user=user dbname=postgres","standby","disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"

[Edit] I read through similar questions and I tried awk -F "," '{ for (i=1;i<=NF;i++) printf $i ", " }'
but it does not produce the desired output. I also want to preserve the quoted fields within the data.

CodePudding user response:

If we assume that there are never quotes in your first field (the tag/name/header), then using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN { FS=OFS="," }
{
    tags[NR] = $1
    vals[NR] = $2
}
END {
    for ( colNr=1; colNr<=NR; colNr++ ) {
        tag = tags[colNr]
        printf "%s%s", tag, (colNr<NR ? OFS : ORS)
    }
    for ( colNr=1; colNr<=NR; colNr++ ) {
        val = vals[colNr]
        gsub(/^"|"$/,"",val)
        gsub(/"/,"\"\"",val)
        printf "\"%s\"%s", val, (colNr<NR ? OFS : ORS)
    }
}

$ awk -f tst.awk file
"Node name","Node ID","PostgreSQL version","Total data size","Conninfo","Role","WAL archiving"
"SLAVE_DB","2","10.17","1 GB","host=192.168.0.1 port=5432 user=user dbname=postgres","standby","disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"

If you need more than that then see "What's the most robust way to efficiently parse CSV using awk?".
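For inputs where the values themselves may contain commas, a general-purpose CSV parser sidesteps the field-splitting pitfalls. A minimal Python sketch of the same transpose (the sample data here is abridged from the question, and this sketch assumes embedded quotes in values are already doubled per standard CSV quoting):

```python
import csv
import io

# Abridged sample of the question's input; assumes RFC 4180-style quoting.
data = (
    '"Node name","SLAVE_DB"\n'
    '"Node ID","2"\n'
    '"Conninfo","host=192.168.0.1 port=5432 user=user dbname=postgres"\n'
)

rows = list(csv.reader(io.StringIO(data)))  # handles quotes and embedded commas
tags = [r[0] for r in rows]                 # first column -> header row
vals = [r[1] for r in rows]                 # second column -> data row

out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(tags)
writer.writerow(vals)
result = out.getvalue()
print(result, end="")
```

The csv module also re-doubles any quotes inside the values on output, so the escaping step the awk script does with gsub comes for free.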

CodePudding user response:

Would you please try this?

awk '
    BEGIN { FS = OFS = "," }
    { k[NR] = $1; v[NR] = $2 }
    END {
        $0 = ""; 
        for (i = 1; i <= NR; i++) { $i = k[i] } print;
        for (i = 1; i <= NR; i++) { $i = v[i] } print;
    }
' file.csv

Remark: the code will break if your input contains commas or newlines inside the double-quoted values.


Update:

If you don't mind losing the unnecessary quotes in the output then you could use Miller:

mlr --csv --implicit-csv-header --lazy-quotes reshape -s '1,2' file.csv
Node name,Node ID,PostgreSQL version,Total data size,Conninfo,Role,WAL archiving
SLAVE_DB,2,10.17,1 GB,host=192.168.0.1 port=5432 user=user dbname=postgres,standby,"disabled (on standbys ""archive_mode"" must be set to ""always"" to be effective)"

The --lazy-quotes option makes mlr accept quotes appearing in unquoted fields, and non-doubled quotes appearing in quoted fields.
