Aligning columns using an awk script file-CodePudding

I'm trying to figure out how to write a .awk script that takes a .csv file as input and outputs it without commas and with columns aligned. So far I've tried this :

{ printf "%-10s %s\n", $1, $2, $3 ,$4 }

But this only outputs the data in the first two fields aligned. It does a good job of removing the comma delimiters but there's commas within double quotes in the fourth column that I wonder if will cause an issue. Any guidance is much appreciated I'm very new using awk.

Sample input is like:

Name,Last Name,Gender,Pet
Kit,Rattenberie,Male,"Crake, african black"
Cliff,Lakes,Male,"Red phalarope"
Tirrell,Stables,Male,"Rhea, greater"
Cherry,William,Female,"Crow, house"

Desired output will be something like:

Name    Last Name    Gender   Pet
Kit     Rattenberie  Male    "Crake, african black"
Cliff   Lakes        Male    "Red phalarope"
Tirrell Stables      Male    "Rhea, greater"
Cherry  William      Female  "Crow, house"

For a .csv file of 10 rows. Thanks in advance

CodePudding user response：

Using miller we can transform the input data from CSV to "pretty print" format with a command line option:

mlr --c2p cat ./input

Name    Last Name   Gender Pet
Kit     Rattenberie Male   Crake, african black
Cliff   Lakes       Male   Red phalarope
Tirrell Stables     Male   Rhea, greater
Cherry  William     Female Crow, house

It drops the quotes though. The --barred option is nice too:

mlr --c2p --barred cat ./input

 --------- ------------- -------- ---------------------- 
| Name    | Last Name   | Gender | Pet                  |
 --------- ------------- -------- ---------------------- 
| Kit     | Rattenberie | Male   | Crake, african black |
| Cliff   | Lakes       | Male   | Red phalarope        |
| Tirrell | Stables     | Male   | Rhea, greater        |
| Cherry  | William     | Female | Crow, house          |
 --------- ------------- -------- ----------------------

An awk technique that's more programming: keep track of the max width of each column while you're reading the input file, then use that to print the data at the end: this is essentially re-implementing column -t

awk -v FPAT='"[^"]*"|[^,] ' '
    {
        for (i=1; i<=NF; i  ) {
            data[NR][i] = $i
            if (length($i) > maxw[i]) maxw[i] = length($i)
        }
    }
    END {
        for (i=1; i<=NR; i  ) {
            for (j=1; j<=length(data[i]); j  ) {
                printf "%-*s  ", maxw[j], data[i][j]
            }
            printf "\n"
        }
    }
' ./input

Name     Last Name    Gender  Pet
Kit      Rattenberie  Male    "Crake, african black"
Cliff    Lakes        Male    "Red phalarope"
Tirrell  Stables      Male    "Rhea, greater"
Cherry   William      Female  "Crow, house"

CodePudding user response：

Using gnu-awk, you can use this:

awk -v FPAT='"[^"]*"|[^,] ' '{
   for (i=1; i<=NF;   i) $i = sprintf("%-12s", $i)} 1' file

Name     Last Name    Gender  Pet
Kit      Rattenberie  Male    "Crake, african black"
Cliff    Lakes        Male    "Red phalarope"
Tirrell  Stables      Male    "Rhea, greater"
Cherry   William      Female  "Crow, house"

Or if width is totally unpredictable then use this awk column solution:

awk -v FPAT='"[^"]*"|[^,] ' -v OFS=';' '{$1=$1} 1' file |
column -s';' -t

Name     Last Name    Gender  Pet
Kit      Rattenberie  Male    "Crake, african black"
Cliff    Lakes        Male    "Red phalarope"
Tirrell  Stables      Male    "Rhea, greater"
Cherry   William      Female  "Crow, house"

If you want to create an awk script then use:

cat col.awk

BEGIN {
   FPAT="\"[^\"]*\"|[^,] "
   OFS=";"
}
{$1 = $1}
1

Use it as:

awk -f col.awk file.csv | column -s';' -t

CodePudding user response：

One awk idea using *.awk script (per OP's comment), and having awk determine the max width of each column:

$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,] " }                            # instead of parsing on field delimiter (via FS) ... parse on field format via (FPAT)
      { for (i=1;i<=NF;i  )
            w[i]= length($i) > w[i] ? length($i) : w[i]      # keep track of max width of each column
        lines[FNR]=$0                                        # save entire line
      }
END   { for (i=1;i<=FNR;i  ) {                               # loop through each saved line
            n=patsplit(lines[i],a)                           # reparse based on FPAT, storing fields in array a[]
            for (j=1;j<n;j  )                                # loop through array entries ...
                printf "%-*s%s", w[j], a[j], OFS             # printing to stdout
            print a[n]                                       # print last field plus "\n"
        }
      }

Or using a multi-dimensional array to store the input thus eliminating the 2nd parsing (via patsplit()) of the input data:

$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,] " }
      { for (i=1;i<=NF;i  ) {
            w[i]= length($i) > w[i] ? length($i) : w[i]
            fields[FNR][i]=$i
        }
      }
END   { for (i=1;i<=FNR;i  ) {
            for (j=1;j<NF;j  )
                printf "%-*s%s", w[j], fields[i][j], OFS
            print fields[i][NF]
        }
      }

NOTES:

assumes entire file can fit into memory (via the awk/lines[] or awk/fields[][] array)
requires GNU awk for FPAT and multi-dimensional array support

Both of these generate:

$ awk -f script.awk file
Name    Last Name   Gender Pet
Kit     Rattenberie Male   "Crake, african black"
Cliff   Lakes       Male   "Red phalarope"
Tirrell Stables     Male   "Rhea, greater"
Cherry  William     Female "Crow, house"