I'm trying to figure out how to write a .awk script that takes a .csv file as input and outputs it without commas and with columns aligned. So far I've tried this :
{ printf "%-10s %s\n", $1, $2, $3 ,$4 }
But this only outputs the data in the first two fields aligned. It does a good job of removing the comma delimiters but there's commas within double quotes in the fourth column that I wonder if will cause an issue. Any guidance is much appreciated I'm very new using awk.
Sample input is like:
Name,Last Name,Gender,Pet
Kit,Rattenberie,Male,"Crake, african black"
Cliff,Lakes,Male,"Red phalarope"
Tirrell,Stables,Male,"Rhea, greater"
Cherry,William,Female,"Crow, house"
Desired output will be something like:
Name Last Name Gender Pet
Kit Rattenberie Male "Crake, african black"
Cliff Lakes Male "Red phalarope"
Tirrell Stables Male "Rhea, greater"
Cherry William Female "Crow, house"
For a .csv file of 10 rows. Thanks in advance
CodePudding user response:
Using miller we can transform the input data from CSV to "pretty print" format with a command line option:
mlr --c2p cat ./input
Name Last Name Gender Pet
Kit Rattenberie Male Crake, african black
Cliff Lakes Male Red phalarope
Tirrell Stables Male Rhea, greater
Cherry William Female Crow, house
It drops the quotes though. The --barred
option is nice too:
mlr --c2p --barred cat ./input
--------- ------------- -------- ----------------------
| Name | Last Name | Gender | Pet |
--------- ------------- -------- ----------------------
| Kit | Rattenberie | Male | Crake, african black |
| Cliff | Lakes | Male | Red phalarope |
| Tirrell | Stables | Male | Rhea, greater |
| Cherry | William | Female | Crow, house |
--------- ------------- -------- ----------------------
An awk technique that's more programming: keep track of the max width of each column while you're reading the input file, then use that to print the data at the end: this is essentially re-implementing column -t
awk -v FPAT='"[^"]*"|[^,] ' '
{
for (i=1; i<=NF; i ) {
data[NR][i] = $i
if (length($i) > maxw[i]) maxw[i] = length($i)
}
}
END {
for (i=1; i<=NR; i ) {
for (j=1; j<=length(data[i]); j ) {
printf "%-*s ", maxw[j], data[i][j]
}
printf "\n"
}
}
' ./input
Name Last Name Gender Pet
Kit Rattenberie Male "Crake, african black"
Cliff Lakes Male "Red phalarope"
Tirrell Stables Male "Rhea, greater"
Cherry William Female "Crow, house"
CodePudding user response:
Using gnu-awk
, you can use this:
awk -v FPAT='"[^"]*"|[^,] ' '{
for (i=1; i<=NF; i) $i = sprintf("%-12s", $i)} 1' file
Name Last Name Gender Pet
Kit Rattenberie Male "Crake, african black"
Cliff Lakes Male "Red phalarope"
Tirrell Stables Male "Rhea, greater"
Cherry William Female "Crow, house"
Or if width is totally unpredictable then use this awk column
solution:
awk -v FPAT='"[^"]*"|[^,] ' -v OFS=';' '{$1=$1} 1' file |
column -s';' -t
Name Last Name Gender Pet
Kit Rattenberie Male "Crake, african black"
Cliff Lakes Male "Red phalarope"
Tirrell Stables Male "Rhea, greater"
Cherry William Female "Crow, house"
If you want to create an awk script then use:
cat col.awk
BEGIN {
FPAT="\"[^\"]*\"|[^,] "
OFS=";"
}
{$1 = $1}
1
Use it as:
awk -f col.awk file.csv | column -s';' -t
CodePudding user response:
One awk
idea using *.awk
script (per OP's comment), and having awk
determine the max width of each column:
$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,] " } # instead of parsing on field delimiter (via FS) ... parse on field format via (FPAT)
{ for (i=1;i<=NF;i )
w[i]= length($i) > w[i] ? length($i) : w[i] # keep track of max width of each column
lines[FNR]=$0 # save entire line
}
END { for (i=1;i<=FNR;i ) { # loop through each saved line
n=patsplit(lines[i],a) # reparse based on FPAT, storing fields in array a[]
for (j=1;j<n;j ) # loop through array entries ...
printf "%-*s%s", w[j], a[j], OFS # printing to stdout
print a[n] # print last field plus "\n"
}
}
Or using a multi-dimensional array to store the input thus eliminating the 2nd parsing (via patsplit()
) of the input data:
$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,] " }
{ for (i=1;i<=NF;i ) {
w[i]= length($i) > w[i] ? length($i) : w[i]
fields[FNR][i]=$i
}
}
END { for (i=1;i<=FNR;i ) {
for (j=1;j<NF;j )
printf "%-*s%s", w[j], fields[i][j], OFS
print fields[i][NF]
}
}
NOTES:
- assumes entire file can fit into memory (via the
awk/lines[]
orawk/fields[][]
array) - requires
GNU awk
forFPAT
and multi-dimensional array support
Both of these generate:
$ awk -f script.awk file
Name Last Name Gender Pet
Kit Rattenberie Male "Crake, african black"
Cliff Lakes Male "Red phalarope"
Tirrell Stables Male "Rhea, greater"
Cherry William Female "Crow, house"