I want to replace delimiter which is part of data from each records. For Ex-
echo '"hi","how,are,you","bye"'|sed -nE 's/"([^,]*),([^,]*),([^,]*)"/"\1;\2;\3"/gp'
output -->
"hi","how;are;you","bye"
So, I am able to replace delimiter(comma in this case), which is present in data also with semi colon. But the challenge is, in real time, we are not sure how many times delmiter will be present and also, it may come in multiple fields as well. For Ex-
"1","2,3,4,5","6","7,8"
"1","2,4,5","6","7,8,9"
"1","4,5","6","7,8,9.2"
All these are valid records. Can anyone help me out here. how can we write a generic code to handle this?
CodePudding user response:
Assuming the data does not contain embedded double quotes ...
Sample data:
$ cat delim.dat
"hi","how,are,you","bye"
"1","2,3,4,5","6","7,8"
"1","2,4,5","6","7,8,9"
"1","4,5","6","7,8,9.2"
One awk
idea whereby we replace ,
with ;
in the even numbered fields:
awk '
BEGIN { FS=OFS="\"" }
{ for (i=2;i<=NF;i=i 2) gsub(",",";",$i) }
1
' delim.dat
This generates:
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"
CodePudding user response:
When working with anything but the most trivial CSV data, I prefer to use something that understands the format directly instead of messing with regular expressions to try to handle things like quoted fields. For example (Warning: Blatant self promotion ahead!), my tcl
-based awk
-like utility tawk, which I wrote in part to make it easier to manipulate CSV files:
$ tawk -csv -quoteall '
line {
for {set n 1} {$n <= $NF} {incr n} {
set F($n) [string map {, \;} $F($n)]
}
print
}' input.csv
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"
Or a perl
approach using the Text::CSV_XS
module:
$ perl -MText::CSV_XS -e '
my $csv = Text::CSV_XS->new({binary=>1, always_quote=>1});
while (my $row = $csv->getline(\*STDIN)) {
tr/,/;/ foreach @$row;
$csv->say(\*STDOUT, $row);
}' < input.csv
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"