I am receiving a comma-delimited file in which string and date fields are enclosed in double quotes. Some string columns contain stray " characters and line feeds, like below.
"1234","asdf","with"doublequotes","new line
feed","withmultiple""doublequotes"
I want output like this:
"1234","asdf","withdoublequotes","new linefeed","withmultipledoublequotes"
I have tried:
sed 's/\([^",]\)"\([^",]\)/\1\2/g;s/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g' < infile > outfile
It removes the double quotes inside the strings, but it also removes the last double quote and leaves the line feed in place, as shown below:
"1234","asdf","withdoublequotes","new line
feed","withmultiple"doublequotes
Is there a way to remove the " characters and line feeds that occur between ", and ,"?
CodePudding user response:
You can try rquery (https://github.com/fuyuncat/rquery); its built-in functions are convenient for this.
[ rquery]$ cat mess.cvs
"1234","asdf","with"doublequotes","new line
feed","withmultiple""doublequotes"
[ rquery]$ ./rq -q "p /^\"([^\"]*)\",\"([^,]*)\",\"([^,]*)\",\"([^,]*)\",\"([^,]*)\"/ |s '\"' replace(regreplace(@1,'\n',''),'\"','') '\",\"' replace(regreplace(@2,'\n',''),'\"','') '\",\"' replace(regreplace(@3,'\n',''),'\"','') '\",\"' replace(regreplace(@4,'\n',''),'\"','') '\",\"' replace(regreplace(@5,'\n',''),'\"','') '\"'" mess.cvs
"1234","asdf","withdoublequotes","new line feed","withmultipledoublequotes"
CodePudding user response:
Your substitutions for two consecutive quotes reduce them to a single quote, but they run after the substitution that removes a lone quote, so the one quote that is left is never removed.
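A small sketch of that ordering effect, assuming a POSIX-style shell and sed. It uses only the last field from the question, and the variable names `field`, `before`, and `after` are made up for illustration. The first command uses the three substitutions in the question's order; the second only moves the lone-quote substitution to the end, which may be enough for quotes but does nothing about the line feeds.

```shell
# Last field of the sample, with two consecutive stray quotes inside it.
field='"withmultiple""doublequotes"'

# Question's order: the lone-quote rule runs first and cannot match "",
# then the ""-rules collapse the pair to one quote, which then stays.
before=$(printf '%s\n' "$field" |
  sed 's/\([^",]\)"\([^",]\)/\1\2/g;s/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g')

# Reordered (illustrative only): collapse pairs first, remove lone quotes last.
after=$(printf '%s\n' "$field" |
  sed 's/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g;s/\([^",]\)"\([^",]\)/\1\2/g')

printf '%s\n' "$before" "$after"
```

The first line printed still has a quote in the middle of the field; the second does not.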
We can remove the " characters with repeated substitutions (a single pass could leave a quote behind when two stray quotes are adjacent), and remove the line feeds by joining the next input line whenever the current line does not end with a quote:
sed ':1;/[^"]$/{;N;s/\n//;b1;};:0;s/\([^,]\)"\([^,]\)/\1\2/g;t0' <infile >outfile
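To try the command above on the sample from the question, something like the following should work. It is a sketch that assumes GNU sed (other sed implementations may not accept labels followed by `;` on one line) and LF line endings; the temporary file and the `result` variable are only for illustration.

```shell
# Recreate the two-line sample from the question in a temporary file.
infile=$(mktemp)
printf '%s\n' \
  '"1234","asdf","with"doublequotes","new line' \
  'feed","withmultiple""doublequotes"' > "$infile"

# :1 ... b1  append the next line (N) and drop the newline while the
#            current line does not end with a double quote.
# :0 ... t0  repeat the quote removal until no substitution is made.
result=$(sed ':1;/[^"]$/{;N;s/\n//;b1;};:0;s/\([^,]\)"\([^,]\)/\1\2/g;t0' < "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

With this input the record is printed on a single line, with "new line" and "feed" joined without a space. The approach relies on every complete record ending in a double quote, and it would also strip a quote that sits next to a comma-free character inside a field that legitimately contains one.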