How to remove double quotes(") and new lines in between ," and ", in a unix file

Time:08-05

I receive a comma-delimited file in which string and date fields are enclosed in double quotes. Some string columns contain stray " characters and line feeds, as shown below.

"1234","asdf","with"doublequotes","new line
feed","withmultiple""doublequotes"

The desired output is:

"1234","asdf","withdoublequotes","new linefeed","withmultipledoublequotes"

I have tried:

sed 's/\([^",]\)"\([^",]\)/\1\2/g;s/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g' < infile > outfile

It removes the double quotes inside the strings, but it also removes the last double quote, as shown below:

"1234","asdf","withdoublequotes","new line
feed","withmultiple"doublequotes

Is there a way to remove the " characters and line feeds that occur between ," and ",?

CodePudding user response:

You can try rquery (https://github.com/fuyuncat/rquery); its built-in functions are convenient.

[ rquery]$ cat mess.cvs
"1234","asdf","with"doublequotes","new line
feed","withmultiple""doublequotes"
[ rquery]$ ./rq -q "p /^\"([^\"]*)\",\"([^,]*)\",\"([^,]*)\",\"([^,]*)\",\"([^,]*)\"/ |s '\"' replace(regreplace(@1,'\n',''),'\"','') '\",\"' replace(regreplace(@2,'\n',''),'\"','') '\",\"' replace(regreplace(@3,'\n',''),'\"','') '\",\"' replace(regreplace(@4,'\n',''),'\"','') '\",\"' replace(regreplace(@5,'\n',''),'\"','') '\"'" mess.cvs
"1234","asdf","withdoublequotes","new line feed","withmultipledoublequotes"

CodePudding user response:

Your substitutions for two consecutive quotes didn't work because they run after the substitution for a lone quote: they reduce "" to a single ", and by then nothing is left to remove that remaining quote.
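To see the ordering effect in isolation, here is a small sketch that runs the question's three substitutions first in their original order and then with the lone-quote substitution moved to the end. The function names original_order and reordered are made up for this illustration, and the input is only the last field of the sample.

```shell
# Hypothetical helper: the three substitutions in the order used in the question.
original_order() {
  sed 's/\([^",]\)"\([^",]\)/\1\2/g;s/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g'
}

# Hypothetical helper: same substitutions, lone-quote removal moved last.
reordered() {
  sed 's/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g;s/\([^",]\)"\([^",]\)/\1\2/g'
}

field='"withmultiple""doublequotes"'

printf '%s\n' "$field" | original_order
# prints: "withmultiple"doublequotes"

printf '%s\n' "$field" | reordered
# prints: "withmultipledoublequotes"
```

This only illustrates the ordering issue; it does not join the broken line.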

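We could remove the embedded " by repeating the substitution until nothing changes (a single pass is not enough, because for "" the second quote is captured and written back by the substitution) and remove the line feed by appending the next input line whenever the current line does not end in a quote: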

sed ':1;/[^"]$/{;N;s/\n//;b1;};:0;s/\([^,]\)"\([^,]\)/\1\2/g;t0' <infile >outfile
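As a quick check, the command above can be run against the two-line sample from the question. This is a minimal sketch: clean_csv is a hypothetical wrapper name, the sed script inside it is unchanged, and it assumes GNU sed, since labels followed by ; on a single line are not accepted by every sed implementation.

```shell
# Hypothetical helper wrapping the sed command above, unchanged.
# Assumes GNU sed (labels such as ':1' followed by ';' on one line).
clean_csv() {
  sed ':1;/[^"]$/{;N;s/\n//;b1;};:0;s/\([^,]\)"\([^,]\)/\1\2/g;t0'
}

# Recreate the broken two-line record from the question and clean it.
printf '%s\n' '"1234","asdf","with"doublequotes","new line' \
              'feed","withmultiple""doublequotes"' | clean_csv
# prints: "1234","asdf","withdoublequotes","new linefeed","withmultipledoublequotes"
```

The script treats any line that does not end in " as incomplete, so it relies on every record ending in a quoted field. If the file has Windows line endings, the trailing carriage return would make every line look incomplete, so it would probably need to be stripped first.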