Search and replace (escape) double quotes within double quotes in CSV values

Time:12-30

I want to replace every occurrence of " that appears between ," and ", with ''' (three single quotes). This will be run on a CSV file and must handle all possible nested quotes so that the formatting is not broken.
E.g.
"test","""","test" becomes
"test","''''''","test".
Another example:
"test","quotes "inside" quotes","test"
becomes
"test","quotes '''inside''' quotes","test".

I use https://sed.js.org/ to test the replacement.

What I currently have is

sed "s/\([^,]\)\(\"\)\(.\)/\\1'\\''\\3/g"

but it seems incomplete and doesn't cover all the cases I want.

e.g. works:
"anything","inside "quotes"","anything" ->
"anything","inside '''quotes'''","anything"
doesn't work for:
"anything","inside "test" quotes","anything" ->
"anything''',"inside '''test''' quotes''',"anything"
expected ->
"anything","inside '''test''' quotes","anything"

Maybe somebody who is good with regular expressions could help?

CodePudding user response:

Using sed

$ cat input_file
"test","""","test"
"test","quotes "inside" quotes","test"
"anything","inside "quotes"","anything"
"anything","inside "test" quotes","anything"

$ sed -E ':a;s/(,"[^,]*('"'"' )?)"([^,]*"(,|$))/\1'"'''"'\3/;ta' input_file
"test","''''''","test"
"test","quotes '''inside''' quotes","test"
"anything","inside '''quotes'''","anything"
"anything","inside '''test''' quotes","anything"

CodePudding user response:

Escaping the triple single quotes is avoided with a variable ${qs}.
Start by replacing all double quotes with ${qs}.
Next, restore the double quotes at the start of the line, at the end of the line, and around each comma.

qs="'''"
sed "s/\"/${qs}/g; s/^${qs}/\"/; s/${qs}$/\"/; s/${qs},${qs}/\",\"/g" csvfile
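
To make the three steps easier to follow, here is a minimal sketch of the same approach wrapped in a shell function that reads from stdin. The function name escape_inner_quotes is made up for illustration. It assumes every field is wrapped in double quotes and that no field contains the sequence "," itself, since that would be mistaken for a field boundary.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not part of the answer above).
# Assumes: every CSV field is double-quoted, and no field contains ","
# (quote, comma, quote) as data.
escape_inner_quotes() {
    qs="'''"
    # 1. Turn every double quote into three single quotes.
    # 2. Restore the opening quote at the start of the line.
    # 3. Restore the closing quote at the end of the line.
    # 4. Restore the quotes around each field-separating comma.
    sed -e "s/\"/${qs}/g" \
        -e "s/^${qs}/\"/" \
        -e "s/${qs}\$/\"/" \
        -e "s/${qs},${qs}/\",\"/g"
}

printf '%s\n' '"anything","inside "test" quotes","anything"' | escape_inner_quotes
# prints: "anything","inside '''test''' quotes","anything"
```

Splitting the script into one -e expression per step keeps each substitution readable. The last step works because sed picks the leftmost match, so only the three single quotes directly next to a comma are turned back into a double quote.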