How to replace double quotes within a string with apostrophe (not the outer double quotes specifying

I have pipe delimited .txt files in Linux that contain strings within double quotes.

Some of the strings will have a double quote instead of an apostrophe.

e.g. I"m a string.

This will be represented in the file within the pipes as "I"m a string"

I need to replace "I"m a string" with "I'm a string".

How can I do that with sed or using Python/Jupyter?

Examples

"String"|"I"m not a valid string"|"I'm a valid string"

Based on the data requirements I don't need to worry about things like:

Pipe within the double quotes e.g. "Str|Srt"|"Str"
Mix of double and single quotes e.g. "Str'|'Str"

CodePudding user response：

I might be tempted to use perl

$ cat file.txt
"first"|"second"|"I"m a string"|"fourth"

$ perl -lne '
  print join "|",              # join, clearly
    map {"\"" . $_ . "\""}     # re-add outer quotes
    map {s/"/\047/g; $_}       # replace inner quotes
    map {s/^"|"$//g; $_}       # remove leading/trailing quotes
    split /[|]/                # split the input on pipes
' file.txt
"first"|"second"|"I'm a string"|"fourth"

Although, as Shawn comments, replacing inner quotes with doubled double quotes gives you valid CSV.

    map {s/"/""/g; $_}       # replace inner quotes
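For the Python/Jupyter side of the question, the same split, strip, replace, re-wrap steps could be sketched roughly as below. The function name `fix_line` and its `replacement` parameter are made up for illustration. The sketch assumes, as the question states, that every field is wrapped in double quotes and that no field contains a pipe.

```python
import csv
import io


def fix_line(line, replacement="'"):
    """Replace double quotes inside each pipe-delimited field.

    Assumes every field is wrapped in double quotes and no field
    contains a pipe (both taken from the question's data requirements).
    """
    fixed = []
    for field in line.rstrip("\r\n").split("|"):
        inner = field[1:-1]  # drop the outer double quotes
        fixed.append('"' + inner.replace('"', replacement) + '"')
    return "|".join(fixed)


line = '"first"|"second"|"I"m a string"|"fourth"'

print(fix_line(line))
# "first"|"second"|"I'm a string"|"fourth"

# Doubling the inner quote instead keeps it as a quote and yields valid CSV,
# which the csv module can read back with a pipe delimiter.
doubled = fix_line(line, replacement='""')
print(next(csv.reader(io.StringIO(doubled), delimiter="|")))
# ['first', 'second', 'I"m a string', 'fourth']
```

The returned line has no trailing newline, so add one back when writing the result to a file.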

CodePudding user response：

This sed should work

sed -E "s/([A-Za-z0-9])\"([^|].*)/\1'\2/g" input_file

With grouping in sed, you can capture the text on either side of the invalid quote " and put it back around an apostrophe in the replacement. Because .* is greedy, this fixes only the first stray quote on each line.

Output

"I'm a string"
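Since the question also asks about Python, a comparable regex could be sketched with `re.sub` and lookarounds. This is not a literal translation of the sed command above. It is a variant that replaces every double quote that is not at the start or end of the line and not directly next to a pipe, so it copes with several stray quotes per line. The names `INNER_QUOTE` and `fix_inner_quotes` are illustrative only.

```python
import re

# A double quote counts as a delimiter when it is at the start or end of the
# line or directly next to a pipe. Anything else is treated as a stray quote.
INNER_QUOTE = re.compile(r'(?<!^)(?<!\|)"(?!\||$)')


def fix_inner_quotes(line):
    """Replace stray double quotes inside fields with apostrophes."""
    return INNER_QUOTE.sub("'", line)


print(fix_inner_quotes('"String"|"I"m not a valid string"|"I\'m a valid string"'))
# "String"|"I'm not a valid string"|"I'm a valid string"

print(fix_inner_quotes('"say "hi" now"|"ok"'))
# "say 'hi' now"|"ok"
```

One limitation of this sketch: a stray quote that sits immediately before or after a pipe cannot be told apart from a delimiter and is left alone.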

CodePudding user response：

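Inside a double-quoted sed script, " must be escaped for the shell; ' needs no escaping there. The general form of a substitution is "s/old_pattern/new_pattern/g", where "g" makes it replace every match on the line rather than only the first. Note that the following command replaces every double quote in the file, including the outer delimiters, so on its own it does not preserve the field quoting:

sed -i "s/\"/'/g" file.txt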

CodePudding user response：

Using any sed in any shell on every Unix box:

$ sed "s/\"/'/g; s/'|'/\"|\"/g; s/^'/\"/; s/'$/\"/" file
"String"|"I'm not a valid string"|"I'm a valid string"
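The same four substitutions can be mirrored in Python with plain string methods, which may be convenient in a Jupyter notebook. This is only a sketch: `requote_line` and `fix_file` are hypothetical names, and the UTF-8 encoding is an assumption about the files.

```python
def requote_line(line):
    """Mirror the four sed substitutions on one line (without its newline)."""
    line = line.replace('"', "'")       # s/"/'/g     every double quote becomes '
    line = line.replace("'|'", '"|"')   # s/'|'/"|"/g restore quotes around pipes
    if line.startswith("'"):            # s/^'/"/     restore the leading quote
        line = '"' + line[1:]
    if line.endswith("'"):              # s/'$/"/     restore the trailing quote
        line = line[:-1] + '"'
    return line


def fix_file(src, dst):
    """Write a corrected copy of src to dst (UTF-8 is assumed here)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(requote_line(line.rstrip("\n")) + "\n")


print(requote_line('"String"|"I"m not a valid string"|"I\'m a valid string"'))
# "String"|"I'm not a valid string"|"I'm a valid string"
```

Calling `fix_file` with your own input and output paths writes to a separate file, unlike `sed -i`, so the original stays untouched.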