I have pipe-delimited .txt files on Linux that contain strings within double quotes.
Some of the strings have a double quote where an apostrophe should be.
e.g. I"m a string.
This will be represented in the file within the pipes as "I"m a string"
I need to replace "I"m a string" with "I'm a string".
How can I do that with sed or using Python/Jupyter?
Example:
"String"|"I"m not a valid string"|"I'm a valid string"
Based on the data requirements I don't need to worry about things like:
- Pipe within the double quotes e.g. "Str|Srt"|"Str"
- Mix of double and single quotes e.g. "Str'|'Str"
CodePudding user response:
I might be tempted to use perl
$ cat file.txt
"first"|"second"|"I"m a string"|"fourth"
$ perl -lne '
    print join "|",                # join, clearly
    map {"\"" . $_ . "\""}         # re-add outer quotes
    map {s/"/\047/g; $_}           # replace inner quotes
    map {s/^"|"$//g; $_}           # remove leading/trailing quotes
    split /[|]/                    # split the input on pipes
' file.txt
"first"|"second"|"I'm a string"|"fourth"
Although, as Shawn comments, replacing inner quotes with doubled double quotes gives you valid CSV.
map {s/"/""/g; $_} # replace inner quotes
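Since the question also mentions Python, the same split / strip / replace / re-join pipeline might look roughly like this there. This is only a sketch: `fix_line` and its `replacement` parameter are names made up for illustration, and it assumes (as the question states) that no field contains a pipe.

```python
def fix_line(line, replacement="'"):
    """Replace double quotes found inside each quoted, pipe-delimited field."""
    fixed = []
    for field in line.split("|"):              # split the input on pipes
        if field.startswith('"'):              # remove the leading quote
            field = field[1:]
        if field.endswith('"'):                # remove the trailing quote
            field = field[:-1]
        inner = field.replace('"', replacement)  # replace inner quotes
        fixed.append('"' + inner + '"')        # re-add outer quotes
    return "|".join(fixed)                     # join the fields back together


print(fix_line('"first"|"second"|"I"m a string"|"fourth"'))
# "first"|"second"|"I'm a string"|"fourth"
```

Passing `replacement='""'` gives the doubled-quote CSV variant instead of the apostrophe.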
CodePudding user response:
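This sed command should work:
sed -E "s/([A-Za-z0-9])\"([^|])/\1'\2/g" input_file
With grouping in sed, you can exclude the invalid quote " from the match and replace it when reinstating the groups. Here a quote counts as invalid when it follows a letter or digit and is followed by a character other than a pipe, so every such quote on the line is replaced.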
Output
"I'm a string"
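For the Python side of the question, a comparable idea can be sketched with `re.sub` and lookarounds instead of capture groups. The assumption here, which is mine and not part of the sed answer, is that a quote is "invalid" whenever it is neither next to a pipe nor at the start or end of a line. The name `fix_stray_quotes` is hypothetical.

```python
import re

# A double quote that has a character other than a pipe or newline on both
# sides. Lookarounds do not consume text, so neighbouring quotes still match.
STRAY_QUOTE = re.compile(r'(?<=[^|\n])"(?=[^|\n])')


def fix_stray_quotes(text):
    """Replace stray double quotes with apostrophes in a whole block of text."""
    return STRAY_QUOTE.sub("'", text)


print(fix_stray_quotes('"String"|"I"m not a valid string"|"I\'m a valid string"'))
# "String"|"I'm not a valid string"|"I'm a valid string"
```

Because newlines are excluded in both lookarounds, the function can be applied to the contents of an entire file at once rather than line by line.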
CodePudding user response:
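Inside a double-quoted sed script you have to escape " as \"; an apostrophe needs no escaping there. The syntax of the substitute command is s/old_pattern/new_pattern/g, where the g flag replaces every match on a line rather than only the first. The following replaces every double quote with an apostrophe, including the quotes that enclose each field, so those would still have to be restored afterwards:
sed -i "s/\"/'/g" file.txt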
CodePudding user response:
Using any sed in any shell on every Unix box:
$ sed "s/\"/'/g; s/'|'/\"|\"/g; s/^'/\"/; s/'$/\"/" file
"String"|"I'm not a valid string"|"I'm a valid string"
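The same replace-everything-then-restore steps could be written in Python, for example in a Jupyter cell. This is a rough sketch under the question's stated assumptions (no pipes and no mixed quoting inside fields). `restore_quotes` and `fix_file` are hypothetical helper names, and `fix_file` overwrites the file it is given, so try it on a copy first.

```python
from pathlib import Path


def restore_quotes(line):
    """Turn every double quote into an apostrophe, then restore the delimiters."""
    line = line.replace('"', "'")          # every double quote becomes '
    line = line.replace("'|'", '"|"')      # restore the quotes around each pipe
    if line.startswith("'"):               # restore the opening quote
        line = '"' + line[1:]
    if line.endswith("'"):                 # restore the closing quote
        line = line[:-1] + '"'
    return line


def fix_file(path):
    """Rewrite a pipe-delimited text file in place, line by line."""
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines()
    p.write_text("\n".join(restore_quotes(l) for l in lines) + "\n", encoding="utf-8")


print(restore_quotes('"String"|"I"m not a valid string"|"I\'m a valid string"'))
# "String"|"I'm not a valid string"|"I'm a valid string"
```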