Bash regexp pattern replacement (while parsing .csv)

Time:06-07

I'm parsing a .csv file. It contains some cells with text like:

,"Some words, some more words, an so on",

So splitting on the , delimiter doesn't work correctly. The only solution I see is a regex pattern that matches the quoted string, replaces the commas inside " " with some rarely used symbol combination (like '___'), and then replaces them back with the original commas after the script finishes its job.

Something like echo ${var//in/out}

But I'm not strong in regular expressions, and maybe I'm missing a more obvious solution.

Any help appreciated.
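
For reference, the ${var//in/out} form mentioned above is Bash pattern substitution: it replaces every match of a glob pattern (not a regex) in a variable. A minimal sketch of the mask-and-restore idea with it, assuming a sample line where ", " only ever occurs inside the quoted cell (the names line, masked and restored are made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical sample line; ", " appears only inside the quoted cell.
line='1,"Some words, some more words, an so on",2'

# Mask: replace every ", " with the placeholder ___.
masked=${line//, /___}
echo "$masked"

# ... split "$masked" on , here ...

# Restore: turn every placeholder back into ", ".
restored=${masked//___/, }
echo "$restored"
```

The expansion has no notion of quotes, so it would also mask a ", " that sits between cells.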

CodePudding user response:

For the conversion you're trying to do all you need is:

$ awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/ *, */," ",$i)} 1' file
,"Some words some more words an so on",

For anything more interesting, see "What's the most robust way to efficiently parse CSV using awk?" for how to use awk on CSVs.
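
If the goal is to keep the commas and only hide them while splitting, the same split-on-quotes approach can write a placeholder instead of a space. A sketch, assuming quoted cells never span lines and contain no escaped quotes; the ___ placeholder comes from the question, and the variable names are hypothetical:

```shell
# Sample line from the question.
line=',"Some words, some more words, an so on",'

# With " as the field separator, the even-numbered fields are the quoted cells,
# so only commas inside quotes are replaced.
masked=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/,/,"___",$i)} 1')
echo "$masked"

# After processing, put the commas back.
restored=$(printf '%s\n' "$masked" | sed 's/___/,/g')
echo "$restored"
```

Because only the quoted fields are rewritten, the commas that separate cells stay intact and can still be used for splitting.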

CodePudding user response:

I found a solution myself.

First, replace every ", " (a comma followed by a space) with ___ in the initial file:

cat ./initial_file.csv | sed -e 's|, |___|g'

At the end, replace every ___ back with the original ", ":

sed -i 's|___|, |g' ./final_file.csv

Not so tricky :)
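
One caveat worth checking on your own data: the sed pattern above is not quote-aware, so it masks every ", ", including one between cells. A small sketch with two made-up sample lines (tight and loose are hypothetical names):

```shell
# Separators without a trailing space: only the quoted cell is touched.
tight=$(printf '%s\n' '1,"a, b",2' | sed -e 's|, |___|g')
echo "$tight"

# Separators followed by a space: the cell boundaries get masked too.
loose=$(printf '%s\n' '1, "a, b", 2' | sed -e 's|, |___|g')
echo "$loose"
```

So the approach holds as long as the file never puts a space after a separating comma.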
