Remove double quotes within the column value using Unix

Time:09-08

I am processing a 90-column CSV file that is semicolon-separated (;). (Letter case can be ignored. I am aware the file format is a mess, but I have no control over it.)

Input Rows :

"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"

Output Expected :

"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"

(The double quote can be replaced by a space or simply removed.) Kindly note: even though this is a ';'-separated file, some rows have a ';' inside the quoted data of a column.

Issue: In some rows, I am getting an extra double quote inside the quoted data.

Please advise me on how to handle this in Unix.

CodePudding user response:

One trick you can use is to remove every " that is not at a field boundary. A simple sed script for this is:

$ sed -E 's/([^;])"([^;])/\1 \2/g' file

Note that if you allow escaped quote marks in your fields, this is going to remove them as well.
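
As a hedged sketch (not part of the answer above): a single pass with the g flag cannot fix two stray quotes that share a neighbouring character, as in "a"b"c", because matches may not overlap. Looping until nothing changes covers that case. The function name strip_stray_quotes is made up for illustration, and the sketch uses [^;] for the neighbours, since \b has no word-boundary meaning inside a bracket expression.

```shell
# Hypothetical helper: replace every " that is not next to a ; or a line edge.
# Reads the named files, or stdin when no file is given.
strip_stray_quotes() {
  # :a ... ta repeats the substitution until no stray quote is left
  sed -E -e ':a' -e 's/([^;])"([^;])/\1 \2/' -e 'ta' "$@"
}

# Usage on one of the sample rows
printf '%s\n' '"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"' | strip_stray_quotes
```

Like the one-liner, this also turns escaped or doubled quotes inside a field into spaces.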

CodePudding user response:

What would you think of the following solution:

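  1. Replace all ";" by a placeholder character that never occurs in the data (a plain ; will not do, because some fields contain ; themselves)
  2. Remove all remaining "
  3. Replace the placeholder back into ";"
  4. Add a " character at the beginning and at the end of every line.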

The whole thing can be done with tr or sed or whatever command you prefer.
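
The steps above could be sketched with sed roughly as follows. This is only an illustration under assumptions: the function name strip_inner_quotes is made up, the stand-in for the field boundary is the ASCII unit separator (octal 037), which is assumed never to occur in the data, and the stray quotes are replaced by a space rather than deleted so the result matches the expected output in the question.

```shell
# Hypothetical helper; reads stdin and writes the cleaned rows to stdout.
strip_inner_quotes() {
  sep=$(printf '\037')   # stand-in for ";" (assumed absent from the data)
  # 1. hide the ";" field boundaries behind the stand-in
  # 2. drop the outer quotes, then turn every remaining " into a space
  # 3. restore the field boundaries
  # 4. put the outer quotes back
  sed -e "s/\";\"/${sep}/g" \
      -e 's/^"//' -e 's/"$//' -e 's/"/ /g' \
      -e "s/${sep}/\";\"/g" \
      -e 's/^/"/' -e 's/$/"/'
}

# Usage on one of the sample rows
printf '%s\n' '"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"' | strip_inner_quotes
```

A field that itself contains the three characters ";" would still be split in the wrong place; nothing in this approach can tell that apart from a real boundary.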

CodePudding user response:

mawk 'NF*(gsub(__," ",$!(NF=NF))^_  gsub(OFS,FS)  gsub("^ | $",__))' \
               __='\42'  FS='\42\73\42' OFS='\31\17' file
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
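
For readers who find the one-liner hard to follow, here is a more verbose sketch of what I understand it to do, written for any POSIX awk. The function name strip_inner_quotes_awk and the choice of octal 037 as the temporary output separator are assumptions for illustration; the separator only has to be something that never occurs in the data.

```shell
# Hypothetical helper; reads the named files, or stdin when no file is given.
strip_inner_quotes_awk() {
  awk -v q='"' '
    BEGIN { FS = q ";" q; OFS = "\037" }   # split on ";" and join on a placeholder
    {
      $1 = $1          # rebuild the record so each field boundary becomes the placeholder
      gsub(q, " ")     # every quote that is left (stray or outer) becomes a space
      gsub(OFS, FS)    # put the field boundaries back
      sub(/^ /, q)     # the outer quotes were turned into spaces, restore them
      sub(/ $/, q)
      print
    }' "$@"
}

# Usage on one of the sample rows
printf '%s\n' '"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"' | strip_inner_quotes_awk
```

Unlike the one-liner, this sketch also prints empty input lines instead of skipping them.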

CodePudding user response:

This transformation is easy with a tool that provides regular expressions with zero-length assertions (lookbehind and lookahead). Since you applied the unix tag, there is a good chance you have the perl command, so I propose the following solution. Let the content of file.txt be

"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"

then

perl -p -e 's/(?<=[[:alnum:]])"(?=[[:alnum:]])/ /g' file.txt

gives output

"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"

Explanation: -p -e tells perl to work sed-style. Then I provide a substitution (s): every " that comes after an alphanumeric character (letter or digit) and before an alphanumeric character is replaced with a space. The g flag applies this to all such ", that is, globally.

Note: you can port this answer to any other tool that supports regular expressions with zero-length assertions.

(tested in perl 5, version 26, subversion 3)
