I am processing a 90-column CSV file, semicolon-separated (;). (Case can be ignored. I am aware the file format is a mess, but I cannot change it.)
Input rows:
"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"
Expected output:
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
(The double quote can be replaced by a space or removed.) Kindly note: even though this is a ';'-separated file, some rows have ';' within the quoted data for a column.
Issue: some rows contain an extra double quote within the quoted data.
Please advise me on how to handle this in Unix.
CodePudding user response:
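One trick you can use is to remove every " that is not at a field boundary. A simple sed script can be
$ sed -E 's/([^;])"([^;])/\1 \2/g' file
The pattern needs a character on both sides of the quote, so the quotes at the start and end of each line are left alone. Note that if you allow escaped quote marks in your fields, this is going to remove them as well.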
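Because each match consumes the character on either side of the quote, two stray quotes separated by a single character (for example a"b"c) are only partly fixed in one pass. A minimal sketch of a looping variant, assuming such input can occur in your file; the function name fix_inner_quotes is made up for illustration:

```shell
# fix_inner_quotes is a hypothetical helper name, not part of the answer above.
# It repeats the substitution until no quote is left that has a non-';'
# character on both sides.
fix_inner_quotes() {
  sed -E -e ':again' \
         -e 's/([^;])"([^;])/\1 \2/' \
         -e 't again' "$@"
}
```

It can be called as fix_inner_quotes file > file.fixed, or fed from a pipe. Splitting the script over several -e options keeps the label portable between GNU and BSD sed.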
CodePudding user response:
What would you think of the following solution:
- Replace all ";" by a placeholder character that cannot occur in the data (a bare ; would collide with the ; inside some quoted fields).
- Remove all remaining ".
- Replace every placeholder back into ";".
- Add a " character at the beginning and at the end of every line.
The whole thing can be done with tr or sed or whatever command you prefer.
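The steps above can be sketched with sed. This version uses the unit separator (octal 037) as the intermediate marker rather than a bare ;, since some fields contain ; themselves. The function name strip_inner_quotes and the choice of marker are assumptions, and the sketch also assumes the file has no blank lines (a blank line would come out as ""):

```shell
# strip_inner_quotes is a hypothetical helper name.
# Assumption: the byte \037 never occurs in the data.
strip_inner_quotes() {
  sep=$(printf '\037')
  sed -e "s/\";\"/${sep}/g" \
      -e 's/"//g' \
      -e "s/${sep}/\";\"/g" \
      -e 's/^/"/' \
      -e 's/$/"/' "$@"
}
```

Note that this removes the stray quote instead of replacing it with a space, which the question allows: "ASDASDA"asads" becomes "ASDASDAasads".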
CodePudding user response:
mawk 'NF*(gsub(__," ",$!(NF=NF))^_ gsub(OFS,FS) gsub("^ | $",__))' \
     __='\42' FS='\442\73\42' OFS='\31\17'
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
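The one-liner above is dense. A more readable sketch of the same idea, as I read it: split on ";", swap the separators for a placeholder, turn every remaining quote into a space, then restore the separators and the outer quotes. The function name quotes_to_spaces and the single-byte \037 placeholder are assumptions for illustration, and the sketch assumes every non-empty line starts and ends with a quote:

```shell
# quotes_to_spaces is a hypothetical helper name.
# Assumption: the byte \037 never occurs in the data.
quotes_to_spaces() {
  awk -F'";"' -v OFS='\037' '
    NF {
      $1 = $1               # rebuild the record so ";" becomes the placeholder
      gsub(/"/, " ")        # every remaining quote, stray or outer, becomes a space
      gsub(OFS, FS)         # put the ";" separators back
      gsub(/^ | $/, "\"")   # turn the two outer spaces back into quotes
      print
    }' "$@"
}
```

Like the original, it skips empty lines. It should run under any POSIX awk, not only mawk.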
CodePudding user response:
This transformation is easy to do with a tool that provides regular expressions with zero-length assertions (lookbehind and lookahead). Since you applied the unix tag, there is a good chance you have the perl command, so I propose the following solution. Let the content of file.txt be
"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"
then
perl -p -e 's/(?<=[[:alnum:]])"(?=[[:alnum:]])/ /g' file.txt
gives output
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
Explanation: I inform perl that I want to use it sed-style via -p -e, then I provide the substitution (s): a " that comes after an alphanumeric character (letter or digit) and before an alphanumeric character should be replaced with a space character. This is applied to all such ", that is, globally (g).
Note: you can port this answer to any other tool that can replace using regular expressions with zero-length assertions.
(tested in perl 5, version 26, subversion 3)