I need to clean a CSV file that looks like this:
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking
price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
Yes, the 4th field is sometimes unquoted, but the embedded newline only ever occurs in double-quoted fields. The issue only affects the 4th field.
I'm working on an awk command, and this is what I have so far:
awk '{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") gsub(/\n/," ");}' FS=";" input_file
This awk checks whether the first character of the 4th field is a double quote and the last one isn't. If so, it tries to remove the newline, but the newline is clearly not being removed.
I think I'm missing something simple, but I can't figure out what it is.
Thanks for your help.
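A likely reason the command above changes nothing: with the default record separator (RS is a newline), awk strips the newline before building $0 and the fields, so gsub(/\n/," ") never finds one to replace. The program also has no print statement, so it outputs nothing at all. A quick check, assuming any POSIX awk:

```shell
#!/bin/sh
# Two physical lines that together form one logical CSV record.
sample='foobar;123;NA;"asking
price";NR'

# index() returns 0 when "\n" is absent: each record already has its newline stripped.
pos=$(printf '%s\n' "$sample" | awk '{ print index($0, "\n") }')
printf '%s\n' "$pos"

# The original program (with -F instead of the FS= operand) has no print statement,
# so it produces no output at all.
silent=$(printf '%s\n' "$sample" |
  awk -F';' '{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") gsub(/\n/," ") }')
printf 'output: [%s]\n' "$silent"
```

Both records report 0, and the second command prints an empty output.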
Answer:
You can use this awk command:
awk -F';' -v ORS= '1; {print (NF==4 ? " " : "\n")}' file
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
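How it works:
- This command sets ORS to an empty string, so print adds nothing after each record.
- For each line, it first prints the full record.
- Then it prints a space when NF == 4, otherwise a line break. In this file, a line with exactly 4 fields is one whose 4th field was cut off by an embedded newline, so it gets joined to the next line.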
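The NF == 4 test relies on this file's exact layout. If the broken field could sit elsewhere, or span more than two lines, a different technique (swapped in here, not part of the answer above) is to count double quotes and keep joining lines while the running count is odd. A minimal sketch, assuming quotes only appear as field delimiters or doubled "" escapes:

```shell
#!/bin/sh
# Hypothetical sample: the 4th field of the 2nd record spans three physical lines.
input='food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking
price
today";NR
moongoo;13;VV;"Any label";OO'

out=$(printf '%s\n' "$input" | awk '
{
    buf = (pending ? buf " " $0 : $0)   # append to the pending record, or start a new one
    line = $0
    quotes += gsub(/"/, "", line)       # gsub on a copy returns the number of quotes on this line
    pending = quotes % 2                # odd count: still inside a quoted field
    if (!pending) { print buf; quotes = 0 }
}
END { if (pending) print buf }          # unterminated quote at EOF: emit what we have
')
printf '%s\n' "$out"
```

This prints the three logical records, with "asking price today" joined into one field.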
Answer:
Using GNU sed
$ sed -Ez 's/(;"[^"]*)\n/\1/g' input_file
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
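With -z, GNU sed splits its input on NUL bytes instead of newlines, so a normal text file becomes a single pattern space in which \n can be matched; -E enables the extended regex syntax used for the (...) group. The pattern captures ;" plus the non-quote characters that follow, up to a newline, and drops that newline. Deleting it outright joins the halves with no separator, so for the input exactly as shown the result is "askingprice" unless the first half ends with a space. A sketch that puts a space back instead (an assumption that a space is the wanted joiner, as in the OP's own gsub):

```shell
#!/bin/sh
# GNU sed only: -z (NUL-separated input) is a GNU extension.
input='food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking
price";NR
foobar;5;NN;Random text;NN'

# "\1 " re-inserts a space where the embedded newline was.
out=$(printf '%s\n' "$input" | sed -Ez 's/(;"[^"]*)\n/\1 /g')
printf '%s\n' "$out"
```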
Answer:
With GNU awk (for RT):
$ awk -v RS='"' '!(NR%2){gsub(/\n/,"")} {ORS=RT} 1' file
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
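In gawk, RT holds the text that matched RS for the current record, which is why ORS=RT puts each double quote back. Because RS is a single character here, the same idea can be sketched without RT, which may help on awks that lack it, such as mawk: every record except the first was preceded by a quote, and even-numbered records lie inside quotes. This sketch also joins with a space rather than deleting the newline, since deleting it outright turns the sample into "askingprice" unless the first half ends with a space:

```shell
#!/bin/sh
# Portable sketch (assumption: any POSIX awk; gawk RT is not needed).
# RS is the single character ", so every record boundary was a quote:
# odd-numbered records lie outside quotes, even-numbered ones inside.
input='foobar;123;NA;"asking
price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO'

out=$(printf '%s\n' "$input" | awk -v RS='"' '
    NR > 1      { printf "\"" }          # restore the quote that ended the previous record
    NR % 2 == 0 { gsub(/\n/, " ") }      # inside quotes: turn embedded newlines into spaces
                { printf "%s", $0 }      # print the record itself with no separator
')
printf '%s\n' "$out"
```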
Answer:
One idea for tweaking the OP's current awk code:
awk -F';' '
{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") {   # if we have an incomplete line then ...
      printf "%s ", $0   # print it plus a space but no "\n", leaving the cursor on the same output line
      next               # skip to next line of input
  }
}
1                        # otherwise print current line
' input_file
# or as a one-liner sans comments:
awk -F';' '{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") { printf "%s ", $0; next } } 1' input_file
This generates:
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
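Passing the record itself as printf's first argument (printf $0) makes the data the format string, so a % in the 4th field, such as "50% off", would be read as a conversion specification. Passing the record as an argument to a fixed format avoids that. A sketch with a hypothetical input line containing a percent sign, joining the halves with a space:

```shell
#!/bin/sh
# Hypothetical input: the quoted 4th field contains a literal percent sign and a newline.
out=$(printf 'x;1;AA;"50%% off\nnow";NR\n' | awk -F';' '
{
    if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") {
        printf "%s ", $0   # data goes in as an argument, never as the format
        next
    }
}
1')
printf '%s\n' "$out"
```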