remove newline in the middle of csv file

Time:12-24

I need to clean a CSV file that looks like this:

food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking 
price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO

Some fields have no double quotes, but the newline only occurs inside double-quoted fields, and only in the 4th field.

I'm working on an awk command, and this is what I have so far:

awk '{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") gsub(/\n/," ");}' FS=";" input_file

This awk command checks whether the first character of the 4th field is a double quote and the last character isn't. If so, it tries to remove the newline, but it clearly doesn't remove it.

I think I'm missing some small, "easy" thing, but I can't figure out what it is.

Thanks for your help.

CodePudding user response:

You may use this awk:

awk -F';' -v ORS= '1; {print (NF==4 ? " " : "\n")}' file

food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO

How it works:

  • This command initially sets ORS to an empty string.
  • Then, for each line, it prints the full record.
  • Then it prints a space when NF == 4; otherwise it prints a line break.
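
The same steps can be spelled out with an explicit if/else, which may be easier to follow than the one-liner. This is only a sketch: it assumes, as in the question, that a broken line always ends inside the 4th field, and the sample data below is typed without a trailing space after "asking" so the join produces a single space. The variable names input and result are just for illustration.

```shell
# Sample data from the question, held in a shell variable for illustration.
input='food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking
price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO'

result=$(printf '%s\n' "$input" | awk -F';' '
{
    printf "%s", $0      # print the record without a trailing newline
    if (NF == 4)         # only 4 fields: the line broke inside field 4
        printf " "       # join to the next line with a space
    else
        printf "\n"      # complete (or continued) record: end the line
}')

printf '%s\n' "$result"
```

Here the continuation line (price";NR) has only 2 fields, so it falls into the else branch and ends the joined line normally.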

CodePudding user response:

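Using GNU sed (-z makes sed treat the input as NUL-separated, so the whole file is read as one chunk and the pattern can match \n):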

$ sed -Ez 's/(;"[^"]*)\n/\1/g' input_file
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO

CodePudding user response:

With GNU awk for RT:

$ awk -v RS='"' '!(NR%2){gsub(/\n/,"")} {ORS=RT} 1' file
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
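
The command above relies on the fact that, with RS set to a double quote, every even-numbered record is the text between a pair of quotes. A small sketch of just that splitting, assuming quotes are balanced and never escaped inside a field (the sample string and the variable name out are made up for illustration; this part does not need RT, so it should behave the same in other awks):

```shell
# Split on double quotes and show only the even-numbered record,
# i.e. the text that sat inside the quotes, with its newline replaced.
out=$(printf 'a;"b\nc";d\n' | awk -v RS='"' '
NR % 2 == 0 {
    gsub(/\n/, " ")      # the embedded newline is part of this record
    print NR ": " $0
}')

printf '%s\n' "$out"
```

Record 1 is a; and record 3 is ;d, so only record 2, the quoted part, has its newline touched.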

CodePudding user response:

One idea for tweaking OP's current awk code:

awk -F';' '
{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") {    # if we have an incomplete line then ...
     printf "%s", $0                                                # print it without a "\n" (and without treating $0 as a format string)
     next                                                           # skip to next line of input
  }
}
1                                                                   # otherwise print current line
' input_file

# or as a one-liner sans comments:

awk -F';' ' { if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") { printf "%s", $0; next } } 1 ' input_file

This generates:

food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
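
As for why the command in the question changes nothing: with the default record separator, awk reads the input one line at a time, so a record never contains a "\n" for gsub to match, and that program also has no print action. A minimal sketch that checks the first point (the two-line sample and the variable names are only for illustration):

```shell
# Two physical lines that form one logical CSV row.
input='foobar;123;NA;"asking
price";NR'

# Print the number of records read and how many of them contain a newline.
out=$(printf '%s\n' "$input" | awk '/\n/ { n++ } END { print NR, n + 0 }')

printf 'records, records containing a newline: %s\n' "$out"
```

awk sees two records and none of them contains a newline, which is why the fix has to work by controlling what is printed after each record rather than by editing the record.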