I receive a flat file as a source. In one of the fields the value is split across several lines, and I need to remove those line breaks so that the value becomes a single line.
Ex: File is as below:
PO,MISC,yes,"This
is
an
example"
PO,MISC,yes,"This
is
another
example"
In the above example the data is read as 9 lines, but each record needs to be read as a single line, as shown below:
PO, MISC, yes, "This is an example"
PO, MISC, yes, "This is another example"
I tried the command below, but it did not work. Is there any way to achieve this? I also need to write the result to another file.
Syntax:
awk -v RS='([^,] \\,){4}[^,] \n' '{gsub(/\n/,"",RT); print RT}' sample_attachments.csv > test.csv
CodePudding user response:
With your shown samples only, please try the following awk, written and tested in GNU awk. Simple explanation: set RS to \"\n and the field separator to ,. In the main block, globally substitute newlines with spaces in $NF, then use printf to print the current line along with the value of RT.
awk -v RS="\"\n" 'BEGIN{FS=OFS=","} {gsub(/\n/," ",$NF);printf("%s",$0 RT)}' Input_file
CodePudding user response:
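awk '
{
  # Keep appending lines while the record holds an odd number of double quotes,
  # i.e. while a quoted field is still open; stop if the input runs out.
  while (gsub(/"/, "&") % 2 && (getline nxt) > 0)
    $0 = $0 " " nxt
  print $0 }' sample_attachments.csv
Explanation:
- gsub(/"/, "&") replaces every double quote with itself and returns how many it found, so it serves as a quote counter.
- While that count is odd (a quoted field is still open) and another line can be read, read the next line and append it to the input, separated by a space.
- print $0 finally prints the (modified) input line.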
CodePudding user response:
I would harness GNU AWK for this task in the following way. Let file.txt content be
field1, field2, field3, field4
PO,MISC,yes,"This
is
an
example"
then
awk 'BEGIN{RS="";FPAT=".";OFS=""}{for(i=1;i<=NF;i+=1){cnt+=($i=="\"");if($i=="\n"&&cnt%2){$i=" "}};print}' file.txt
gives output
field1, field2, field3, field4
PO,MISC,yes,"This is an example"
Assumptions: there is never more than 1 newline in succession, and " characters are never nested.
Explanation: I tell GNU AWK to enter paragraph mode, that is, to treat everything between blank lines as one row, that the field pattern is . (i.e. every character is a field), and that the output field separator is the empty string. Then I iterate over the characters. Whenever I encounter " I increase cnt by 1, which is used for dead reckoning of whether I am outside "..." or inside "...". When I encounter a newline character and cnt is odd, I am inside, so I swap it for a space character. After all characters are processed, I print them.
(tested in gawk 4.2.1)
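The same quote-counting idea can be sketched without FPAT, so it should not depend on GNU AWK. This is only a minimal sketch under the same assumptions as above (quotes are balanced and never nested); the file names parity_in.csv and parity_out.csv and the variable name inside are made up for illustration. Because a newline can only occur at the end of a line, it is enough to count the quotes on each line and keep a running parity:

```shell
# Hypothetical sample file with a header row and two multi-line records
cat > parity_in.csv <<'EOF'
field1, field2, field3, field4
PO,MISC,yes,"This
is
an
example"
PO,MISC,no,"This
is
another
example"
EOF

awk '
{
    n = gsub(/"/, "&")            # number of double quotes on this line
    inside = (inside + n) % 2     # 1 while a quoted field is still open
    # inside a quoted field: join with a space; otherwise end the record
    printf "%s%s", $0, (inside ? " " : "\n")
}' parity_in.csv > parity_out.csv

cat parity_out.csv
```

Lines that end while a quoted field is still open are joined to the next line with a space; every other line is printed unchanged.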
CodePudding user response:
You may use this gnu-awk solution:
awk -v RS='"[^"]*"' '{ORS = gensub(/\n/, " ", "g", RT)} 1' file
field1, field2, field3, field4
PO,MISC,yes,"This is an example"
PO,MISC,no,"This is another example"
Where input file is this:
cat file
field1, field2, field3, field4
PO,MISC,yes,"This
is
an
example"
PO,MISC,no,"This
is
another
example"
For the updated question use this awk:
awk -F, -v OFS=", " -v RS='"[^"]*"|\n' '{
ORS = gensub(/\n(.)/, " \\1", "g", RT)
$1 = $1
} 1' file
field1, field2, field3, field4
PO, MISC, yes, "This is an example"
PO, MISC, no, "This is another example"
CodePudding user response:
Using any awk:
$ awk -v RS='"' -v ORS= '!(NR%2){gsub(/\n/,OFS); $0="\"" $0 "\""} 1' file
PO,MISC,yes,"This is an example"
PO,MISC,yes,"This is another example"
For anything else, see "What's the most robust way to efficiently parse CSV using awk?".
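If the output should also carry a space after each comma and be written to another file, as the question asks, the same RS='"' idea might be extended as follows. This is only a sketch, assuming the unquoted parts contain no commas that must stay untouched; the file names are taken from the question's command, and the sample data is recreated here so the snippet runs on its own:

```shell
# Recreate the sample input from the question
cat > sample_attachments.csv <<'EOF'
PO,MISC,yes,"This
is
an
example"
PO,MISC,yes,"This
is
another
example"
EOF

# Records alternate: odd = text outside quotes, even = text inside quotes
awk -v RS='"' -v ORS= '
NR % 2    { gsub(/, */, ", ") }                    # outside quotes: one space after each comma
!(NR % 2) { gsub(/\n/, " "); $0 = "\"" $0 "\"" }   # inside quotes: join lines, restore the quotes
1' sample_attachments.csv > test.csv

cat test.csv
```

Redirecting with > test.csv is what writes the joined records to the second file; the input file itself is left unchanged.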
CodePudding user response:
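Your input file name is assumed to be "file" here, and the output is "newfile". The script assumes every record spans exactly four lines, as in the sample.
#!/bin/sh -x
cp file stack
# ed script: write the first four lines to f1, delete them, save the rest
cat > ed1 <<EOF
1,4w f1
1,4d
wq
EOF
# Take one four-line record off the stack and append it to newfile as one line
main () {
ed -s stack < ed1 || return 1
paste -s -d ' ' f1 >> newfile
}
# Remove the temporary files
end () {
rm -f ./ed1 ./f1 ./stack
}
# Repeat until the stack is empty; stop early if ed cannot find four lines
while [ -s stack ]
do
main || break
done
end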