I have many files like the following:
cat test.data
name1
...
nameN

title1
...
titleM

abstract1
...
abstractO

ID
where the numbers N, M, and O differ from file to file. In all files, however, the fields are separated by empty lines. I want to convert this data to CSV, turning each file into one line with the columns (name, title, abstract, ID), like this:
name1 ... nameN|title1 ... titleM|abstract1 ... abstractO|ID
I have tried with awk and sed, but failed. Any suggestions would be helpful. Thanks in advance.
CodePudding user response:
find . -name 'test*.data' |
xargs awk -v n="$N" -v t="$M" -v a="$O" '
    BEGIN {
        # lines to ignore (the empty separator lines)
        x[ i+=(n+1) ]
        x[ i+=(t+1) ]
        x[ i+=(a+1) ]
        # number of lines per file
        i++
    }
    !(FNR in x) {
        printf("%s%s", $0, FNR<i?"|":"\n")
    }
' >>out.csv
- define N, M, O appropriately
- assumes data does not contain the separator character |
- assumes out.csv is pre-filled with suitable header line
CodePudding user response:
Given file:
name1
name2
name3

title1
title2

abstract1
abstract2
abstract3
abstract4

ID
then
awk '
    BEGIN {FS = "\n"; RS = ""}
    {
        # join the lines of one blank-line-separated block with commas
        record = $1
        for (i=2; i<=NF; i++) record = record "," $i
        printf "%s%s", sep, record
        sep = "|"
    }
    END {printf "\n"}
' file
outputs
name1,name2,name3|title1,title2|abstract1,abstract2,abstract3,abstract4|ID
This uses RS = "", which treats sequences of blank lines as record separators, and FS = "\n", which treats newlines as field separators.
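Since the question mentions many files, the same paragraph-mode idea can probably be extended to emit one output line per input file. The following is a minimal sketch, not a tested production script: the function name fields_to_csv is made up for illustration, lines within a field are joined with a space (as in the question's desired output) rather than a comma, and it assumes the data never contains the | character.

```shell
# Hypothetical helper: print one |-separated line per input file.
fields_to_csv() {
    awk '
        BEGIN { FS = "\n"; RS = "" }          # paragraph mode: one record per block
        FNR == 1 && NR > 1 { printf "\n" }    # a new file starts: end the previous line
        {
            # join the lines of this block with single spaces
            record = $1
            for (i = 2; i <= NF; i++) record = record " " $i
            # the first block of a file gets no leading separator
            printf "%s%s", (FNR == 1 ? "" : "|"), record
        }
        END { if (NR > 0) printf "\n" }       # terminate the last line
    ' "$@"
}
```

It could then be called as fields_to_csv test*.data > out.csv. Because FNR restarts at 1 for every file, no per-file counts such as N, M, or O need to be known in advance.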