Convert txt with empty lines into csv

I have many files like the following:

cat test.data
name1
...
nameN

title1
...
titleM

abstract1
...
abstractO

ID

where the numbers N, M, O differ from file to file. In all files, however, the fields are separated by empty lines. I want to convert this data to CSV, one line per file with the columns (name, title, abstract, ID), like this:

name1 ... nameN|title1 ... titleM|abstract1 ... abstractO|ID

I have tried with awk and sed, but failed. Any suggestions would be helpful. Thanks in advance.

CodePudding user response：

find -name 'test*.data' |
xargs awk -v n=$N -v t=$M -v a=$O '
   BEGIN {
      # lines to ignore
      x[ i+=(n+1) ]
      x[ i+=(t+1) ]
      x[ i+=(a+1) ]

      # number of lines per file
      ++i
   }
   !(FNR in x) {
      printf("%s%s", $0, FNR<i?"|":"\n")
   }
' >>out.csv

define N, M, O appropriately
assumes data does not contain the separator character |
assumes out.csv is pre-filled with suitable header line
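
The assumption about the separator character can be checked before converting anything. A minimal sketch, not part of the answer above; the function name no_separator_in is made up for illustration:

```shell
# Hypothetical helper: succeed only if none of the given files contains "|".
no_separator_in() {
  # grep exits 0 if any line matches, 1 if none does, >1 on error
  grep -q -F -- '|' "$@"
  # treat only "no match" as success, so read errors also count as failure
  [ $? -eq 1 ]
}
```

It could be used as a guard, e.g. no_separator_in test*.data || echo "data contains |" >&2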

CodePudding user response：

Given file:

name1
name2
name3

title1
title2

abstract1
abstract2
abstract3
abstract4

ID

then

awk '
  BEGIN {FS = "\n"; RS = ""}
  {
    record = $1
    for (i=2; i<=NF; i++) record = record "," $i
    printf "%s%s", sep, record
    sep = "|"
  }
  END {printf "\n"}
' file

outputs

name1,name2,name3|title1,title2|abstract1,abstract2,abstract3,abstract4|ID

This uses RS = "" which treats sequences of blank lines as record separators, and FS = "\n" which treats newlines as field separators.
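
The same paragraph-mode idea might be extended to many files at once, printing one line per file and joining the lines of a field with a space, as in the question's desired output. This is only a sketch: the function name paragraphs_to_line is made up for illustration, and it assumes an awk that ends a record at the end of each input file (gawk does).

```shell
# Hypothetical helper: print one "|"-separated line per input file.
# Lines inside a paragraph are joined with a space, paragraphs with "|".
paragraphs_to_line() {
  awk '
    BEGIN { FS = "\n"; RS = "" }
    # FNR == 1 is the first paragraph of a file; finish the previous line
    FNR == 1 && NR > 1 { printf "\n" }
    {
      record = $1
      for (i = 2; i <= NF; i++) record = record " " $i
      printf "%s%s", (FNR == 1 ? "" : "|"), record
    }
    # terminate the last line, but print nothing if there was no input
    END { if (NR > 0) printf "\n" }
  ' "$@"
}
```

Called as paragraphs_to_line test*.data > out.csv, the sample file above would give

name1 name2 name3|title1 title2|abstract1 abstract2 abstract3 abstract4|ID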