How to parse csv file into multiple csv based on row spacing-CodePudding

I'm trying to build a airflow DAG and need to split out 7 tables contained in one csv into seven separate csv's.

dataset1

header_a	header_b	header_c
One	Two	Three
One	Two	Three

                         <-Always two spaced rows between data sets

dataset N <-part of csv file giving details on data

header_d	header_e	header_f	header_g
One	Two	Three	Four
One	Two	Three	Four

out: dataset1.csv datasetn.csv

Based on my research i think my solution might lie in awk searching for the double spaces?

EDIT: In plain text as requested.

table1 details1,
table1 details2,
table1 details3,
header_a,header_b,header_c,
1,2,3
1,2,3


tableN details1,
tableN details2,
tableN details3,
header_a, header_b,header_c,header_N,
1,2,3,4
1,2,3,4

CodePudding user response：

Always two spaced rows between data sets

If your CSV file contains blank lines, and your goal is to write out each chunk of records that is separated by those blank lines into individual files, then you could use awk with its record separator RS set to nothing, which then defaults to treating each "paragraph" as a record. Each of them can then be redirected to a file whose name is based on the record number NR:

awk -vRS= '{print $0 > ("output_" NR ".csv")}' input.csv

This reads from input.csv and writes the chunks to output_1.csv, output_2.csv, output_3.csv and so forth.

If my interpretation of your input file's structure (or your problem in general) is wrong, please provide more detail to clarify.