Home > OS >  How to parse csv file into multiple csv based on row spacing
How to parse csv file into multiple csv based on row spacing

Time:02-24

I'm trying to build a airflow DAG and need to split out 7 tables contained in one csv into seven separate csv's.

dataset1

header_a header_b header_c
One Two Three
One Two Three
                         <-Always two spaced rows between data sets 

dataset N <-part of csv file giving details on data

header_d header_e header_f header_g
One Two Three Four
One Two Three Four

out: dataset1.csv datasetn.csv

Based on my research i think my solution might lie in awk searching for the double spaces?

EDIT: In plain text as requested.

table1 details1,
table1 details2,
table1 details3,
header_a,header_b,header_c,
1,2,3
1,2,3


tableN details1,
tableN details2,
tableN details3,
header_a, header_b,header_c,header_N,
1,2,3,4
1,2,3,4

CodePudding user response:

Always two spaced rows between data sets

If your CSV file contains blank lines, and your goal is to write out each chunk of records that is separated by those blank lines into individual files, then you could use awk with its record separator RS set to nothing, which then defaults to treating each "paragraph" as a record. Each of them can then be redirected to a file whose name is based on the record number NR:

awk -vRS= '{print $0 > ("output_" NR ".csv")}' input.csv

This reads from input.csv and writes the chunks to output_1.csv, output_2.csv, output_3.csv and so forth.

If my interpretation of your input file's structure (or your problem in general) is wrong, please provide more detail to clarify.

  • Related