Home > Enterprise >  How can I pull data from two rows in csv with one being the header and other the data?
How can I pull data from two rows in csv with one being the header and other the data?

Time:01-26

I have a CSV sheet with two rows of data in rows 4 and 5. Row 4 has the titles and row 5 has the corresponding data for it. How do I pull data and put it in another CSV sheet? The title row has formatting similar to

Title_XYZ[0].XXX_YYY_Record.XXX_YYY.AAA 
Title_XYZ[0].XXX_YYY_Record.XXX_YYY.BBB
Title_XYZ[1].XXX_YYY_Record.XXX_YYY.AAA 
Title_XYZ[1].XXX_YYY_Record.XXX_YYY.BBB
Title_XYZ[2].XXX_YYY_Record.XXX_YYY.AAA 
Title_XYZ[2].XXX_YYY_Record.XXX_YYY.BBB

with the number in the [] changing with every new cell? I can't use pandas.

I have tried reading the files using import CSV, pulling the file to read, and using

header = lines[4].split(',')

to grab data and write a new CSV file using that but that just copies and paste the data instead of assigning AAA with AAA and BBB with BBB.

The data looks something like this:

Title_XYZ[0].XXX_YYY_Record.XXX_YYY.AAA Title_XYZ[0].XXX_YYY_Record.XXX_YYY.BBB Title_XYZ[1].XXX_YYY_Record.XXX_YYY.AAA Title_XYZ[1].XXX_YYY_Record.XXX_YYY.BBB
12 13 14 15

And the output should look like

AAA BBB
12 13
14 15

CodePudding user response:

import pandas as pd
import re
data="""
Title_XYZ[0].XXX_YYY_Record.XXX_YYY.AAA Title_XYZ[0].XXX_YYY_Record.XXX_YYY.BBB Title_XYZ[1].XXX_YYY_Record.XXX_YYY.AAA Title_XYZ[1].XXX_YYY_Record.XXX_YYY.BBB
12  13  14  15
"""
records = []
regex = r"(?:(?P<AAA>\d )\s(?P<BBB>\d )){1,2}"
for match in re.finditer(regex, data):
        records.append(match.groupdict())

pd.DataFrame(records)

enter image description here

  • Related