Read a CSV two rows at a time, combine values, write to new CSV-CodePudding

How do I read an input CSV two rows at a time, combine some values into a new single row, and then write that row to a new CSV?

In the input below I want to read two rows, take price1 from the second row and make it price2 in a new, combined, row, then repeat for the next two rows:

input

date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03

new csv

date, name, qt, price1, price2
9/12/22, AB, 2, 5.00, 5.08
9/12/22, BC, 1, 2.00, 2.03

import csv

data = []

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data.append({'date': row[0],  'qt': row[3], 'name': row[5], 'price':  row[10]})

 #data.append(myClass(row[0], row[2], row[3],  row[5],  row[10]))

for x in data:
    print(x)

Adrian's Answer is missing:

import csv
#newline requires python 3
csv must be UTF-8 and headers must match

CodePudding user response：

cat price_old.csv                                                                                                                                                         
date,name,qt,price1
9/12/22,AB,2,5.00
9/12/22,AB,2,5.08
9/12/22,BC,1,2.00
9/12/22,BC,1,2.03 

with open("price_old.csv") as old_csv:
    new_list = []
    dt, nm = None, None
    c_reader = csv.DictReader(old_csv)
    for row in c_reader:
        if row['date'] != dt and row['name'] != nm:
            dt, nm = row['date'], row['name']
            price_list = [row['price1']]
        else:
            price_list.append(row['price1'])
            row.update({'price1': price_list[0], 'price2': price_list[1]})
            new_list.append(row)
            price_list = []
            dt, nm = None, None
    with open('price_new.csv', 'w', newline='') as new_csv:
        c_writer = csv.DictWriter(new_csv, fieldnames= ['date', 'name','qt', 'price1', 'price2'])
        c_writer.writeheader()
        c_writer.writerows(new_list)

cat price_new.csv

date,name,qt,price1,price2
9/12/22,AB,2,5.00,5.08
9/12/22,BC,1,2.00,2.03

CodePudding user response：

Using pandas will make your life easier.

import pandas as pd

df = pd.read_csv('test.csv', delimiter=', ')

# Group by the first 3 columns, and use the rows to form the columns (unstack)
df = df.groupby(['date','name','qt'])['price1'].apply(
    lambda df: df.reset_index(drop=True)).unstack().reset_index()

# Rename the columns
df.columns = [*df.columns[:-2], 'price1', 'price2']

df.to_csv('output.csv', sep=',', index=False)

The output.csv is:

date,name,qt,price1,price2
9/12/22,AB,2,5.0,5.08
9/12/22,BC,1,2.0,2.03

CodePudding user response：

Pythons csv readers can read any number of rows at a time with ease.

The readers (both csv.reader and csv.DictReader) are iterators, which means we can use the next() function to manually get a row, just like we can use elem = next(my_iterator) to get the next element in any iterator:

import csv
import pprint

new_rows = []
with open("input.csv", newline="") as f:
    reader = csv.reader(f)

    header = next(reader)
    new_rows.append(header   [" price 2"])

    row1 = next(reader)
    row2 = next(reader)
    new_rows.append(row1   row2[:-1])

    row1 = next(reader)
    row2 = next(reader)
    new_rows.append(row1   row2[:-1])

pprint.pprint(new_rows)

Gives us:

[
    ['date',    ' name', ' qt', ' price1', ' price 2'],
    ['9/12/22',   ' AB',  ' 2',   ' 5.00',    ' 5.08'],
    ['9/12/22',   ' BC',  ' 1',   ' 2.00',    ' 2.03']
]

That approach can be adapted to using a for-loop to drive the iteration. The for-loop will always give us the "first row" in the current group of rows we want. Inside the loop we ask for as many of the next rows as are in the group:

with open("input.csv", newline="") as f:
    reader = csv.reader(f)

    header = next(reader)
    new_rows.append(header   [" price 2"])

    for row1 in reader:
        row2 = next(reader)
        new_rows.append(row1   row2[-1:])

This approach assumes your input has a header, and then after that the row count is an even multiple of the group size. If your input doesn't conform to that, you'll get a StopIteration exception, that'll look something like (in 3.11):

  File "/Users/zyoung/develop/StackOverflow/./main.py", line 31, in <module>
    row2 = next(reader)
           ^^^^^^^^^^^^
StopIteration

If that's a possibility, you can add exception handling:

for row1 in reader:
    try:
        row2 = next(reader)
    except StopIteration as e:
        print(f"encountered an odd row on line {reader.line_num}; stopping reading, moving on to writing")
        break

Finally, we can extend this idea to any number of rows in a logical group, like 3 rows per group:

date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, AB, 2, 5.12
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03
9/12/22, BC, 1, 2.06

header = next(reader)
new_rows.append(header   [" price 2", " price 3"])

for row1 in reader:
    row2 = next(reader)
    row3 = next(reader)

    new_rows.append(row1   row2[-1:]   row3[-1:])