How do I read an input CSV two rows at a time, combine some values into a new single row, and then write that row to a new CSV?
In the input below I want to read two rows, take price1 from the second row and make it price2 in a new, combined, row, then repeat for the next two rows:
input
date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03
new csv
date, name, qt, price1, price2
9/12/22, AB, 2, 5.00, 5.08
9/12/22, BC, 1, 2.00, 2.03
import csv
data = []
with open('test.csv', 'r') as f:
reader = csv.reader(f)
for row in reader:
data.append({'date': row[0], 'qt': row[3], 'name': row[5], 'price': row[10]})
#data.append(myClass(row[0], row[2], row[3], row[5], row[10]))
for x in data:
print(x)
Adrian's Answer is missing:
- import csv
- #newline requires python 3
- csv must be UTF-8 and headers must match
CodePudding user response:
cat price_old.csv
date,name,qt,price1
9/12/22,AB,2,5.00
9/12/22,AB,2,5.08
9/12/22,BC,1,2.00
9/12/22,BC,1,2.03
with open("price_old.csv") as old_csv:
new_list = []
dt, nm = None, None
c_reader = csv.DictReader(old_csv)
for row in c_reader:
if row['date'] != dt and row['name'] != nm:
dt, nm = row['date'], row['name']
price_list = [row['price1']]
else:
price_list.append(row['price1'])
row.update({'price1': price_list[0], 'price2': price_list[1]})
new_list.append(row)
price_list = []
dt, nm = None, None
with open('price_new.csv', 'w', newline='') as new_csv:
c_writer = csv.DictWriter(new_csv, fieldnames= ['date', 'name','qt', 'price1', 'price2'])
c_writer.writeheader()
c_writer.writerows(new_list)
cat price_new.csv
date,name,qt,price1,price2
9/12/22,AB,2,5.00,5.08
9/12/22,BC,1,2.00,2.03
CodePudding user response:
Using pandas
will make your life easier.
import pandas as pd
df = pd.read_csv('test.csv', delimiter=', ')
# Group by the first 3 columns, and use the rows to form the columns (unstack)
df = df.groupby(['date','name','qt'])['price1'].apply(
lambda df: df.reset_index(drop=True)).unstack().reset_index()
# Rename the columns
df.columns = [*df.columns[:-2], 'price1', 'price2']
df.to_csv('output.csv', sep=',', index=False)
The output.csv
is:
date,name,qt,price1,price2
9/12/22,AB,2,5.0,5.08
9/12/22,BC,1,2.0,2.03
CodePudding user response:
Pythons csv readers can read any number of rows at a time with ease.
The readers (both csv.reader and csv.DictReader) are iterators, which means we can use the next() function to manually get a row, just like we can use elem = next(my_iterator)
to get the next element in any iterator:
import csv
import pprint
new_rows = []
with open("input.csv", newline="") as f:
reader = csv.reader(f)
header = next(reader)
new_rows.append(header [" price 2"])
row1 = next(reader)
row2 = next(reader)
new_rows.append(row1 row2[:-1])
row1 = next(reader)
row2 = next(reader)
new_rows.append(row1 row2[:-1])
pprint.pprint(new_rows)
Gives us:
[
['date', ' name', ' qt', ' price1', ' price 2'],
['9/12/22', ' AB', ' 2', ' 5.00', ' 5.08'],
['9/12/22', ' BC', ' 1', ' 2.00', ' 2.03']
]
That approach can be adapted to using a for-loop to drive the iteration. The for-loop will always give us the "first row" in the current group of rows we want. Inside the loop we ask for as many of the next rows as are in the group:
with open("input.csv", newline="") as f:
reader = csv.reader(f)
header = next(reader)
new_rows.append(header [" price 2"])
for row1 in reader:
row2 = next(reader)
new_rows.append(row1 row2[-1:])
This approach assumes your input has a header, and then after that the row count is an even multiple of the group size. If your input doesn't conform to that, you'll get a StopIteration exception, that'll look something like (in 3.11):
File "/Users/zyoung/develop/StackOverflow/./main.py", line 31, in <module>
row2 = next(reader)
^^^^^^^^^^^^
StopIteration
If that's a possibility, you can add exception handling:
for row1 in reader:
try:
row2 = next(reader)
except StopIteration as e:
print(f"encountered an odd row on line {reader.line_num}; stopping reading, moving on to writing")
break
Finally, we can extend this idea to any number of rows in a logical group, like 3 rows per group:
date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, AB, 2, 5.12
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03
9/12/22, BC, 1, 2.06
header = next(reader)
new_rows.append(header [" price 2", " price 3"])
for row1 in reader:
row2 = next(reader)
row3 = next(reader)
new_rows.append(row1 row2[-1:] row3[-1:])