As I wrote in the title I would like to read a CSV, do on this same CSV a group by column, apply sum, after replace the old CSV with the new values using as least libraries as possible (and avoid pandas). I have come this far:
index = {}
with open('event.csv') as f:
cr = reader(f)
for row in cr:
index.setdefault(row[0], []).append(int(row[1]))
f.close()
with open('event.csv', 'w', newline='\n') as csv_file:
writer = writer(csv_file)
for key, value in index.items():
writer.writerow([key, value[0]])
csv_file.close()
But in this way I can make the average…and also I have to open the file twice, which doesn't seem smart to me. Here is a CSV similar to event.csv
:
work1,100
work2,200
work3,200
work1,50
work3,20
Desired output:
work1,150
work2,200
work3,220
CodePudding user response:
You're actually very close. Just sum the values read while rewriting the file. Note that when using with
on a file, you don't have to explicitly close them, it does it for you automatically. Also note that CSV files should be opened with newline=''
—for reading and writing—as per the documentation.
import csv
index = {}
with open('event.csv', newline='') as f:
cr = csv.reader(f)
for row in cr:
index.setdefault(row[0], []).append(int(row[1]))
with open('event2.csv', 'w', newline='') as csv_file:
writer = csv.writer(csv_file)
for key, values in index.items():
value = sum(values)
writer.writerow([key, value])
print('-fini-')
CodePudding user response:
Here's another way to think of it.
Instead of storing arrays of ints during reading and then "compressing" them into the desired value during writing, show up-front that you're summing something during the read:
import csv
from collections import defaultdict
summed_work = defaultdict(int)
with open('event_input.csv', newline='') as f:
reader = csv.reader(f)
for row in reader:
work_id = row[0]
work_value = int(row[1])
summed_work[work_id] = work_value
with open('event_processed.csv', 'w', newline='') as f:
writer = csv.writer(f)
for work_id, summed_value in summed_work.items():
writer.writerow([work_id, summed_value])
This is functionally equivalent to what you were aiming for and what martineau helped you with, but, I argue, shows you and your reader sooner and more clearly what the intent is.
It technically uses one more library, defaultdict, but that's a standard library, and I'm not sure what value you're placing on the number of libraries being used.
CodePudding user response:
Another simplification of solutions already shown, without additional libraries:
import csv
index = {}
with open('event.csv', newline='') as f:
cr = csv.reader(f)
for item,value in cr:
index[item] = index.get(item, 0) int(value) # sum as you go
with open('event2.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(index.items()) # write all the items in one shot
print('-fini-')
CodePudding user response:
With an additional library - convtools
, which provides a lot of functionality not to write a lot of code by hand every time.
from convtools import conversion as c
from convtools.contrib.tables import Table
rows = Table.from_csv("event.csv", header=False).into_iter_rows(list)
converter = (
c.group_by(c.item(0))
.aggregate(
(
c.item(0),
c.ReduceFuncs.Sum(c.item(1).as_type(int)),
)
)
.gen_converter()
)
processed_rows = converter(rows)
Table.from_rows(processed_rows, header=False).into_csv(
"event2.csv", include_header=False
)