Name,A,B,C,D
Fred,1,3,4,3
Adam,1,4,1,2
Moo,4,2,3,2
So I want to sum this csv input by each column and then find the average before putting it in a list.
So for the above sample input, the list would be ['2','3','2.667','2.333'] for A,B,C,D respectively. I thought of creating a list for each column and then finding the average before appending it to a pre-created list, but it didn't work.
import csv
line_count = 0
first_line = True
final_list = []
A = []
B = []
C = []
D = []
for row in open("file.csv"):
values = row.strip().split(",")
if first_line:
first_line = False
else:
line_count = 1
A.append(int(values[1]))
sum_of_A = sum(A)
average = [sum_of_A/line_count]
data.append(average)
But then after this, since the last line returns to the normal indentation space, I'm not sure how to return back into the loop to continue finding the average for the rest of the columns. I need it to be put into a list so I can rewrite it using the csv module into a new file, so any guidance is helpful.
CodePudding user response:
You can compute the averages as you go using a list comprehension that zips the values with the averages you have so far. Use the first line (column titles) as an initialization step since you have to skip it anyway.
with open('file.csv') as f:
for i,(_,*values) in enumerate(csv.reader(f)):
if i: averages = [a (float(v)-a)/i for a,v in zip(averages,values)]
else: averages = [0]*len(values)
print(averages)
[2.0, 3.0, 2.6666666666666665, 2.3333333333333335]
If you're going to load the whole table in memory, you can get the averages from the nested list after loading it:
with open('file.csv') as f:
data = list(csv.reader(f))
averages = [ sum(map(float,c))/len(c)
for i,c in enumerate(zip(*data[1:])) if i ]
print(averages)
[2.0, 3.0, 2.6666666666666665, 2.3333333333333335]
CodePudding user response:
You can just read in the file (without csv
), split and sum:
with open("file.csv") as fid:
# remove the "\n" from the end of each line, break line into list,
# and remove first column
data = [line.rstrip().split(",")[1:] for line in fid.readlines()]
data.pop(0) # remove first line (header)
n_rows = len(data)
n_columns = len(data[0])
summation = [(sum(int(row[column]) for row in data)) \
for column in range(n_columns)]
average = [s / n_rows for s in summation]
summation = [6, 9, 8, 7]
average = [2.0, 3.0, 2.6666666666666665, 2.3333333333333335]
CodePudding user response:
You can do everything in one loop.
import csv
line_count = 0
first_line = True
final_list = []
A = []
B = []
C = []
D = []
with open('test.csv') as input_file:
for row in input_file:
values = row.strip().split(",")
if first_line:
first_line = False
else:
line_count = 1
A.append(int(values[1]))
B.append(int(values[2]))
C.append(int(values[3]))
D.append(int(values[4]))
a_average = sum(A) / line_count
b_average = sum(B) / line_count
c_average = sum(C) / line_count
d_average = sum(D) / line_count
Now you have all the averages, you can do whatever you want.