I'm having sets of lists that contain the following data (in Python):
['425842', '2008', 'Monday', 23:30:00', '10']
['425843', '2008', 'Tuesday', 23:30:00', '9']
['425844', '2009', 'Monday', 23:30:00', '2']
['425845', '2009', 'Monday', 23:30:00', '3']
['425846', '2010', 'Monday', 23:30:00', '2']
['425847', '2010', 'Monday', 23:30:00', '10']
['425848', '2010', 'Tuesday', 23:30:00', '10']
I would like to calculate the average of the values of the last column (index 5) according to the year, example:
[2008, 9.5]
[2009, 2.5]
[2010, 7.3]
I tried to do it by zip function built-in of the Python, but this function is generated by interator. Could you help me with a solution?
CodePudding user response:
Use pandas
to group the data by year then take the mean of the value in column 5.
data = [
['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10'],
]
import pandas as pd
df = pd.DataFrame(data, columns=["id", "year", "day","time","value"])
df["value"] = pd.to_numeric(df["value"])
print(df.groupby("year")["value"].mean())
CodePudding user response:
zip
isn't helpful at all here; you probably want to build a dictionary to collect the totals for each year so you can average them.
data = [
['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10'],
]
year_totals = {year: [] for year in set(year for _, year, _, _, _ in data)}
for _, year, _, _, value in data:
year_totals[year].append(int(value))
averages = {y: sum(t) / len(t) for y, t in year_totals.items()}
print(averages) # {'2010': 7.333333333333333, '2008': 9.5, '2009': 2.5}
CodePudding user response:
This should work:
data = [['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10']]
sums = {}
for i in data:
if i[1] not in sums:
sums[i[1]] = [int(i[-1])]
else:
sums[i[1]].append(int(i[-1]))
sums = {i: sum(sums[i]) / len(sums[i]) for i in sums}
output = [[i, sums[i]] for i in sums]
Value of output
:
[['2008', 9.5], ['2009', 2.5], ['2010', 7.333333333333333]]
CodePudding user response:
You could use itertools.groupby
to group the lists by year and compute the average for each group:
data = [['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10']]
groups = {int(key): list(map(lambda x: int(x[4]), value)) for key, value in
itertools.groupby(data, lambda x: x[1])}
averages = {key: sum(value) / len(value) for key, value in groups.items()}