How to enter and average values from a list using zip in Python-CodePudding

I'm having sets of lists that contain the following data (in Python):

['425842', '2008', 'Monday', 23:30:00', '10'] 
['425843', '2008', 'Tuesday', 23:30:00', '9'] 
['425844', '2009', 'Monday', 23:30:00', '2'] 
['425845', '2009', 'Monday', 23:30:00', '3'] 
['425846', '2010', 'Monday', 23:30:00', '2'] 
['425847', '2010', 'Monday', 23:30:00', '10'] 
['425848', '2010', 'Tuesday', 23:30:00', '10']

I would like to calculate the average of the values of the last column (index 5) according to the year, example:

[2008, 9.5]
[2009, 2.5]
[2010, 7.3]

I tried to do it by zip function built-in of the Python, but this function is generated by interator. Could you help me with a solution?

CodePudding user response：

Use pandas to group the data by year then take the mean of the value in column 5.

data = [
    ['425842', '2008', 'Monday', '23:30:00', '10'], 
    ['425843', '2008', 'Tuesday', '23:30:00', '9'], 
    ['425844', '2009', 'Monday', '23:30:00', '2'], 
    ['425845', '2009', 'Monday', '23:30:00', '3'], 
    ['425846', '2010', 'Monday', '23:30:00', '2'], 
    ['425847', '2010', 'Monday', '23:30:00', '10'], 
    ['425848', '2010', 'Tuesday', '23:30:00', '10'],
]
import pandas as pd
df = pd.DataFrame(data, columns=["id", "year", "day","time","value"])
df["value"] = pd.to_numeric(df["value"])
print(df.groupby("year")["value"].mean())

CodePudding user response：

zip isn't helpful at all here; you probably want to build a dictionary to collect the totals for each year so you can average them.

data = [
    ['425842', '2008', 'Monday', '23:30:00', '10'], 
    ['425843', '2008', 'Tuesday', '23:30:00', '9'], 
    ['425844', '2009', 'Monday', '23:30:00', '2'], 
    ['425845', '2009', 'Monday', '23:30:00', '3'], 
    ['425846', '2010', 'Monday', '23:30:00', '2'], 
    ['425847', '2010', 'Monday', '23:30:00', '10'], 
    ['425848', '2010', 'Tuesday', '23:30:00', '10'],
]

year_totals = {year: [] for year in set(year for _, year, _, _, _ in data)}
for _, year, _, _, value in data:
    year_totals[year].append(int(value))

averages = {y: sum(t) / len(t) for y, t in year_totals.items()}

print(averages)  # {'2010': 7.333333333333333, '2008': 9.5, '2009': 2.5}

CodePudding user response：

This should work:

data = [['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'], 
['425847', '2010', 'Monday', '23:30:00', '10'], 
['425848', '2010', 'Tuesday', '23:30:00', '10']]
sums = {}
for i in data:
    if i[1] not in sums:
        sums[i[1]] = [int(i[-1])]
    else:
        sums[i[1]].append(int(i[-1]))
sums = {i: sum(sums[i]) / len(sums[i]) for i in sums}
output = [[i, sums[i]] for i in sums]

Value of output:

[['2008', 9.5], ['2009', 2.5], ['2010', 7.333333333333333]]

CodePudding user response：

You could use itertools.groupby to group the lists by year and compute the average for each group:

data = [['425842', '2008', 'Monday', '23:30:00', '10'],
        ['425843', '2008', 'Tuesday', '23:30:00', '9'],
        ['425844', '2009', 'Monday', '23:30:00', '2'],
        ['425845', '2009', 'Monday', '23:30:00', '3'],
        ['425846', '2010', 'Monday', '23:30:00', '2'],
        ['425847', '2010', 'Monday', '23:30:00', '10'],
        ['425848', '2010', 'Tuesday', '23:30:00', '10']]

groups = {int(key): list(map(lambda x: int(x[4]), value)) for key, value in
          itertools.groupby(data, lambda x: x[1])}

averages = {key: sum(value) / len(value) for key, value in groups.items()}