Calculating the moving average for each unique value in a column-CodePudding

I have a csv file with a value that increases with time for n cities, like this:

city,date,value
saopaulo,2020-01-01,5
riodejaneiro,2020-01-01,3
curitiba,2020-01-01,7
...
saopaulo,2020-05-01,31
riodejaneiro,2020-05-01,55
curitiba,2020-05-01,41

What I want to do is to calculate the moving average of the column "value", but for each "city" separately.

I loaded the csv into a pandas dataframe, but if I calculate df["value"].rolling(3), it will calculate the moving average but for all the cities together.

What I want is to create a new column with the moving average but for each city. I was thinking about groupby, but I don't know exactly how to implement this.

CodePudding user response：

You can groupby:

df.groupby('city')['value'].rolling(3).mean()

To assign:

df['roll'] = df.groupby('city')['value'].rolling(3).mean().droplevel(0)

CodePudding user response：

Here you go:

def rolling_mean(group :pd.DataFrame) -> pd.DataFrame:
    # Whatever operation you want to do with the cities.
    # For each city group will be a dataframe of that city's rows without the city column
    # I'm guessing you'd like to set the date as a sorted index
    # and calculate your moving average based on that but if that's not case modify this function.
    return group.set_index('date').sort_index().rolling(3).mean()

df.grouby("city").apply(rolling_mean) # Use .reset_index() if you don't need the multiindex.

CodePudding user response：

maybe doing this (supose your dataframe named df)

from collections import defaultdict

data =defaultdict(list)

for (place,date,value) in df.values: 
    data[place].append(value)

new_df = pd.DataFrame(dict(data))

and now you have a new dataframe with each city on a column , so you can apply your function on each column ( in a for loop)

   saopaulo riodejaneiro    curitiba
0   5          3              7
1   31         55             41