I have a csv file with a value that increases with time for n cities, like this:
city,date,value
saopaulo,2020-01-01,5
riodejaneiro,2020-01-01,3
curitiba,2020-01-01,7
...
saopaulo,2020-05-01,31
riodejaneiro,2020-05-01,55
curitiba,2020-05-01,41
What I want to do is to calculate the moving average of the column "value", but for each "city" separately.
I loaded the csv into a pandas dataframe, but if I calculate df["value"].rolling(3)
, it will calculate the moving average but for all the cities together.
What I want is to create a new column with the moving average but for each city. I was thinking about groupby, but I don't know exactly how to implement this.
CodePudding user response:
You can groupby
:
df.groupby('city')['value'].rolling(3).mean()
To assign:
df['roll'] = df.groupby('city')['value'].rolling(3).mean().droplevel(0)
CodePudding user response:
Here you go:
def rolling_mean(group :pd.DataFrame) -> pd.DataFrame:
# Whatever operation you want to do with the cities.
# For each city group will be a dataframe of that city's rows without the city column
# I'm guessing you'd like to set the date as a sorted index
# and calculate your moving average based on that but if that's not case modify this function.
return group.set_index('date').sort_index().rolling(3).mean()
df.grouby("city").apply(rolling_mean) # Use .reset_index() if you don't need the multiindex.
CodePudding user response:
maybe doing this (supose your dataframe named df)
from collections import defaultdict
data =defaultdict(list)
for (place,date,value) in df.values:
data[place].append(value)
new_df = pd.DataFrame(dict(data))
and now you have a new dataframe with each city on a column , so you can apply your function on each column ( in a for loop)
saopaulo riodejaneiro curitiba
0 5 3 7
1 31 55 41