I have a dataset, df
that looks as follows:
Date | Code | City | State | Ranking |
---|---|---|---|---|
2020-01 | 10001 | Los Angeles | CA | 0.852 |
2020-02 | 10001 | Los Angeles | CA | 0.945 |
2020-03 | 10001 | Los Angeles | CA | 0.991 |
2020-01 | 20002 | Houston | TX | 0.134 |
2020-02 | 20002 | Houston | TX | 0.234 |
2020-03 | 20002 | Houston | TX | 0.667 |
... | ... | ... | ... | ... |
2021-07 | 10001 | Los Angeles | CA | 0.678 |
2021-07 | 20002 | Houston | TX | 0.721 |
I have multiple cities, each city containing a Ranking
that ranges from 2020-01
to 2021-07
. I want to create a new dataframe, where I take the average of each city's ranking over time. Essentially, my new data set would be:
Code | Average Ranking |
---|---|
10001 | 0.8665 |
20002 | 0.439 |
I have no idea how to extract my information. The closest thing I thought about doing was still not giving me the right output:
df_avg = df.groupby(['Code','Date'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})
How can I create this new data frame, df_avg
, with two columns, Code
and Average Ranking
, where Average Ranking
is the mean Ranking
for each Code
?
CodePudding user response:
Just remove Date
from your groupby key. In this case, you want the mean value of column Ranking
from all rows in each Code
column, so your groupby key should be only Code
.
df_avg = df.groupby(['Code'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})