How to create a new dataframe with an average of panel data with different IDs and times?

Time:05-28

I have a dataset, df, that looks as follows:

Date     Code   City         State  Ranking
2020-01  10001  Los Angeles  CA     0.852
2020-02  10001  Los Angeles  CA     0.945
2020-03  10001  Los Angeles  CA     0.991
2020-01  20002  Houston      TX     0.134
2020-02  20002  Houston      TX     0.234
2020-03  20002  Houston      TX     0.667
...      ...    ...          ...    ...
2021-07  10001  Los Angeles  CA     0.678
2021-07  20002  Houston      TX     0.721

I have multiple cities, and each city has a monthly Ranking from 2020-01 to 2021-07. I want to create a new dataframe that holds the average of each city's Ranking over the whole period. Essentially, my new dataset would be:

Code   Average Ranking
10001  0.8665
20002  0.439

I am not sure how to compute this. The closest I have come is the following, which still does not give the right output:

df_avg = df.groupby(['Code','Date'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})


How can I create this new data frame, df_avg, with two columns, Code and Average Ranking, where Average Ranking is the mean Ranking for each Code?

CodePudding user response:

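Just remove Date from your groupby key. Grouping by both Code and Date puts every row in its own group (there is one row per Code per month), so the mean of each group is just the original Ranking. You want the mean of Ranking across all rows that share the same Code, so the groupby key should be Code only.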

df_avg = df.groupby(['Code'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})
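To see the difference between the two groupby keys, here is a minimal sketch. It assumes a small sample built only from the six 2020-01 to 2020-03 rows shown in the question, so the averages it produces will not match the 0.8665 and 0.439 in the question, which cover the full date range. The name per_month is made up for illustration.

```python
import pandas as pd

# Assumed sample: only the six rows printed in the question (2020-01 to 2020-03).
df = pd.DataFrame({
    "Date": ["2020-01", "2020-02", "2020-03"] * 2,
    "Code": [10001] * 3 + [20002] * 3,
    "City": ["Los Angeles"] * 3 + ["Houston"] * 3,
    "State": ["CA"] * 3 + ["TX"] * 3,
    "Ranking": [0.852, 0.945, 0.991, 0.134, 0.234, 0.667],
})

# Grouping by Code AND Date: each (Code, Date) pair contains a single row,
# so the result has as many rows as the input and nothing is averaged.
per_month = df.groupby(["Code", "Date"], as_index=False)["Ranking"].mean()

# Grouping by Code only: one row per Code, holding the mean over all dates.
df_avg = (
    df.groupby("Code", as_index=False)["Ranking"]
      .mean()
      .rename(columns={"Ranking": "Avg_Ranking"})
)

print(len(per_month), "rows when grouping by Code and Date")
print(df_avg)
```

If you also want City and State in the result, one option is to group by ['Code', 'City', 'State'] instead, assuming each Code always maps to the same City and State.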