I hope you are doing well in the current situation.
I have the following DataFrame as input:
import pandas as pd
df_0 = pd.DataFrame({"year": [1960, 1960, 1960, 1960, 1961, 1961, 1961, 1962, 1962, 1962],
                     "genre": ['Action', 'Crime', 'Action', 'Drama', 'Thriller', 'Thriller', 'Crime', 'Drama', 'Drama', 'Thriller'],
                     "popularity": [1.99, 0.53, 1.81, 0.23, 3.86, 3.94, 0.21, 4.30, 5.60, 0.09]})
figure 0:
   year     genre  popularity
0  1960    Action        1.99
1  1960     Crime        0.53
2  1960    Action        1.81
3  1960     Drama        0.23
4  1961  Thriller        3.86
5  1961  Thriller        3.94
6  1961     Crime        0.21
7  1962     Drama        4.30
8  1962     Drama        5.60
9  1962  Thriller        0.09
I've created a new DataFrame df_1 by grouping on year and genre like this:
df_1 = df_0.groupby(['year','genre']).popularity.agg(['mean','max'])
figure 1:
               mean   max
year genre
1960 Action    1.90  1.99
     Crime     0.53  0.53
     Drama     0.23  0.23
1961 Crime     0.21  0.21
     Thriller  3.90  3.94
1962 Drama     4.95  5.60
     Thriller  0.09  0.09
As a result, we get a DataFrame similar to the following:
df_1 = pd.DataFrame({"year" : [1960, 1960, 1960, 1961, 1961, 1962, 1962,],
"genre": ['Action', 'Crime', 'Drama', 'Crime', 'Thriller', 'Drama', 'Thriller'],
"mean": [1.90, 0.53, 0.23, 0.21, 3.90, 4.95, 0.09],
"max" : [1.99, 0.53, 0.23, 0.21, 3.94, 5.60, 0.09] }).set_index("year")
I'm struggling with the next step.
I would like to create the following DataFrame df_2 from df_1 (for example with .groupby()), using only pandas functions (no NumPy, or as little as possible):
df_2 = pd.DataFrame({"year" : [1960, 1961, 1962],
"genre": ['Action', 'Thriller', 'Drama'],
"mean": [1.90, 3.90, 4.95],
"max" : [1.99, 3.94, 5.60] }).set_index("year")
figure 2:
         genre  mean   max
year
1960    Action  1.90  1.99
1961  Thriller  3.90  3.94
1962     Drama  4.95  5.60
For each year, df_2 keeps only the row of the genre with the highest values.
Any tips?
Thank you for your support.
Stay safe
CodePudding user response:
You could try the following:
import pandas as pd
# query the rows you want from df_1, then reset the index to turn
# year and genre into regular columns
df_2 = df_1.query('year in [1960, 1961] and genre in ["Action", "Thriller"]').reset_index()
The result will look like this:
   year     genre  mean   max
0  1960    Action   1.9  1.99
1  1961  Thriller   3.9  3.94
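Note that the query above hardcodes the years and genres, so it returns only the two rows listed and leaves out 1962 / Drama from the desired df_2. A more general option might be to pick, for each year, the row whose value is largest with idxmax. The sketch below is one possible way to do that with pandas only; it assumes the "max" column is the ranking criterion (the question does not say whether "mean" or "max" should decide), and the name flat is just an illustrative helper.

```python
import pandas as pd

df_0 = pd.DataFrame({
    "year": [1960, 1960, 1960, 1960, 1961, 1961, 1961, 1962, 1962, 1962],
    "genre": ["Action", "Crime", "Action", "Drama", "Thriller",
              "Thriller", "Crime", "Drama", "Drama", "Thriller"],
    "popularity": [1.99, 0.53, 1.81, 0.23, 3.86, 3.94, 0.21, 4.30, 5.60, 0.09],
})

# Same aggregation as in the question; the result has a (year, genre) MultiIndex.
df_1 = df_0.groupby(["year", "genre"]).popularity.agg(["mean", "max"])

# Turn year and genre back into ordinary columns (hypothetical helper name).
flat = df_1.reset_index()

# For each year, idxmax gives the row label of the largest "max" value
# (assumption: "max" is the column that decides which genre wins).
best_rows = flat.groupby("year")["max"].idxmax()

# Keep only those rows and use year as the index, as in figure 2.
df_2 = flat.loc[best_rows].set_index("year")

print(df_2)
```

With the sample data this should give one row per year (Action for 1960, Thriller for 1961, Drama for 1962). If two genres tie within a year, idxmax keeps the first one it encounters; ranking by "mean" instead only requires changing the column name.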