Finding the index of the max scores and using that index to find corresponding gross and calculate m-CodePudding

So I have this dataset as follows:

Year,Score, Gross
1920,4,200
1920,5.5,150
1920,5.5,170
1930,5,300
1930,5,200
1930,4,140
1940,4,500
1940,4,650
1940,3,800
1940,4,700
1960,7,290
1960,5,250
1960,7,310

I am trying to find the max scores for each year that is present in the dataset and then obtain the corresponding gross for that score and then calculating the mean of all those gross. For instance, in the year 1960, there are two scores of 7 but different gross so I will need to find the index of those two scores of 7 and obtain their corresponding gross to use in my calculation.

So far, I have been trying group the years and find the max scores but only received an output of one max value within the year. My code is as follows:

df = pd.read_csv('test.csv')
idx = df.groupby('Year')['Score'].apply(max)

And the corresponding output is:

Please help, I am really stuck and do not know where to go on from here! Edit: a member of the community suggested that I try using transform but now I have got an output as follows:

I am still confused as to what this function is doing and what does this output mean?

CodePudding user response：

The question isn't as clear, however,lets try the index with the max score and then use that to filter using the bloc accessor. Happy to help if clarified

df2.iloc[df2.groupby('Year', as_index=False)['Score'].idxmax()['Score']]

    Year  Score  Gross
1   1920    5.5    150
3   1930    5.0    300
6   1940    4.0    500
10  1960    7.0    290

Based on your comments lets try filter twice.

#First filter the max score per year.

s=df2[df2['Score'].isin(df2.groupby('Year')['Score'].max().to_list())]

#Lastly, eliminate non duplicated values in the groups

s[s.duplicated(subset=['Year','Score'],keep=False)]



   Year  Score  Gross
1   1920    5.5    150
2   1920    5.5    170
3   1930    5.0    300
4   1930    5.0    200
6   1940    4.0    500
7   1940    4.0    650
9   1940    4.0    700
10  1960    7.0    290
12  1960    7.0    310

CodePudding user response：

Use transform:

idx = df.groupby('Year')['Score'].transform('max')