Home > Software engineering >  Sequentially extract top n from group depending on group value
Sequentially extract top n from group depending on group value

Time:11-20

I am working on calculating some football stats.

I have the following dataframe:

{'Player': {8: 'Darrel Williams',  2: 'Mark Ingram',  3: 'Michael Carter',  4: 'Najee Harris',  10: 'James Conner',  0: 'Buffalo Bills',  15: 'Davante Adams',  1: 'Aaron Rodgers',  5: 'Tyler Bass',  11: 'Corey Davis',  6: 'Van Jefferson',  14: 'Matt Ryan',  7: 'T.J. Hockenson',  9: 'Antonio Brown',  12: 'Alvin Kamara',  13: 'Tyler Boyd'}, 'Position': {8: 'RB',  2: 'RB',  3: 'RB',  4: 'RB',  10: 'RB',  0: 'DEF',  15: 'WR',  1: 'QB',  5: 'K',  11: 'WR',  6: 'WR',  14: 'QB',  7: 'TE',  9: 'WR',  12: 'RB',  13: 'WR'}, 'Score': {8: 24.9,  2: 18.8,  3: 16.2,  4: 15.3,  10: 13.9,  0: 12.0,  15: 11.3,  1: 10.48,  5: 9.0,  11: 8.8,  6: 6.9,  14: 1.68,  7: 0.0,  9: 0.0,  12: 0.0,  13: 0.0}}
Player Position Score
Darrel Williams RB 24.9
Mark Ingram RB 18.8
Michael Carter RB 16.2
Najee Harris RB 15.3
James Conner RB 13.9
Buffalo Bills DEF 12
Davante Adams WR 11.3
Aaron Rodgers QB 10.48
Tyler Bass K 9
Corey Davis WR 8.8
Van Jefferson WR 6.9
Matt Ryan QB 1.68
T.J. Hockenson TE 0
Antonio Brown WR 0
Alvin Kamara RB 0
Tyler Boyd WR 0

What I am looking to do, given the following requirements_dictionary, is to extract the top value (Score in the dataframe) for each key (Position in the dataframe):

requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}

What makes this challenging, is that for the final key, FLEX, that matches to no position in the dataframe, because that value could be a position of: RB, WR, or TE.

Final output should look like:

Player Position Score
Darrel Williams RB 24.9
Mark Ingram RB 18.8
Michael Carter RB 16.2
Najee Harris RB 15.3
Buffalo Bills DEF 12
Davante Adams WR 11.3
Aaron Rodgers QB 10.48
Tyler Bass K 9
Corey Davis WR 8.8
T.J. Hockenson TE 0

Since that is the top 2 RB, 1 QB, 2 WR, 1 TE, 1 K, 1 DEF and 2 FLEX.

I have tried the following code which gets me close:

all_points.groupby('Position')['Score'].nlargest(2)

Position    
DEF       0     12.00
K         5      9.00
QB        1     10.48
          14     1.68
RB        8     24.90
          2     18.80
TE        7      0.00
WR        15    11.30
          11     8.80
Name: Score, dtype: float64

However, that does not account for the FLEX "position"

I could alternatively loop through the dataframe and do this manually, but that seems very intensive.

How can I achieve the intended result?

CodePudding user response:

Create a custom function that select a number of players according your requirements for each group and keep this index as idx_best. Then exclude all already selected players and select FLEX other players as idx_flex. Finally extract the union of this two indexes.

FLEX = requirements_dictionary['FLEX']
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])

idx_best = df.groupby('Position')['Score'].apply(select_players).index.levels[1]
idx_flex = df.loc[df.index.difference(idx_best), 'Score'].nlargest(FLEX).index

out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)

Output:

>>> out
             Player Position  Score
8   Darrel Williams       RB  24.90
2       Mark Ingram       RB  18.80
3    Michael Carter       RB  16.20
4      Najee Harris       RB  15.30
0     Buffalo Bills      DEF  12.00
15    Davante Adams       WR  11.30
1     Aaron Rodgers       QB  10.48
5        Tyler Bass        K   9.00
11      Corey Davis       WR   8.80
7    T.J. Hockenson       TE   0.00

CodePudding user response:

use the requirements dictionary to get the rows equal to a position then sort by score and get the head equal to the dictionary value for the position. Flex is top 2, for position in RB, WR, TE. I concatenate the flex results. my solution is more intuitive and logical to understand

txt="""Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
T.J. Hockenson,TE,0"""

df = pd.read_csv(io.StringIO(txt),sep=',')
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
#print(df)
df_top_rows = pd.DataFrame()
for position in requirements_dictionary.keys():
    df_top_rows = df_top_rows.append(df[df['Position'] == position].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
print(df_top_rows)

position='FLEX'
df_flex_rows = df_top_rows.append(df[df['Position'].isin(['RB','WR','TE'])].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))

#print(df_flex_rows)
df_result=pd.concat([df_top_rows,df_flex_rows],axis=0)
df_result.drop_duplicates(inplace=True)
print(df_result)

output

       Player Position  Score
6    Aaron Rodgers       QB  10.48
0  Darrel Williams       RB  24.90
1      Mark Ingram       RB  18.80
5    Davante Adams       WR  11.30
8      Corey Davis       WR   8.80
9   T.J. Hockenson       TE   0.00
7       Tyler Bass        K   9.00
4    Buffalo Bills      DEF  12.00
  • Related