Sequentially extract top n from group depending on group value-CodePudding

I am working on calculating some football stats.

I have the following dataframe:

{'Player': {8: 'Darrel Williams',  2: 'Mark Ingram',  3: 'Michael Carter',  4: 'Najee Harris',  10: 'James Conner',  0: 'Buffalo Bills',  15: 'Davante Adams',  1: 'Aaron Rodgers',  5: 'Tyler Bass',  11: 'Corey Davis',  6: 'Van Jefferson',  14: 'Matt Ryan',  7: 'T.J. Hockenson',  9: 'Antonio Brown',  12: 'Alvin Kamara',  13: 'Tyler Boyd'}, 'Position': {8: 'RB',  2: 'RB',  3: 'RB',  4: 'RB',  10: 'RB',  0: 'DEF',  15: 'WR',  1: 'QB',  5: 'K',  11: 'WR',  6: 'WR',  14: 'QB',  7: 'TE',  9: 'WR',  12: 'RB',  13: 'WR'}, 'Score': {8: 24.9,  2: 18.8,  3: 16.2,  4: 15.3,  10: 13.9,  0: 12.0,  15: 11.3,  1: 10.48,  5: 9.0,  11: 8.8,  6: 6.9,  14: 1.68,  7: 0.0,  9: 0.0,  12: 0.0,  13: 0.0}}

Player	Position	Score
Darrel Williams	RB	24.9
Mark Ingram	RB	18.8
Michael Carter	RB	16.2
Najee Harris	RB	15.3
James Conner	RB	13.9
Buffalo Bills	DEF	12
Davante Adams	WR	11.3
Aaron Rodgers	QB	10.48
Tyler Bass	K	9
Corey Davis	WR	8.8
Van Jefferson	WR	6.9
Matt Ryan	QB	1.68
T.J. Hockenson	TE	0
Antonio Brown	WR	0
Alvin Kamara	RB	0
Tyler Boyd	WR	0

What I am looking to do, given the following requirements_dictionary, is to extract the top value (Score in the dataframe) for each key (Position in the dataframe):

requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}

What makes this challenging, is that for the final key, FLEX, that matches to no position in the dataframe, because that value could be a position of: RB, WR, or TE.

Final output should look like:

Player	Position	Score
Darrel Williams	RB	24.9
Mark Ingram	RB	18.8
Michael Carter	RB	16.2
Najee Harris	RB	15.3
Buffalo Bills	DEF	12
Davante Adams	WR	11.3
Aaron Rodgers	QB	10.48
Tyler Bass	K	9
Corey Davis	WR	8.8
T.J. Hockenson	TE	0

Since that is the top 2 RB, 1 QB, 2 WR, 1 TE, 1 K, 1 DEF and 2 FLEX.

I have tried the following code which gets me close:

all_points.groupby('Position')['Score'].nlargest(2)

Position    
DEF       0     12.00
K         5      9.00
QB        1     10.48
          14     1.68
RB        8     24.90
          2     18.80
TE        7      0.00
WR        15    11.30
          11     8.80
Name: Score, dtype: float64

However, that does not account for the FLEX "position"

I could alternatively loop through the dataframe and do this manually, but that seems very intensive.

How can I achieve the intended result?

CodePudding user response：

Create a custom function that select a number of players according your requirements for each group and keep this index as idx_best. Then exclude all already selected players and select FLEX other players as idx_flex. Finally extract the union of this two indexes.

FLEX = requirements_dictionary['FLEX']
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])

idx_best = df.groupby('Position')['Score'].apply(select_players).index.levels[1]
idx_flex = df.loc[df.index.difference(idx_best), 'Score'].nlargest(FLEX).index

out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)

Output:

>>> out
             Player Position  Score
8   Darrel Williams       RB  24.90
2       Mark Ingram       RB  18.80
3    Michael Carter       RB  16.20
4      Najee Harris       RB  15.30
0     Buffalo Bills      DEF  12.00
15    Davante Adams       WR  11.30
1     Aaron Rodgers       QB  10.48
5        Tyler Bass        K   9.00
11      Corey Davis       WR   8.80
7    T.J. Hockenson       TE   0.00

CodePudding user response：

use the requirements dictionary to get the rows equal to a position then sort by score and get the head equal to the dictionary value for the position. Flex is top 2, for position in RB, WR, TE. I concatenate the flex results. my solution is more intuitive and logical to understand

txt="""Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
T.J. Hockenson,TE,0"""

df = pd.read_csv(io.StringIO(txt),sep=',')
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
#print(df)
df_top_rows = pd.DataFrame()
for position in requirements_dictionary.keys():
    df_top_rows = df_top_rows.append(df[df['Position'] == position].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
print(df_top_rows)

position='FLEX'
df_flex_rows = df_top_rows.append(df[df['Position'].isin(['RB','WR','TE'])].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))

#print(df_flex_rows)
df_result=pd.concat([df_top_rows,df_flex_rows],axis=0)
df_result.drop_duplicates(inplace=True)
print(df_result)

output

       Player Position  Score
6    Aaron Rodgers       QB  10.48
0  Darrel Williams       RB  24.90
1      Mark Ingram       RB  18.80
5    Davante Adams       WR  11.30
8      Corey Davis       WR   8.80
9   T.J. Hockenson       TE   0.00
7       Tyler Bass        K   9.00
4    Buffalo Bills      DEF  12.00