I am working on calculating some football stats.
I have the following dataframe:
{'Player': {8: 'Darrel Williams', 2: 'Mark Ingram', 3: 'Michael Carter', 4: 'Najee Harris', 10: 'James Conner', 0: 'Buffalo Bills', 15: 'Davante Adams', 1: 'Aaron Rodgers', 5: 'Tyler Bass', 11: 'Corey Davis', 6: 'Van Jefferson', 14: 'Matt Ryan', 7: 'T.J. Hockenson', 9: 'Antonio Brown', 12: 'Alvin Kamara', 13: 'Tyler Boyd'}, 'Position': {8: 'RB', 2: 'RB', 3: 'RB', 4: 'RB', 10: 'RB', 0: 'DEF', 15: 'WR', 1: 'QB', 5: 'K', 11: 'WR', 6: 'WR', 14: 'QB', 7: 'TE', 9: 'WR', 12: 'RB', 13: 'WR'}, 'Score': {8: 24.9, 2: 18.8, 3: 16.2, 4: 15.3, 10: 13.9, 0: 12.0, 15: 11.3, 1: 10.48, 5: 9.0, 11: 8.8, 6: 6.9, 14: 1.68, 7: 0.0, 9: 0.0, 12: 0.0, 13: 0.0}}
Player | Position | Score |
---|---|---|
Darrel Williams | RB | 24.9 |
Mark Ingram | RB | 18.8 |
Michael Carter | RB | 16.2 |
Najee Harris | RB | 15.3 |
James Conner | RB | 13.9 |
Buffalo Bills | DEF | 12 |
Davante Adams | WR | 11.3 |
Aaron Rodgers | QB | 10.48 |
Tyler Bass | K | 9 |
Corey Davis | WR | 8.8 |
Van Jefferson | WR | 6.9 |
Matt Ryan | QB | 1.68 |
T.J. Hockenson | TE | 0 |
Antonio Brown | WR | 0 |
Alvin Kamara | RB | 0 |
Tyler Boyd | WR | 0 |
What I am looking to do, given the following requirements_dictionary
, is to extract the top value
(Score
in the dataframe) for each key
(Position
in the dataframe):
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
What makes this challenging, is that for the final key
, FLEX
, that matches to no position in the dataframe, because that value could be a position of: RB, WR,
or TE
.
Final output should look like:
Player | Position | Score |
---|---|---|
Darrel Williams | RB | 24.9 |
Mark Ingram | RB | 18.8 |
Michael Carter | RB | 16.2 |
Najee Harris | RB | 15.3 |
Buffalo Bills | DEF | 12 |
Davante Adams | WR | 11.3 |
Aaron Rodgers | QB | 10.48 |
Tyler Bass | K | 9 |
Corey Davis | WR | 8.8 |
T.J. Hockenson | TE | 0 |
Since that is the top 2 RB, 1 QB, 2 WR, 1 TE, 1 K, 1 DEF
and 2 FLEX
.
I have tried the following code which gets me close:
all_points.groupby('Position')['Score'].nlargest(2)
Position
DEF 0 12.00
K 5 9.00
QB 1 10.48
14 1.68
RB 8 24.90
2 18.80
TE 7 0.00
WR 15 11.30
11 8.80
Name: Score, dtype: float64
However, that does not account for the FLEX
"position"
I could alternatively loop through the dataframe and do this manually, but that seems very intensive.
How can I achieve the intended result?
CodePudding user response:
Create a custom function that select a number of players according your requirements for each group and keep this index as idx_best
. Then exclude all already selected players and select FLEX other players as idx_flex
. Finally extract the union of this two indexes.
FLEX = requirements_dictionary['FLEX']
select_players = lambda x: x.nlargest(requirements_dictionary[x.name])
idx_best = df.groupby('Position')['Score'].apply(select_players).index.levels[1]
idx_flex = df.loc[df.index.difference(idx_best), 'Score'].nlargest(FLEX).index
out = df.loc[idx_best.union(idx_flex)].sort_values('Score', ascending=False)
Output:
>>> out
Player Position Score
8 Darrel Williams RB 24.90
2 Mark Ingram RB 18.80
3 Michael Carter RB 16.20
4 Najee Harris RB 15.30
0 Buffalo Bills DEF 12.00
15 Davante Adams WR 11.30
1 Aaron Rodgers QB 10.48
5 Tyler Bass K 9.00
11 Corey Davis WR 8.80
7 T.J. Hockenson TE 0.00
CodePudding user response:
use the requirements dictionary to get the rows equal to a position then sort by score and get the head equal to the dictionary value for the position. Flex is top 2, for position in RB, WR, TE. I concatenate the flex results. my solution is more intuitive and logical to understand
txt="""Player,Position,Score
Darrel Williams,RB,24.9
Mark Ingram,RB,18.8
Michael Carter,RB,16.2
Najee Harris,RB,15.3
Buffalo Bills,DEF,12
Davante Adams,WR,11.3
Aaron Rodgers,QB,10.48
Tyler Bass,K,9
Corey Davis,WR,8.8
T.J. Hockenson,TE,0"""
df = pd.read_csv(io.StringIO(txt),sep=',')
requirements_dictionary = {'QB': 1, 'RB': 2, 'WR': 2, 'TE': 1, 'K': 1, 'DEF': 1, 'FLEX': 2}
#print(df)
df_top_rows = pd.DataFrame()
for position in requirements_dictionary.keys():
df_top_rows = df_top_rows.append(df[df['Position'] == position].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
print(df_top_rows)
position='FLEX'
df_flex_rows = df_top_rows.append(df[df['Position'].isin(['RB','WR','TE'])].sort_values(by='Score', ascending=False).head(requirements_dictionary[position]))
#print(df_flex_rows)
df_result=pd.concat([df_top_rows,df_flex_rows],axis=0)
df_result.drop_duplicates(inplace=True)
print(df_result)
output
Player Position Score
6 Aaron Rodgers QB 10.48
0 Darrel Williams RB 24.90
1 Mark Ingram RB 18.80
5 Davante Adams WR 11.30
8 Corey Davis WR 8.80
9 T.J. Hockenson TE 0.00
7 Tyler Bass K 9.00
4 Buffalo Bills DEF 12.00