I have a dataframe with the structure below (the real one has more "Game x" columns, around 30, but these 2 are enough to explain the problem):
   Name      Game 1       Game 2
0  Player 1  Starting 68  Starting
1  Player 2  Bench 74     Starting 80
2  Player 3  Starting     Bench
3  Player 4  Bench        Bench 50
4  Player 5  NaN          Starting
I need new columns that count the minutes each player has played across the "Game x" columns, based on these conditions:
- Starting: the player played the full 90 minutes
- Starting 68 (or any other number): the player started and played 68 minutes (or that number)
- Bench and NaN: the player played 0 minutes
- Bench 74 (or any other number): the player played 16 minutes (a game lasts 90 minutes, so he came on at minute 74 and played 90 - 74 = 16)
There would be 2 new columns: one with the total minutes played in games the player started, and one with the total minutes played in games where he came on from the bench.
The final dataframe would be:
   Name      Game 1       Game 2       Minutes Starting  Minutes Bench
0  Player 1  Starting 68  Starting     158               0
1  Player 2  Bench 74     Starting 80  80                16
2  Player 3  Starting     Bench        90                0
3  Player 4  Bench        Bench 50     0                 40
4  Player 5  NaN          Starting 60  60                0
Any suggestion about how to do it? Thanks in advance!
CodePudding user response:
If you write a function that parses a text field and returns the corresponding number of minutes, you can apply that function to each game column and add up the results. For example, the time played from start:
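import numpy as np

def played_from_start(entry):
    entry = str(entry)  # np.nan is a float, so convert it to the string 'nan' first.
    if entry == 'nan' or entry == '':
        return 0
    if entry.startswith('Bench'):
        return 0  # Bench minutes are counted in a separate column.
    if entry == 'Starting':
        return 90  # Played the whole game.
    if entry.startswith('Starting'):
        return int(entry[9:])  # Skip 'Starting ' (9 characters) and read the minutes.
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan

games = ['Game 1', 'Game 2']
# Parse every game column, then add up the minutes per player across games.
df['Minutes Starting'] = np.sum(np.array([df[game].apply(played_from_start).values
                                          for game in games]),
                                axis=0)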
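The minutes played from the bench can be handled the same way. Below is a minimal sketch, not a definitive implementation: the function name played_from_bench, the FULL_GAME constant and the way the game columns are discovered (any column whose name starts with 'Game ') are my own assumptions, and the sample data is copied from the expected output table in the question. It subtracts the entry minute from 90, as described in the question.

```python
import numpy as np
import pandas as pd

FULL_GAME = 90  # Assumed game length in minutes, as stated in the question.

def played_from_bench(entry):
    """Return the minutes played by a player who came on from the bench."""
    if pd.isna(entry):
        return 0
    entry = str(entry).strip()
    if entry in ('', 'Bench') or entry.startswith('Starting'):
        return 0  # Did not play, or started (counted in the other column).
    if entry.startswith('Bench'):
        # 'Bench 74' means the player came on at minute 74.
        return FULL_GAME - int(entry[len('Bench'):])
    raise ValueError(f"Entry '{entry}' not recognized.")

# Sample data taken from the expected output table in the question.
df = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5'],
    'Game 1': ['Starting 68', 'Bench 74', 'Starting', 'Bench', np.nan],
    'Game 2': ['Starting', 'Starting 80', 'Bench', 'Bench 50', 'Starting 60'],
})

# Pick up every 'Game x' column so this also works with around 30 of them.
games = [col for col in df.columns if col.startswith('Game ')]

# Parse each cell, then sum across the game columns for each player.
df['Minutes Bench'] = df[games].apply(lambda col: col.map(played_from_bench)).sum(axis=1)
print(df)
```

Raising an error on an unrecognized entry is a design choice on my part; returning np.nan with a warning instead would let the rest of the column still be computed.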