I have ticker data about sports matches of the type:
playtime event game
0 00:10:34 'red card for player 10' game_id_0123
1 00:00:00 'start of second half' game_id_0123
2 00:10:00 'goal for team blue' game_id_0123
What is the easiest way using pandas (python) to add, say 45 minutes to the playtime
for each event
after the start of the second half for every game
?
I am considering using the index
and doing a (group by) condition for each game id. But that sounds overly complex, and I fear I am missing a feature in pandas that would allow me to do that more easily or in a standard matter.
CodePudding user response:
Example
data = {'playtime': {0: '00:10:34',
1: '00:00:00',
2: '00:10:00',
3: '00:40:00',
4: '00:00:00',
5: '00:05:00'},
'event': {0: 'red card for player 10',
1: 'start of second half',
2: 'goal for team blue',
3: 'goal for team red',
4: 'start of second half',
5: 'assist for team red'},
'game': {0: 'game_id_0123',
1: 'game_id_0123',
2: 'game_id_0123',
3: 'game_id_0124',
4: 'game_id_0124',
5: 'game_id_0124'}}
df = pd.DataFrame(data)
df
playtime event game
0 00:10:34 red card for player 10 game_id_0123
1 00:00:00 start of second half game_id_0123
2 00:10:00 goal for team blue game_id_0123
3 00:40:00 goal for team red game_id_0124
4 00:00:00 start of second half game_id_0124
5 00:05:00 assist for team red game_id_0124
Code
make series with 0(first half) and 1(second half)
event = 'start of second half'
df['event'].eq(event).groupby(df['game']).cumsum()
output:
0 0
1 1
2 1
3 0
4 1
5 1
Name: event, dtype: int64
make condition(cond1
) from output, add 45min if 1
cond1 = df['event'].eq(event).groupby(df['game']).cumsum().eq(1)
df['playtime'] = pd.to_datetime(df['playtime'])
df['playtime'] = df['playtime'].mask(cond1, df['playtime'] pd.Timedelta('45m')).dt.time
df
playtime event game
0 00:10:34 red card for player 10 game_id_0123
1 00:45:00 start of second half game_id_0123
2 00:55:00 goal for team blue game_id_0123
3 00:40:00 goal for team red game_id_0124
4 00:45:00 start of second half game_id_0124
5 00:50:00 assist for team red game_id_0124
CodePudding user response:
Not sure if this can be any easier:
Data
import pandas as pd
import numpy as np
df = pd.DataFrame(
{"playtime": [
"00:10:34",
"00:00:00",
"00:10:00",
"00:10:34",
"00:00:00",
"00:10:00",
"00:12:00"],
"event": [
'red card for player 10',
'start of second half',
'goal for team blue',
'red card for player 12',
'start of second half',
'goal for team red',
'yellow card for player 1'],
"game": [
"game_id_0123",
"game_id_0123",
"game_id_0123",
"game_id_0124",
"game_id_0124",
"game_id_0124",
"game_id_0124"]})
Mask when you need to add 45 minutes
df["mask"] = np.where(
df["event"].eq('start of second half'),
1,
np.nan)
first_event_per_game = df.groupby("game")\
.apply(lambda x: x.index[0]).values
df["mask"] = np.where(
df.index.isin(first_event_per_game),
0,
df["mask"])
df["mask"] = df["mask"].ffill()
which now looks like
playtime event game mask
0 00:10:34 red card for player 10 game_id_0123 0.0
1 00:00:00 start of second half game_id_0123 1.0
2 00:10:00 goal for team blue game_id_0123 1.0
3 00:10:34 red card for player 12 game_id_0124 0.0
4 00:00:00 start of second half game_id_0124 1.0
5 00:10:00 goal for team red game_id_0124 1.0
6 00:12:00 yellow card for player 1 game_id_0124 1.0
add 45 minutes in secod half events
df["playtime"] = np.where(
df["mask"].eq(1),
(df["playtime"].astype("M8")
pd.Timedelta(minutes=45)).astype(str).str[11:],
df["playtime"])
Which returns
playtime event game mask
0 00:10:34 red card for player 10 game_id_0123 0.0
1 00:45:00 start of second half game_id_0123 1.0
2 00:55:00 goal for team blue game_id_0123 1.0
3 00:10:34 red card for player 12 game_id_0124 0.0
4 00:45:00 start of second half game_id_0124 1.0
5 00:55:00 goal for team red game_id_0124 1.0
6 00:57:00 yellow card for player 1 game_id_0124 1.0
CodePudding user response:
# Carrega os dados do jogo em um DataFrame do pandas
import pandas as pd
df = pd.read_csv('games.csv')
# Cria uma coluna com a duração do evento em segundos
df['duration_secs'] = df['duration'].str.split(':').apply(lambda x:
int(x[0]) * 60 int(x[1]))
# Cria uma coluna com a duração do evento após o início da segunda metade
df['duration_secs_2h'] = df['duration_secs'].where(df['half'] == 2,
df['duration_secs'] 45 * 60)
# Calcula a duração total do evento após o início da segunda metade para cada jogo
df_grouped = df.groupby('game_id')['duration_secs_2h'].sum()