Home > Enterprise >  Adding to a duration before a certain condition in a pandas dataframe
Adding to a duration before a certain condition in a pandas dataframe

Time:12-08

I have ticker data about sports matches of the type:

    playtime    event  game
0    00:10:34    'red card for player 10'  game_id_0123
1    00:00:00    'start of second half' game_id_0123
2    00:10:00    'goal for team blue'    game_id_0123

What is the easiest way using pandas (python) to add, say 45 minutes to the playtime for each event after the start of the second half for every game? I am considering using the index and doing a (group by) condition for each game id. But that sounds overly complex, and I fear I am missing a feature in pandas that would allow me to do that more easily or in a standard matter.

CodePudding user response:

Example

data = {'playtime': {0: '00:10:34',
                     1: '00:00:00',
                     2: '00:10:00',
                     3: '00:40:00',
                     4: '00:00:00',
                     5: '00:05:00'},
        'event': {0: 'red card for player 10',
                  1: 'start of second half',
                  2: 'goal for team blue',
                  3: 'goal for team red',
                  4: 'start of second half',
                  5: 'assist for team red'},
        'game': {0: 'game_id_0123',
                 1: 'game_id_0123',
                 2: 'game_id_0123',
                 3: 'game_id_0124',
                 4: 'game_id_0124',
                 5: 'game_id_0124'}}

df = pd.DataFrame(data)

df

    playtime    event                   game
0   00:10:34    red card for player 10  game_id_0123
1   00:00:00    start of second half    game_id_0123
2   00:10:00    goal for team blue      game_id_0123
3   00:40:00    goal for team red       game_id_0124
4   00:00:00    start of second half    game_id_0124
5   00:05:00    assist for team red     game_id_0124

Code

make series with 0(first half) and 1(second half)

event = 'start of second half'
df['event'].eq(event).groupby(df['game']).cumsum()

output:

0    0
1    1
2    1
3    0
4    1
5    1
Name: event, dtype: int64

make condition(cond1) from output, add 45min if 1

cond1 = df['event'].eq(event).groupby(df['game']).cumsum().eq(1)
df['playtime'] = pd.to_datetime(df['playtime'])
df['playtime'] = df['playtime'].mask(cond1, df['playtime']   pd.Timedelta('45m')).dt.time

df

    playtime    event                   game
0   00:10:34    red card for player 10  game_id_0123
1   00:45:00    start of second half    game_id_0123
2   00:55:00    goal for team blue      game_id_0123
3   00:40:00    goal for team red       game_id_0124
4   00:45:00    start of second half    game_id_0124
5   00:50:00    assist for team red     game_id_0124

CodePudding user response:

Not sure if this can be any easier:

Data

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"playtime": [
        "00:10:34",
        "00:00:00",
        "00:10:00",
        "00:10:34",
        "00:00:00",
        "00:10:00",
        "00:12:00"],
     "event": [
         'red card for player 10',
         'start of second half',
         'goal for team blue',
         'red card for player 12',
         'start of second half',
         'goal for team red',
         'yellow card for player 1'],
     "game": [
         "game_id_0123",
         "game_id_0123",
         "game_id_0123",
         "game_id_0124",
         "game_id_0124",
         "game_id_0124",
         "game_id_0124"]})

Mask when you need to add 45 minutes

df["mask"] = np.where(
    df["event"].eq('start of second half'),
    1,
    np.nan)

first_event_per_game = df.groupby("game")\
    .apply(lambda x: x.index[0]).values

df["mask"] = np.where(
    df.index.isin(first_event_per_game),
    0,
    df["mask"])

df["mask"] = df["mask"].ffill()

which now looks like

   playtime                     event          game  mask
0  00:10:34    red card for player 10  game_id_0123   0.0
1  00:00:00      start of second half  game_id_0123   1.0
2  00:10:00        goal for team blue  game_id_0123   1.0
3  00:10:34    red card for player 12  game_id_0124   0.0
4  00:00:00      start of second half  game_id_0124   1.0
5  00:10:00         goal for team red  game_id_0124   1.0
6  00:12:00  yellow card for player 1  game_id_0124   1.0

add 45 minutes in secod half events

df["playtime"] = np.where(
    df["mask"].eq(1),
    (df["playtime"].astype("M8")   
     pd.Timedelta(minutes=45)).astype(str).str[11:],
    df["playtime"])

Which returns

   playtime                     event          game  mask
0  00:10:34    red card for player 10  game_id_0123   0.0
1  00:45:00      start of second half  game_id_0123   1.0
2  00:55:00        goal for team blue  game_id_0123   1.0
3  00:10:34    red card for player 12  game_id_0124   0.0
4  00:45:00      start of second half  game_id_0124   1.0
5  00:55:00         goal for team red  game_id_0124   1.0
6  00:57:00  yellow card for player 1  game_id_0124   1.0

CodePudding user response:

# Carrega os dados do jogo em um DataFrame do pandas
import pandas as pd
df = pd.read_csv('games.csv')

# Cria uma coluna com a duração do evento em segundos
df['duration_secs'] = df['duration'].str.split(':').apply(lambda x: 
int(x[0]) * 60   int(x[1]))

# Cria uma coluna com a duração do evento após o início da segunda metade
df['duration_secs_2h'] = df['duration_secs'].where(df['half'] == 2, 
df['duration_secs']   45 * 60)

# Calcula a duração total do evento após o início da segunda metade para cada jogo
df_grouped = df.groupby('game_id')['duration_secs_2h'].sum()
  • Related