Home > Enterprise >  Analysis on dataframe with python
Analysis on dataframe with python

Time:12-30

I want to be able to calculate the average 'goal','shot',and 'miss' per shooterName to use for further analysis and visualization

The code below gives me the count of the 3 attributes(shot,goal,miss) in the 'event' column sorted by 'shooterName'

Dataframe columns:

season  period  time    teamCode    event   goal    xCord   yCord   xCordAdjusted   yCordAdjusted   ... playerPositionThatDidEvent  timeSinceFaceoff    playerNumThatDidEvent   shooterPlayerId shooterName shooterLeftRight    shooterTimeOnIce    shooterTimeOnIceSinceFaceoff    shotDistance

Corresponding data

2020    1   16  PHI SHOT    0   -74 29  74  -29 ... C   16  11   8478439.0  Travis Konecny  R   16  16  32.649655
2020    1   34  PIT SHOT    0   49  -25 49  -25 ... C   34  9   8478542.0   Evan Rodrigues  R   34  34  47.169906
2020    1   65  PHI SHOT    0   -52 -31 52  31  ... L   65  86  8480797.0   Joel Farabee    L   31  31  48.270074
2020    1   171 PIT SHOT    0   43  39  43  39  ... C   42  9   8478542.0   Evan Rodrigues  R   42  42  60.307545   
2020    1   209 PHI MISS    0   -46 33  46  -33 ... D   38  5   8479026.0   Philippe Myers  R   38  38  54.203321

Current code:

dft['count'] = df.groupby(['shooterName', 'event'])['event'].agg(['count'])
dft

Current Output:

shooterName event count
A.J. Greer  GOAL    1
            MISS    6
            SHOT    29
Aaron Downey    GOAL    1
                MISS    4
                SHOT    35

Zenon Konopka   GOAL    8
                MISS    57
                SHOT    176

              

Desired Output:

shooterName event count %totalshooterNameevents
A.J. Greer  GOAL    1   .0277
            MISS    6   .1666
            SHOT    29  .805

Aaron Downey    GOAL    1 .025
                MISS    4 .1
                SHOT    35 .875

Zenon Konopka   GOAL    8 .0331
                MISS    57 .236
                SHOT    176 .7302

Something similar to this. My end goal is to be able to calculate each 'event' attribute as a percentage of the total 'event' by 'shooterName'. Below I added a column '%totalshooterNameevents' which is 'simply goal', 'shot', and 'miss' calculated by the sum of the 'goal, shot, and miss' per each 'shooterName'

CodePudding user response:

Update

Try:

dft = df.groupby(['shooterName', 'event'])['event'].agg(['count']).reset_index()
dft['%total'] = dft.groupby('shooterName')['count'].apply(lambda x: x / sum(x))
print(dft)

# Output
     shooterName event  count    %total
0     A.J. Greer  GOAL      1  0.027778
1     A.J. Greer  MISS      6  0.166667
2     A.J. Greer  SHOT     29  0.805556
3   Aaron Downey  GOAL      1  0.025000
4   Aaron Downey  MISS      4  0.100000
5   Aaron Downey  SHOT     35  0.875000
6  Zenon Konopka  GOAL      8  0.033195
7  Zenon Konopka  MISS     57  0.236515
8  Zenon Konopka  SHOT    176  0.730290

Without sample, it's difficult to guess what you want. Try:

import pandas as pd
import numpy as np

# Setup a Minimal Reproducible Example
np.random.seed(2021)
df = pd.DataFrame({'shooterName': np.random.choice(list('AB'), 20),
                   'event': np.random.choice(['shot', 'goal', 'miss'], 20)})

# Create an empty dataframe?
dft = pd.DataFrame(index=df['shooterName'].unique())

# Do stuff
grp = df.groupby('shooterName')
dft['count'] = grp.count()
dft = dft.join(grp['event'].value_counts().unstack('event')
                           .div(dft['count'], axis=0))

Output:

>>> dft
   count      goal   miss      shot
A     12  0.416667  0.250  0.333333
B      8  0.500000  0.375  0.125000
  • Related