Home > Blockchain >  Pandas aggregating groupby object using multiple conditions in other columns
Pandas aggregating groupby object using multiple conditions in other columns

Time:10-01

In my project, I have a dataframe that goes a little something like this:

df = pd.DataFrame([
        {'posteam': 'NYJ', 'defteam': 'BAL', 'penalty': 1, 'penalty_team': 'NYJ', 'penalty_yards': 10}, 
        {'posteam': 'NYJ', 'defteam': 'BAL', 'penalty': 1, 'penalty_team': 'BAL', 'penalty_yards': 5},
        {'posteam': 'BAL', 'defteam': 'NYJ', 'penalty': 0, 'penalty_team': None, 'penalty_yards': 0},
        {'posteam': 'BAL', 'defteam': 'NYJ', 'penalty': 1, 'penalty_team': 'NYJ', 'penalty_yards': 15}])

I would like to aggregate by posteam value, but also including penalty and penalty_yards values where the penalty_team value equals the posteam value. The output would be something like

posteam penalty penalty_yards
NYJ 2 25
BAL 1 5

While the penalty would not have been a penalty by the possessing team, I would like to track total penalties and penalty yards by the team being aggregated.

Currently, my code is

df.groupby(['posteam']).agg({
    'penalty': 'sum',
    'penalty_yards': 'sum'
})

Which returns the following (where penalties are tracked solely by the possessing team)

posteam penalty penalty_yards
NYJ 2 15
BAL 1 15

(Keep in mind the difference is NYJ is getting one of the BAL penalties and BAL is getting one of NYJ's)

Is there an easy way to rope in a value given this type of situation? I tried making a helper column to track posteam's penalties, but that excluded values where the team got a penalty while playing as defteam.

CodePudding user response:

that should fix it

df1 = df.groupby(['posteam']).agg({
'penalty': 'sum'
})
df2 = df.groupby(['penalty_team']).agg({
'penalty_yards': 'sum'
})
pd.merge(df1, df2, left_index=True, right_index=True)
  • Related