Home > Enterprise >  Groupby followed by a condition sum resulting in a new column in Pandas
Groupby followed by a condition sum resulting in a new column in Pandas

Time:09-02

I have a dataframe like so:

df = pd.DataFrame(
    {
        "day": ["Mon", "Mon", "Mon", "Mon", "Tues", "Tues", "Tues", "Tues"],
        "name": ["James", "Rover", "Cleo", "X", "Bran", "Excaliber", "Henry", "Mia"],
        "species": ['dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat'],
        "fleas": [0, 1, 2, 4, 6, 7, 8, 3],
    }
)
>>> df
    day       name species  fleas
0   Mon      James     dog      0
1   Mon      Rover     dog      1
2   Mon       Cleo     cat      2
3   Mon          X     cat      4
4  Tues       Bran     dog      6
5  Tues  Excaliber     dog      7
6  Tues      Henry     cat      8
7  Tues        Mia     cat      3

Each row corresponds to a measurement on a particular day of how many fleas were on a particular animal. What I want to do is groupby the day and then sum all the fleas that were on the dogs, and create a new column dog_fleas with the result. The result of this operation should look like

    day       name species  fleas  dog_fleas
0   Mon      James     dog      0          1
1   Mon      Rover     dog      1          1
2   Mon       Cleo     cat      2          1
3   Mon          X     cat      4          1
4  Tues       Bran     dog      6         13
5  Tues  Excaliber     dog      7         13
6  Tues      Henry     cat      8         13
7  Tues        Mia     cat      3         13

How do I do this in Pandas?

CodePudding user response:

Let us try transform with where

df['new'] = df['fleas'].where(df['species']=='dog').groupby(df['day']).transform('sum')
df
Out[88]: 
    day       name species  fleas   new
0   Mon      James     dog      0   1.0
1   Mon      Rover     dog      1   1.0
2   Mon       Cleo     cat      2   1.0
3   Mon          X     cat      4   1.0
4  Tues       Bran     dog      6  13.0
5  Tues  Excaliber     dog      7  13.0
6  Tues      Henry     cat      8  13.0
7  Tues        Mia     cat      3  13.0

CodePudding user response:

Groupby day column, and apply a function to count dogs, convert to a dataframe, and merge back on day column.

>>> df.merge(
    df
    .groupby('day')
    .apply(lambda x: x.loc[x['species'].eq('dog'), 'fleas'].sum())
    .to_frame('dog_fleas'),
    on='day')

    day       name species  fleas  dog_fleas
0   Mon      James     dog      0          1
1   Mon      Rover     dog      1          1
2   Mon       Cleo     cat      2          1
3   Mon          X     cat      4          1
4  Tues       Bran     dog      6         13
5  Tues  Excaliber     dog      7         13
6  Tues      Henry     cat      8         13
7  Tues        Mia     cat      3         13

CodePudding user response:

Grouping by day, and calculating the sum of fleas on the dogs for that day:

def f(df):
    return pd.Series(df.loc[df['species'].eq('dog'), 'fleas'].sum(), 
                     index=df.index)

df['dog_fleas'] = df.groupby('day', group_keys=False).apply(f)
    day       name species  fleas  dog_fleas
0   Mon      James     dog      0          1
1   Mon      Rover     dog      1          1
2   Mon       Cleo     cat      2          1
3   Mon          X     cat      4          1
4  Tues       Bran     dog      6         13
5  Tues  Excaliber     dog      7         13
6  Tues      Henry     cat      8         13
7  Tues        Mia     cat      3         13
  • Related