Home > Mobile >  pandas groupby apply result same size as the original dataframe
pandas groupby apply result same size as the original dataframe

Time:08-30

Using data.table in R I can get group conditional sum of some column like this:

lirbary(data.table)

dt <- data.table(id=c("A", "A", "B"),
                 amount=c(100, 10, 1),
                 category=c("cat1", "cat2", "cat1"))

dt[, conditional_amount := sum(.SD[category == "cat1"]$amount), by=.(id)]
dt

result:

   id amount category conditional_amount
1:  A    100     cat1                100
2:  A     10     cat2                100
3:  B      1     cat1                  1

What is the equivalent of this in pandas? When I do this:

import pandas as pd

df = pd.DataFrame({"id": ["A", "A", "B"],
                   "amount": [100, 10, 1],
                   "category": ["cat1", "cat2", "cat1"]})

g = df.groupby(["id"]).apply(lambda x: x.query("category == 'cat1'")["amount"].sum())

result:

id
A    100
B      1
dtype: int64

I then need to add this back to the original DataFrame:

g = g.reset_index()
g.columns = ["id", "conditional_amount"]
df = df.merge(g, on="id")
df

result:

  id  amount category  conditional_amount
0  A     100     cat1                 100
1  A      10     cat2                 100
2  B       1     cat1                   1

Is there a more succinct way of doing that? It feels like there must be, but because it's a conditional sum I cannot use .transform here. Thanks!

CodePudding user response:

You can mask the Series before using groupby.transform:

df['conditional_amount'] = (df['amount']
                            .where(df['category'].eq('cat1'))
                            .groupby(df['id'])
                            .transform('sum')
                           )

output:

  id  amount category  conditional_amount
0  A     100     cat1               100.0
1  A      10     cat2               100.0
2  B       1     cat1                 1.0
  • Related