Using data.table
in R
I can get group conditional sum of some column like this:
lirbary(data.table)
dt <- data.table(id=c("A", "A", "B"),
amount=c(100, 10, 1),
category=c("cat1", "cat2", "cat1"))
dt[, conditional_amount := sum(.SD[category == "cat1"]$amount), by=.(id)]
dt
result:
id amount category conditional_amount
1: A 100 cat1 100
2: A 10 cat2 100
3: B 1 cat1 1
What is the equivalent of this in pandas? When I do this:
import pandas as pd
df = pd.DataFrame({"id": ["A", "A", "B"],
"amount": [100, 10, 1],
"category": ["cat1", "cat2", "cat1"]})
g = df.groupby(["id"]).apply(lambda x: x.query("category == 'cat1'")["amount"].sum())
result:
id
A 100
B 1
dtype: int64
I then need to add this back to the original DataFrame:
g = g.reset_index()
g.columns = ["id", "conditional_amount"]
df = df.merge(g, on="id")
df
result:
id amount category conditional_amount
0 A 100 cat1 100
1 A 10 cat2 100
2 B 1 cat1 1
Is there a more succinct way of doing that? It feels like there must be, but because it's a conditional sum I cannot use .transform
here. Thanks!
CodePudding user response:
You can mask the Series before using groupby.transform
:
df['conditional_amount'] = (df['amount']
.where(df['category'].eq('cat1'))
.groupby(df['id'])
.transform('sum')
)
output:
id amount category conditional_amount
0 A 100 cat1 100.0
1 A 10 cat2 100.0
2 B 1 cat1 1.0