I have a dataset where I would like to:
- group by location and box and take a count of the box
Data
ID location type box status
aa NY no box55 hey
aa NY no box55 hi
aa NY yes box66 hello
aa NY yes box66 goodbye
aa CA no box11 hey
aa CA no box11 hi
aa CA yes box11 hello
aa CA yes box11 goodbye
aa CA no box86 hey
aa CA no box86 hi
aa CA yes box86 hello
aa CA yes box99 goodbye
aa CA no box99 hey
aa CA no box99 hi
Desired
location box count box
NY 2 box55
NY 2 box66
CA 3 box11
CA 3 box86
CA 3 box99
Doing
df['box count'] = df.groupby(['location','box'])['box'].size()
Any suggestion is appreciated.
CodePudding user response:
Try:
df = df.groupby(["location", "box"], as_index=False).agg(
**{"box count": ("box", "size")}
)
print(df)
Prints:
location box box count
0 CA box11 4
1 CA box86 3
2 CA box99 3
3 NY box55 2
4 NY box66 2
EDIT:
m = df.groupby(["location"])["box"].nunique()
df = df.groupby(["location", "box"], as_index=False).agg(
**{
"box count": (
"location",
lambda x: m[x.iat[0]],
)
}
)
print(df)
Prints:
location box box count
0 CA box11 3
1 CA box86 3
2 CA box99 3
3 NY box55 2
4 NY box66 2