Suppose I have a dataframe(grouped by ID-A and ID-B) like this-
ID-A ID-B time sum flag
A 1 09:30:00 5 Y
09:33:00 8 N
09:36:00 5 Y
2 09:36:00 10 Y
09:39:00 15 Y
09:42:00 2 Y
B 1 09:30:00 10 Y
09:33:00 12 N
09:36:00 5 Y
I want to calculate the total 'Y' flags against each ID-A and ID-B.
Desired Dataframe-
ID-A ID-B time sum flag(Total)
A 1 09:30:00 5 2
09:33:00 8
09:36:00 5
2 09:36:00 10 3
09:39:00 15
09:42:00 2
B 1 09:30:00 10 2
09:33:00 12
09:36:00 5
CodePudding user response:
You can groupby
the two index levels, count the Y
s per group and assign to the original df
.
Prepare data:
import pandas as pd
data = {'time': ['09:30:00',
'09:33:00',
'09:36:00',
'09:36:00',
'09:39:00',
'09:42:00',
'09:30:00',
'09:33:00',
'09:36:00'],
'sum': [5, 8, 5, 10, 15, 2, 10, 12, 5],
'flag': ['Y', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'N', 'Y']}
my_index = pd.MultiIndex.from_arrays([["A"]*6 ["B"]*3, [1, 1, 1, 2, 2, 2, 1, 1, 1]], names=["ID-A", "ID-B"])
df = pd.DataFrame(data, index=my_index)
Result:
time sum flag
ID-A ID-B
A 1 09:30:00 5 Y
1 09:33:00 8 N
1 09:36:00 5 Y
2 09:36:00 10 Y
2 09:39:00 15 Y
2 09:42:00 2 Y
B 1 09:30:00 10 Y
1 09:33:00 12 N
1 09:36:00 5 Y
The rest:
>>> df.assign(flag_total=df.flag.eq("Y").groupby(level=[0, 1]).sum())
time sum flag flag_total
ID-A ID-B
A 1 09:30:00 5 Y 2
1 09:33:00 8 N 2
1 09:36:00 5 Y 2
2 09:36:00 10 Y 3
2 09:39:00 15 Y 3
2 09:42:00 2 Y 3
B 1 09:30:00 10 Y 2
1 09:33:00 12 N 2
1 09:36:00 5 Y 2