My DataFrame has the following form.
id group color
i1 aa white
i1 aa white
i1 ab white
i1 ab black
...
I apply a groupby as following:
groupdf = df.groupby(['id', 'group'])['color'].value_counts()
The result of the groupby has an multiindex.
value
id group color
i1 aa white 2
i1 ab white 1
i1 ab black 3
i1 ac black 5
i1 ad white 4
i1 ad black 5
i2 aa white 1
i2 aa black 1
i2 bb black 1
i2 cc white 2
i2 cc black 6
i2 ad black 5
My goal is to
- select entries where both categories of the last index color are present and then
- select the group with max value of the black color So the result would look like:
value
id
i1 5 #only groups ab and ad have both colors; ad.black = 5 > ab.black = 3
i2 6 #only groups aa and cc have both colors; cc.black = 6 > aa.black = 1
I have tried .xs() as well as .index.get_level_values() but I am not able to achieve my goal.
EDIT 1: I see that I have provided poor information how I got the DataFrame and updated it above. I can't directly plugin .max() as the original df does not have value column.
CodePudding user response:
Let's try:
# mask the groups with more than one colors
s = df.groupby(['id','group'])['value'].transform('size') > 1
# boolean index the groups and query, then another groupby with max
df[s].query('color=="black"').groupby(['id','color'])['value'].max()
Output:
id color
i1 black 5
i2 black 6
Name: value, dtype: int64