I want to groupby and see if all members in the group meet a certain condition. Here's a dummy example:
import pandas as pd
x = ['Mike','Mike','Mike','Bob','Bob','Phil']
y = ['Attended','Attended','Attended','Attended','Not attend','Not attend']
df = pd.DataFrame({'name':x,'attendance':y})
What I want is a 3x2 dataframe that shows, for each name, whether that person was always in attendance. It should look like this:
new_df = pd.DataFrame({'name':['Mike','Bob','Phil'],'all_attended':[True,False,False]})
What's the best way to do this?
Thanks so much.
CodePudding user response:
Let's try
out = (df['attendance'].eq('Attended')
.groupby(df['name']).all()
.to_frame('all_attended').reset_index())
print(out)
name all_attended
0 Bob False
1 Mike True
2 Phil False
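Note that groupby sorts the group keys by default, which is why Bob comes first above. If the row order from the question (Mike, Bob, Phil) matters, here is a minimal sketch of the same approach, assuming the question's df and passing `sort=False` so groups stay in order of first appearance:

```python
import pandas as pd

x = ['Mike', 'Mike', 'Mike', 'Bob', 'Bob', 'Phil']
y = ['Attended', 'Attended', 'Attended', 'Attended', 'Not attend', 'Not attend']
df = pd.DataFrame({'name': x, 'attendance': y})

# Build a boolean mask per row, then reduce with "all" within each name group.
# sort=False keeps the groups in order of first appearance instead of sorting keys.
out = (df['attendance'].eq('Attended')
       .groupby(df['name'], sort=False).all()
       .to_frame('all_attended').reset_index())
print(out)
```

The result has the columns name and all_attended, matching the layout of new_df in the question.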
CodePudding user response:
One way could be:
# True only if every row in the group equals 'Attended'
df.groupby('name')['attendance'].apply(lambda s: s.eq('Attended').all())
name
Bob False
Mike True
Phil False
Name: attendance, dtype: bool
CodePudding user response:
I would stay away from strings for data that does not need to be a string:
z = [s == 'Attended' for s in y]
df = pd.DataFrame({'name': x, 'attended': z})
Now you can check if all the elements for a given group are True:
>>> df.groupby('name')['attended'].all()
name
Bob False
Mike True
Phil False
Name: attended, dtype: bool
If something can only be a 0 or 1, using a string introduces the possibility of errors because someone might type Atended instead of Attended, for example.
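If the data already arrives as strings, one way to catch such typos during conversion is to map the known labels to booleans and look for anything left unmapped. This is only a sketch: the `raw` frame and its misspelled entry are made-up illustration data, and the `mapping` dict assumes 'Attended' and 'Not attend' are the only valid labels.

```python
import pandas as pd

# Hypothetical data containing one misspelled label ('Atended').
raw = pd.DataFrame({
    'name': ['Mike', 'Mike', 'Bob', 'Phil'],
    'attendance': ['Attended', 'Atended', 'Attended', 'Not attend'],
})

# Map the known labels to booleans; any label not in the dict becomes NaN.
mapping = {'Attended': True, 'Not attend': False}
mapped = raw['attendance'].map(mapping)

# Rows whose label was not recognized and needs fixing before grouping.
bad_rows = raw[mapped.isna()]
print(bad_rows)
```

Comparing with `.eq('Attended')` directly would silently treat the misspelled row as not attended, whereas the unmapped NaN makes the problem visible.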