I have a dataset that looks similar to this:
Name | Status | Activity |
---|---|---|
Jane | student | yes |
John | businessman | yes |
Elle | student | no |
Chris | policeman | yes |
John | businessman | no |
Clay | businessman | yes |
I want to group the dataset by Status
and Name
which have Activity
as a 'yes' and count the Name
. If it at least has one 'yes', it will be counted.
Basically, this is the output that I want:
student 1 Jane
businessman 2 John, Clay
policeman 1 Chris
I've tried these codes:
cb = (DataFrame.groupby(['Name', 'Status']).sum(DataFrame['Activity'].eq('yes')))
cb = (DataFrame.groupby(['Name', 'Status']).any(DataFrame['Activity'].eq('yes')))
cb = (DataFrame.groupby(['Name', 'Status']).nunique(DataFrame['Activity'].eq('yes')))
but, all of them give this error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please help me to fix this code. Thank you in advance!
CodePudding user response:
Example
data = {'Name': {0: 'Jane', 1: 'John', 2: 'Elle', 3: 'Chris', 4: 'John', 5: 'Clay'},
'Status': {0: 'student', 1: 'businessman', 2: 'student', 3: 'policeman', 4: 'businessman', 5: 'businessman'},
'Activity': {0: 'yes', 1: 'yes', 2: 'no', 3: 'yes', 4: 'no', 5: 'yes'}}
df = pd.DataFrame(data)
Code
out = (df[df['Activity'].eq('yes')]
.groupby('Status', sort=False)['Name'].agg(['count', ', '.join]))
out
count join
Status
student 1 Jane
businessman 2 John, Clay
policeman 1 Chris
CodePudding user response:
Check below:
dd = df.query("Activity != 'no'").\
groupby('Status').\
agg({'Name':[','.join,'count']}).reset_index()
dd.columns = ['Status','Names','count']
dd.head()
Output: