Home > database >  Pandas Dataframe groupby()
Pandas Dataframe groupby()

Time:12-27

I have a dataset that looks similar to this:

Name Status Activity
Jane student yes
John businessman yes
Elle student no
Chris policeman yes
John businessman no
Clay businessman yes

I want to group the dataset by Status and Name which have Activity as a 'yes' and count the Name. If it at least has one 'yes', it will be counted.

Basically, this is the output that I want:

student 1 Jane

businessman 2 John, Clay

policeman 1 Chris

I've tried these codes:

cb = (DataFrame.groupby(['Name', 'Status']).sum(DataFrame['Activity'].eq('yes')))

cb = (DataFrame.groupby(['Name', 'Status']).any(DataFrame['Activity'].eq('yes')))

cb = (DataFrame.groupby(['Name', 'Status']).nunique(DataFrame['Activity'].eq('yes')))

but, all of them give this error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help me to fix this code. Thank you in advance!

CodePudding user response:

Example

data = {'Name': {0: 'Jane', 1: 'John', 2: 'Elle', 3: 'Chris', 4: 'John', 5: 'Clay'},
        'Status': {0: 'student', 1: 'businessman', 2: 'student', 3: 'policeman', 4: 'businessman', 5: 'businessman'},
        'Activity': {0: 'yes', 1: 'yes', 2: 'no', 3: 'yes', 4: 'no', 5: 'yes'}}
df = pd.DataFrame(data)

Code

out = (df[df['Activity'].eq('yes')]
       .groupby('Status', sort=False)['Name'].agg(['count', ', '.join]))

out

            count   join
Status      
student     1       Jane
businessman 2       John, Clay
policeman   1       Chris

CodePudding user response:

Check below:

dd = df.query("Activity != 'no'").\
groupby('Status').\
agg({'Name':[','.join,'count']}).reset_index()

dd.columns = ['Status','Names','count']

dd.head()

Output:

enter image description here

  • Related