Counting number of values in a column using groupby on a specific conditon in pandas-CodePudding

I have a dataframe which looks something like this:

dfA
name    field    country    action
Sam     elec     USA        POS
Sam     elec     USA        POS
Sam     elec     USA        NEG
Tommy   mech     Canada     NEG
Tommy   mech     Canada     NEG
Brian   IT       Spain      NEG
Brian   IT       Spain      NEG
Brian   IT       Spain      POS

I want to group the dataframe based on the first 3 columns adding a new column "No of data". This is something which I do using this:

dfB = dfA.groupby(["name", "field", "country"], dropna=False).size().reset_index(name = "No_of_data")

This gives me a new dataframe which looks something like this:

dfB
name    field    country   No_of_data
Sam     elec     USA           3
Tommy   mech     Canada        2
Brian   IT       Spain         3

But now I also want to add a new column to this particular dataframe which tells me what is the count of number of "POS" for every combination of "name", "field" and "country". Which should look something like this:

dfB
name    field    country   No_of_data   No_of_POS
Sam     elec     USA           3            2
Tommy   mech     Canada        2            0
Brian   IT       Spain         3            1

How do I add the new column (No_of_POS) to the table dfB when I dont have the information about "POS NEG" in it and needs to be taken from dfA.

CodePudding user response：

You can use a dictionary with functions in the aggregate method:

dfA.groupby(["name", "field", "country"], as_index=False)['action']\
.agg({'No_of_data': 'size', 'No_of_POS': lambda x: x.eq('POS').sum()})

CodePudding user response：

You can precompute the boolean before aggregating; performance should be better as the data size increases :

(df.assign(action = df.action.eq('POS'))
   .groupby(['name', 'field', 'country'], 
            sort = False, 
            as_index = False)
   .agg(no_of_data = ('action', 'size'), 
        no_of_pos = ('action', 'sum'))

    name field country  no_of_data  no_of_pos
0    Sam  elec     USA           3          2
1  Tommy  mech  Canada           2          0
2  Brian    IT   Spain           3          1

CodePudding user response：

You can add an aggregation function when you're grouping your data. Check agg() function, maybe this will help.