Unable to get the count of the data based on other columns in pandas-CodePudding

Below is the dataframe

AGMTNO  Employee_id POS Cust_Pincode    Pincode VaR RESOLVED_COUNT  P_Dealer_ArielDistance
0   MP3012TW0087178 5035496 606.0   487551  487551  29.070673   1   0.000000
1   MP3012TW0087282 5035496 525.0   470661  470661  30.075669   0   23.100406
2   TN3005TW0123515 5035496 874.0   639207  624620  34.000748   1   21.645896

I'm trying to write a function or code where it'll groupby on Employee_id, get the count of AGMTNO and count of AGMTNO where Cust_Pincode and Pincode is not matching

The expected output is below

Below is the exptected output

    FCE_Employee_id No_Of_Customers No_Of_Customers_not_matching
0   5035496         3                1

The code which i've tried is below

df.groupby('FCE_Employee_id').agg({'AGMTNO': 'count',
                                   'Cust_Pincode': lambda x: (x != x.shift()).sum()})

But the above code is not working.

CodePudding user response：

Simply assign first a column to check for matches, then groupby.agg:

(df
 .assign(match=lambda d: d['Cust_Pincode'].ne(d['Pincode']))
 .groupby('Employee_id', as_index=False)
 .agg(**{'No_Of_Customers': ('AGMTNO', 'count'),
         'No_Of_Customers_not_matching': ('match', 'sum')
        })
)

output:

   Employee_id  No_Of_Customers  No_Of_Customers_not_matching
0      5035496                3                             1

CodePudding user response：

I got the answer

df['Cust_Pincode'] = df['Cust_Pincode'].astype(str)
df['Pincode'] = df['Pincode'].astype(str)

df['Cust_Pincode'] = df['Cust_Pincode'].apply(lambda x: x.zfill(6))
df['Pincode'] = df['Pincode'].apply(lambda x: x.zfill(6))

df['Cust_Pincode_Pincode_match'] = np.where(df['Cust_Pincode'] == df['Pincode'], 1, 0)

df_grp = df.groupby('FCE_Employee_id').agg({'AGMTNO': 'count', 'Cust_Pincode_Pincode_match': 'sum'})
df_grp['No_Of_Customers_not_matching'] = df_grp['AGMTNO'] - df_grp['Cust_Pincode_Pincode_match']
df_grp = df_grp.drop(columns=['Cust_PincodePincode_match'])
df_grp = df_grp.reset_index()