Python Pandas countifs with unique values

Time:11-07

I have a dataset in which one column, say column A, is an identifier. I’d like to count how many unique values in column A have entries where column B is between x and y and column C is equal to z.

For example:

Row  Column A  Column B  Column C
  1      1001         4         1
  2      1001         3         0
  3      1001         6         1
  4      1001         4         1
  5      1002         7         0
  6      1002         7         1
  7      1002         2         1
  8      1002         3         1
  9      1003         0         1
 10      1003         3         0
 11      1003         3         1
 12      1003         4         1

What I want to achieve is the following: count how many unique values of column A have exactly two entries where column B is between 2 and 4 and column C is equal to 1.

Looking at the table, this would return 1, since only Column A = 1002 meets all criteria (rows 7 and 8).

I've tried some code, but I don't know how to handle the unique-value criterion for column A.

CodePudding user response:

This should work. First I subset on your conditions, then I count the number of occurrences of each Column A value, check whether it is 2, and sum those.

sum(df[(df['Column B'] > 1) & (df['Column B'] < 4) & (df['Column C'] == 1)]['Column A'].value_counts() == 2)
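
For anyone who wants to try this against the sample table, here is a minimal sketch. The DataFrame construction is my own assumption (column names 'Column A', 'Column B', 'Column C' with no surrounding whitespace), and the names mask, counts and result are just illustrative. It keeps the same bounds as the one-liner, B > 1 and B < 4, so a value of 4 in Column B is excluded.

```python
import pandas as pd

# Sample data copied from the question; the column names are assumed
# to carry no leading or trailing spaces.
df = pd.DataFrame({
    "Row": range(1, 13),
    "Column A": [1001] * 4 + [1002] * 4 + [1003] * 4,
    "Column B": [4, 3, 6, 4, 7, 7, 2, 3, 0, 3, 3, 4],
    "Column C": [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Keep rows with 1 < B < 4 and C == 1.
mask = (df["Column B"] > 1) & (df["Column B"] < 4) & (df["Column C"] == 1)

# Number of matching rows per identifier.
counts = df.loc[mask, "Column A"].value_counts()

# Number of identifiers with exactly two matching rows.
result = int((counts == 2).sum())
print(result)
```

With these strict bounds only 1002 has two matching rows (rows 7 and 8), which agrees with the result of 1 expected in the question.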

CodePudding user response:

First, create a condition to filter your dataframe:

con = df['Column B'].between(2,4) & df['Column C'].eq(1)

Then use a groupby operation:

df.loc[con].groupby('Column A')['Column A'].nunique()

Column A
1001    1
1002    1
1003    1
Name: Column A, dtype: int64

df.loc[con]

    Row  Column A  Column B  Column C
0     1      1001         4         1
3     4      1001         4         1
6     7      1002         2         1
7     8      1002         3         1
10   11      1003         3         1
11   12      1003         4         1
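
Note that nunique() as used above returns 1 for every group, because each group contains only its own identifier, so it does not yet test for "exactly two entries". One possible way to finish the idea, which may or may not be what the answerer intended, is to count rows per group with size() and compare against 2. The helper count_ids_with_n_matches and its parameters are hypothetical names for illustration, and the string values for inclusive assume pandas 1.3 or newer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Row": range(1, 13),
    "Column A": [1001] * 4 + [1002] * 4 + [1003] * 4,
    "Column B": [4, 3, 6, 4, 7, 7, 2, 3, 0, 3, 3, 4],
    "Column C": [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
})


def count_ids_with_n_matches(frame, low, high, c_value, n=2, inclusive="both"):
    """Count Column A values that have exactly n rows passing the filter.

    Hypothetical helper; inclusive is passed straight to Series.between
    (string values assume pandas >= 1.3).
    """
    con = (
        frame["Column B"].between(low, high, inclusive=inclusive)
        & frame["Column C"].eq(c_value)
    )
    # Rows per identifier after filtering; identifiers with no matching
    # rows simply do not appear in the result.
    per_id = frame.loc[con].groupby("Column A").size()
    return int(per_id.eq(n).sum())


# between() includes both endpoints by default: 2 <= B <= 4.
both_ends = count_ids_with_n_matches(df, 2, 4, 1)

# Excluding the upper endpoint: 2 <= B < 4.
left_only = count_ids_with_n_matches(df, 2, 4, 1, inclusive="left")

print(both_ends, left_only)
```

On the sample data the inclusive range gives 3, since 1001, 1002 and 1003 each have two matching rows, as the filtered frame above shows. Excluding 4 gives 1 (only 1002), which is the figure the question expects, so the choice of endpoint matters here.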