I have a dataset in which one column, say column A, is an identifier. I'd like to count how many unique values in column A have entries where column B is between x and y and column C is equal to z.
To illustrate:
Row | Column A | Column B | Column C |
---|---|---|---|
1 | 1001 | 4 | 1 |
2 | 1001 | 3 | 0 |
3 | 1001 | 6 | 1 |
4 | 1001 | 4 | 1 |
5 | 1002 | 7 | 0 |
6 | 1002 | 7 | 1 |
7 | 1002 | 2 | 1 |
8 | 1002 | 3 | 1 |
9 | 1003 | 0 | 1 |
10 | 1003 | 3 | 0 |
11 | 1003 | 3 | 1 |
12 | 1003 | 4 | 1 |
What I want to achieve is the following: count how many unique values of column A have exactly two entries where column B is between 2 and 4 and column C is equal to 1.
Looking at the table, this would return 1, since only Column A = 1002 meets all criteria (rows 7 and 8).
I've tried some code, but I don't know how to handle the unique-value criterion in column A.
CodePudding user response:
This should work. First I subset on your conditions, then I count the number of occurrences of each value in column A, check whether each count is 2, and sum those.
sum(df[(df['Column B'] > 1) & (df['Column B'] < 4) & (df['Column C'] == 1)]['Column A'].value_counts() == 2)
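For anyone who wants to run this against the table in the question, here is a minimal self-contained sketch. It assumes the column names are exactly `Column A`, `Column B` and `Column C` with no trailing spaces; the names `mask`, `counts` and `result` are just illustrative. Note that `> 1` and `< 4` keep only B values of 2 and 3, which is what makes the result agree with the expected answer of 1.

```python
import pandas as pd

# Rebuild the example table from the question (the Row column is omitted).
df = pd.DataFrame({
    "Column A": [1001] * 4 + [1002] * 4 + [1003] * 4,
    "Column B": [4, 3, 6, 4, 7, 7, 2, 3, 0, 3, 3, 4],
    "Column C": [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Keep rows where 1 < B < 4 (i.e. B is 2 or 3) and C equals 1.
mask = (df["Column B"] > 1) & (df["Column B"] < 4) & (df["Column C"] == 1)

# Count the surviving rows per identifier.
counts = df.loc[mask, "Column A"].value_counts()

# Number of identifiers with exactly two matching rows.
result = int((counts == 2).sum())
print(result)  # 1 (only 1002 qualifies)
```

If 4 should count as inside the range, change `< 4` to `<= 4`; on this data all three identifiers would then have exactly two matching rows.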
CodePudding user response:
First, create a condition to filter your dataframe:
con = df['Column B'].between(2,4) & df['Column C'].eq(1)
Then use a groupby operation:
df.loc[con].groupby('Column A')['Column A'].nunique()
Column A
1001 1
1002 1
1003 1
Name: Column A, dtype: int64
df.loc[con]
Row Column A Column B Column C
0 1 1001 4 1
3 4 1001 4 1
6 7 1002 2 1
7 8 1002 3 1
10 11 1003 3 1
11 12 1003 4 1
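Since `nunique()` within each group is always 1, it does not by itself say which identifiers have exactly two matching rows. One way to get that count from the same filter is to take the group sizes instead. This is a rough sketch, not the only approach: the helper `count_ids_with_n_matches` is a hypothetical name, and passing a string to the `inclusive` argument of `between` assumes pandas 1.3 or newer.

```python
import pandas as pd

# Rebuild the example table from the question (the Row column is omitted).
df = pd.DataFrame({
    "Column A": [1001] * 4 + [1002] * 4 + [1003] * 4,
    "Column B": [4, 3, 6, 4, 7, 7, 2, 3, 0, 3, 3, 4],
    "Column C": [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
})

def count_ids_with_n_matches(df, low, high, c_value, n=2, inclusive="both"):
    """Hypothetical helper: count identifiers with exactly n matching rows."""
    # Same filter as above; string values for `inclusive` assume pandas >= 1.3.
    con = (df["Column B"].between(low, high, inclusive=inclusive)
           & df["Column C"].eq(c_value))
    # Number of matching rows per identifier.
    sizes = df.loc[con].groupby("Column A").size()
    # How many identifiers have exactly n matching rows.
    return int(sizes.eq(n).sum())

# Both bounds included (2 <= B <= 4): every identifier has two matching rows.
inclusive_count = count_ids_with_n_matches(df, 2, 4, 1)
# Upper bound excluded (2 <= B < 4): only 1002 has two matching rows.
half_open_count = count_ids_with_n_matches(df, 2, 4, 1, inclusive="left")
print(inclusive_count, half_open_count)  # 3 1
```

With both bounds included the count is 3, so matching the expected answer of 1 requires treating 4 as outside the range.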