I have a data set that looks like this:
A B C
1 2
3 4
1 5
1 2 4
1 2
1
I want to return the amount of times any value appears in a combination of two columns but not the third column. The numeric values in this case are arbitrary. I just care about counting instances.
In other words:
1.) How many times does any value appear in column A and column B but not column C?
2.) How many times does any value appear in column B and column C but not column A?
3.) How many times does any value appear in column A and column C but not in column B?
My expected answers based on the mock data I have given above:
1.) 1 (row 1) 2.) 1 (row 2) 3.) 2 (rows 3 and 5)
CodePudding user response:
You could use isna
to create a boolean DataFrame. Then filter the rows that have only one NaN value (so that we exclude the last row). Finally sum
vertically:
df_na = df.isna()
df_na[df_na.sum(axis=1).eq(1)].sum()
Output:
A 1
B 2
C 1
dtype: int64
Then for example, column "A" doesn't have a value while the other two have values once, "B" doesn't have a value while the other two have twice, etc.