Home > Back-end >  Count how many times any value appears in a combination of columns but not in other columns (pandas)
Count how many times any value appears in a combination of columns but not in other columns (pandas)

Time:02-12

I have a data set that looks like this:

A   B   C
1   2   
    3   4
1       5
1   2   4
1       2
1 

I want to return the amount of times any value appears in a combination of two columns but not the third column. The numeric values in this case are arbitrary. I just care about counting instances.

In other words:

1.) How many times does any value appear in column A and column B but not column C?

2.) How many times does any value appear in column B and column C but not column A?

3.) How many times does any value appear in column A and column C but not in column B?

My expected answers based on the mock data I have given above:

1.) 1 (row 1)
2.) 1 (row 2)
3.) 2 (rows 3 and 5)

CodePudding user response:

You could use isna to create a boolean DataFrame. Then filter the rows that have only one NaN value (so that we exclude the last row). Finally sum vertically:

df_na = df.isna()
df_na[df_na.sum(axis=1).eq(1)].sum()

Output:

A    1
B    2
C    1
dtype: int64

Then for example, column "A" doesn't have a value while the other two have values once, "B" doesn't have a value while the other two have twice, etc.

  • Related