Home > Software engineering >  compare columns with NaN or <NA> values pandas
compare columns with NaN or <NA> values pandas

Time:06-02

I have the dataframe with NaN and values, now I want to compare two columns in the same dataframe whether each row values in null or not null. For examples,

  1. if the column a_1 have null values, column a_2 have not null values, then for that particular row, the result should be 1 in the new column a_12.
  2. If the values in both a_1(value is 123) & a_2(value is 345) is not null, and the values are not equal, then the result should be 3 in column a_12.

below is the code snippet I have used for comparison, for the scenario 1, I am getting the result as 3 instead of 1. Please guide me to get the correct output.

    try:
        if (x[cols[0]]==x[cols[1]]) & (~np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):   
            return 0

        elif (np.isnan(x[cols[0]])) & (np.isnan(x[cols[1]])):
            return 0

        elif (~np.isnan(x[cols[0]])) & (np.isnan(x[cols[1]])):
            return 1

        elif (np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):
            return 2

        elif (x[cols[0]]!=x[cols[1]]) & (~np.isnan(x[cols[0]])) & (~np.isnan(x[cols[1]])):
            return 3
        else:
            pass

    except Exception as exc:
        if (x[cols[0]]==x[cols[1]]) & (pd.notna(x[cols[0]])) & (pd.notna(x[cols[1]])):   
            return 0

        elif (pd.isna(x[cols[0]])) & (pd.isna(x[cols[1]])):
            return 0

        elif (pd.notna(x[cols[0]])) & (pd.isna(x[cols[1]])):
            return 1

        elif (pd.isna(x[cols[0]])) & (pd.notna(x[cols[1]])):
            return 2

        elif (x[cols[0]]!=x[cols[1]]) & (pd.notna(x[cols[0]])) & (pd.notna(x[cols[1]])):
            return 3
        else:
            pass

I have used pd.isna() and pd.notna(), also np.isnan() and ~np.isnan(), because for some columns the second method (np.isnan()) is working, for some columns its just throwing an error.

Please guide me to achieve the result as excepted.

Expected Output:

| a_1       | a_2     | result |
|-----------|---------|--------|
| gssfwe    | gssfwe  |   0    |
| <NA>      | <NA>    |   0    |
| fsfsfw    | <NA>    |   1    |
| <NA>      | qweweqw |   2    |
| adsadgsgd | wwuwquq |   3    |

Output Got with the above code:

| a_1       | a_2     | result |
|-----------|---------|--------|
| gssfwe    | gssfwe  |   0    |
| <NA>      | <NA>    |   0    |
| fsfsfw    | <NA>    |   3    |
| <NA>      | qweweqw |   3    |
| adsadgsgd | wwuwquq |   3    |

CodePudding user response:

Going by the logic in your code, you'd want to define a function and apply it across your DataFrame.

df = pd.DataFrame({'a_1': [1, 2, np.nan, np.nan, 1], 'a_2': [2, np.nan, 1, np.nan, 1]})

The categories you want map neatly to binary numbers, which you can use to write a short function like -

def nan_check(row):
    x, y = row
    if x != y:
        return int(f'{int(pd.notna(y))}{int(pd.notna(x))}', base=2)
    return 0

df['flag'] = df.apply(nan_check, axis=1)

Output

   a_1  a_2  flag
0  1.0  2.0     3
1  2.0  NaN     1
2  NaN  1.0     2
3  NaN  NaN     0
4  1.0  1.0     0

CodePudding user response:

You can try np.select, but I think you need to rethink the condition and the expected output

Condition 1: if the column a_1 have null values, column a_2 have not null values, then for that particular row, the result should be 1 in the new column a_12.

Condition 2: If the values in both a_1 & a_2 is not null, and the values are not equal, then the result should be 3 in column a_12.

df['a_12'] = np.select(
    [df['a_1'].isna() & df['a_2'].notna(),
     df['a_1'].notna() & df['a_2'].notna() & df['a_1'].ne(df['a_2'])],
    [1, 3],
    default=0
)
print(df)

         a_1      a_2  result  a_12
0     gssfwe   gssfwe       0     0
1        NaN      NaN       0     0
2     fsfsfw      NaN       1     0   # Shouldn't be Condition 1 since a_1 is not NaN
3        NaN  qweweqw       2     1   # Condition 1
4  adsadgsgd  wwuwquq       3     3
  • Related