Convert zeros and ones to bool while preserving pd.NA in a multi column index dataframe

Time:12-21

I have a dataframe with a multi-level column index (MultiIndex columns). Some column header levels might contain pd.NA.

The actual values in the dataframe might be zero, one, or pd.NA.

How can I transform all zeros and ones into bool while preserving the pd.NA values?

import pandas as pd

idx_l1 = ("a", "b")
idx_l2 = (pd.NA, pd.NA)
idx_l3 = ("c", "c")

df = pd.DataFrame(
    data=[
        [1, pd.NA, 0, pd.NA, 0, 1, pd.NA, pd.NA],
        [pd.NA, 0, 1, pd.NA, pd.NA, pd.NA, 0, 0],
        [0, 1, 1, 1, 0, pd.NA, pd.NA, 0],
    ],
    columns=pd.MultiIndex.from_product([idx_l1, idx_l2, idx_l3]),
)
df = df.rename_axis(["level1", "level2", "level3"], axis=1)

print(df)

level1     a                    b                  
level2   NaN                  NaN                  
level3     c     c  c     c     c     c     c     c
0          1  <NA>  0  <NA>     0     1  <NA>  <NA>
1       <NA>     0  1  <NA>  <NA>  <NA>     0     0
2          0     1  1     1     0  <NA>  <NA>     0

CodePudding user response:

You can use the .replace method:

df = df.replace({1: True, 0: False})

print(df)

Output:

level1      a                          b                    
level2    NaN                        NaN                    
level3      c      c      c     c      c     c      c      c
0        True   <NA>  False  <NA>  False  True   <NA>   <NA>
1        <NA>  False   True  <NA>   <NA>  <NA>  False  False
2       False   True   True  True  False  <NA>   <NA>  False
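
Depending on the pandas version, columns that contain pd.NA may keep the object dtype after .replace rather than becoming a true boolean column. If a nullable boolean dtype is wanted, one option is to cast with astype("boolean"). The following is a minimal sketch that rebuilds the frame from the question; the names cols and bool_df are only illustrative:

```python
import pandas as pd

# Rebuild the frame from the question (three column levels, middle level all NA).
cols = pd.MultiIndex.from_product(
    [("a", "b"), (pd.NA, pd.NA), ("c", "c")],
    names=["level1", "level2", "level3"],
)
df = pd.DataFrame(
    [
        [1, pd.NA, 0, pd.NA, 0, 1, pd.NA, pd.NA],
        [pd.NA, 0, 1, pd.NA, pd.NA, pd.NA, 0, 0],
        [0, 1, 1, 1, 0, pd.NA, pd.NA, 0],
    ],
    columns=cols,
)

# Cast every column to the nullable boolean dtype:
# 1 becomes True, 0 becomes False, and pd.NA stays pd.NA.
bool_df = df.astype("boolean")

print(bool_df)
print(bool_df.dtypes.unique())
```

The column MultiIndex, including the pd.NA level, is left as it was; only the cell values change type.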

CodePudding user response:

To transform the values to boolean while preserving pd.NA, you can use applymap (renamed to DataFrame.map in pandas 2.1) with a custom function. The function has to test for missing values first with pd.isna: a comparison such as x == 1 returns pd.NA when x is pd.NA, and using that result in an if statement raises a TypeError.

def transform_to_bool(x):
    # Check for NA first; `pd.NA == 1` is itself NA and cannot be used in `if`.
    if pd.isna(x):
        return pd.NA
    if x == 1:
        return True
    if x == 0:
        return False
    # Anything that is neither 0 nor 1 is treated as missing.
    return pd.NA

df = df.applymap(transform_to_bool)

or with a lambda:

df = df.applymap(lambda x: pd.NA if pd.isna(x) else bool(x))

Both versions convert zeros and ones to True and False and leave the pd.NA values untouched.
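
Element-wise functions like the ones above have to handle pd.NA explicitly, because pd.NA propagates through comparisons instead of producing True or False. A short sketch of that behaviour (the variable names cmp and raised are only illustrative):

```python
import pandas as pd

# Comparing a value with pd.NA yields pd.NA, not a boolean.
cmp = pd.NA == 1
print(cmp)

# pd.NA has no truth value, so using it in a boolean context raises TypeError.
try:
    bool(cmp)
    raised = False
except TypeError:
    raised = True
print(raised)

# pd.isna is the safe way to test a scalar for missingness.
print(pd.isna(pd.NA), pd.isna(0))
```

This is why a check such as x != pd.NA cannot be used to detect missing values inside a function passed to applymap.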

CodePudding user response:

You can use the pd.DataFrame.where method to keep only the zeros and ones, set everything else to pd.NA, and then cast the result to the nullable boolean dtype.

Here's an example of how you can do this:

import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({'A': [0, 1, pd.NA, 0, 1], 'B': [0, pd.NA, 1, 0, 1]})

# Keep 0 and 1, replace anything else with pd.NA, then cast to nullable boolean
df = df.where(df.isin([0, 1]), pd.NA).astype("boolean")

print(df)

The df.isin([0, 1]) part of the expression returns a boolean DataFrame with True for the values that are either 0 or 1 and False for all other values. The df.where method keeps the original values where that mask is True and replaces the rest with pd.NA. On its own, where does not change 0 and 1 into booleans; the final astype("boolean") does that, turning 1 into True and 0 into False while keeping pd.NA.
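
The masking step matters most when a column can hold something other than 0, 1, or pd.NA. The sketch below uses a made-up value of 2 to show this; the sample data and the names mask and out are assumptions for illustration, and the cast to "boolean" is one possible way to finish the conversion:

```python
import pandas as pd

# Sample frame with one unexpected value (2) in column A.
df = pd.DataFrame({"A": [0, 1, pd.NA, 2], "B": [1, pd.NA, 0, 1]})

# True where the cell is exactly 0 or 1; pd.NA and 2 give False.
mask = df.isin([0, 1])

# Keep 0/1, set everything else to pd.NA, then cast to nullable boolean.
out = df.where(mask, pd.NA).astype("boolean")

print(out)
print(out.dtypes)
```

Here the 2 ends up as pd.NA alongside the value that was already missing, so every remaining cell is safe to cast.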

I hope this helps!
