I have a DataFrame with multi-level (MultiIndex) columns. Some column headers might have pd.NA values.
The actual values in the dataframe might be zero, one, or pd.NA.
How can I transform all zeros and ones into bool while preserving the pd.NA values?
import pandas as pd
idx_l1 = ("a", "b")
idx_l2 = (pd.NA, pd.NA)
idx_l3 = ("c", "c")
df = pd.DataFrame(
data=[
[1, pd.NA, 0, pd.NA, 0, 1, pd.NA, pd.NA],
[pd.NA, 0, 1, pd.NA, pd.NA, pd.NA, 0, 0],
[0, 1, 1, 1, 0, pd.NA, pd.NA, 0],
],
columns=pd.MultiIndex.from_product([idx_l1, idx_l2, idx_l3]),
)
df = df.rename_axis(["level1", "level2", "level3"], axis=1)
print(df)
level1 a b
level2 NaN NaN
level3 c c c c c c c c
0 1 <NA> 0 <NA> 0 1 <NA> <NA>
1 <NA> 0 1 <NA> <NA> <NA> 0 0
2 0 1 1 1 0 <NA> <NA> 0
CodePudding user response:
You can use the .replace method:
df = df.replace({1: True, 0: False})
print(df)
Output:
level1 a b
level2 NaN NaN
level3 c c c c c c c c
0 True <NA> False <NA> False True <NA> <NA>
1 <NA> False True <NA> <NA> <NA> False False
2 False True True True False <NA> <NA> False
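As a possible follow-up to the replace approach, here is a minimal sketch of casting to pandas' nullable "boolean" dtype with astype. It assumes a small stand-in frame with plain column labels rather than the MultiIndex frame from the question, and the variable names are made up for illustration. The idea is that the cast turns 0 and 1 into False and True and keeps pd.NA, while also giving the columns a proper boolean dtype.

```python
import pandas as pd

# Stand-in frame (an assumption for brevity): values are 0, 1 or pd.NA.
df = pd.DataFrame({"x": [1, pd.NA, 0], "y": [pd.NA, 0, 1]})

# Cast directly to the nullable boolean dtype.
direct = df.astype("boolean")

# Or replace first and cast afterwards; both routes should agree.
via_replace = df.replace({1: True, 0: False}).astype("boolean")

print(direct)
print(direct.dtypes)
```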
CodePudding user response:
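To transform the values in the dataframe to boolean while preserving pd.NA values, you can use the applymap method (deprecated since pandas 2.1 in favor of DataFrame.map) along with a custom function that returns pd.NA for missing values, True if the element is equal to 1, and False if it is equal to 0. The missing-value check has to come first, because comparing pd.NA with == yields pd.NA, which cannot be used as an if condition.
def transform_to_bool(x):
    # Check for missing values first: `pd.NA == 1` is pd.NA, not a bool.
    if pd.isna(x):
        return pd.NA
    elif x == 1:
        return True
    elif x == 0:
        return False
    else:
        # Anything other than 0 or 1 is treated as missing.
        return pd.NA
df = df.applymap(transform_to_bool)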
or even with a lambda:
df = df.applymap(lambda x: pd.NA if pd.isna(x) else bool(x))
This will transform all the values in the dataframe to boolean, while preserving the pd.NA values.
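One thing worth checking with any element-wise approach is how pd.NA behaves in comparisons. The sketch below shows that comparing pd.NA yields pd.NA rather than a bool, and that using it as a condition raises TypeError, which is why testing with pd.isna before any == or != comparison is the safer order. The helper name to_nullable_bool is hypothetical and only used for illustration.

```python
import pandas as pd

# Comparisons with pd.NA propagate the missing value instead of returning a bool.
comparison = pd.NA == 1
print(comparison)  # <NA>

# Using that result as a truth value is ambiguous, so pandas raises TypeError.
try:
    bool(comparison)
    raised = False
except TypeError:
    raised = True
print(raised)  # True


def to_nullable_bool(x):
    """Hypothetical helper: test for missing values before converting."""
    return pd.NA if pd.isna(x) else bool(x)


converted = [to_nullable_bool(v) for v in (1, 0, pd.NA)]
print(converted)
```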
CodePudding user response:
You can use the pd.DataFrame.where method to mask everything that is not 0 or 1 with pd.NA, and then cast the result to the nullable "boolean" dtype. Note that where on its own does not turn 0 and 1 into False and True; the cast does that.
Here's an example of how you can do this:
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'A': [0, 1, pd.NA, 0, 1], 'B': [0, pd.NA, 1, 0, 1]})
# Keep only 0 and 1 (everything else becomes pd.NA), then cast to nullable boolean
df = df.where(df.isin([0, 1]), pd.NA).astype("boolean")
print(df)
The df.isin([0, 1]) part of the expression returns a boolean DataFrame with True for the values that are either 0 or 1 and False for all other values. The df.where method then keeps the original values where that mask is True and replaces the rest with pd.NA. Finally, astype("boolean") converts 0 and 1 to False and True while leaving pd.NA in place.
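To make the effect of the mask more concrete, here is a small sketch with a hypothetical frame that also contains stray values (2 and 7) in addition to 0, 1 and pd.NA. It assumes the goal is to treat anything outside 0 and 1 as missing before casting to the nullable "boolean" dtype; the frame and the name cleaned are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with stray values (2 and 7) besides 0, 1 and pd.NA.
df = pd.DataFrame({"A": [0, 1, 2, pd.NA], "B": [pd.NA, 7, 1, 0]})

# Mask anything that is not 0 or 1, then cast to the nullable boolean dtype.
cleaned = df.where(df.isin([0, 1]), pd.NA).astype("boolean")
print(cleaned)
```

In this sketch the stray values end up as pd.NA alongside the values that were already missing.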
I hope this helps!