My DataFrame consists of three columns. The third column is based on the first two. The default is column 2, but if column 2 is NaN, I want column 3 to be filled from column 1. I added the third line to conditions, but it does not seem to work.
This is the DataFrame:
df = pd.DataFrame(np.array([[np.nan, 1717], [1749, 1750], [1704, np.nan]]),
columns=['a', 'b'])
This is my code:
import numpy as np
import pandas as pd
conditions = [
(df["b"] <= df["a"]),
df["b"] > df["a"],
df["b"] == df["b"].isna()]
choices = [df["b"], df["a"], df["a"]]
df['c'] = np.select(conditions, choices, default=df["b"])
print(df)
This is my output:
a b c
0 NaN 1749.0 1749.0
1 1717.0 1750.0 1717.0
2 1704.0 NaN NaN
But I want c to be filled if a or b is filled. So this is the output I want:
a b c
0 NaN 1749.0 1749.0
1 1717.0 1750.0 1717.0
2 1704.0 NaN 1704.0
CodePudding user response:
You just need to make a small change to your third condition. df["b"].isna() already returns a boolean Series (True where b is NaN, False elsewhere), so df["b"] == df["b"].isna() is actually comparing each value in b against that True/False mask. A NaN never compares equal to True, so the condition is False on exactly the rows where b is missing.
Just remove the first half of the third condition.
import numpy as np
import pandas as pd
conditions = [
(df["b"] <= df["a"]),
df["b"] > df["a"],
df["b"].isna()]
choices = [df["b"], df["a"], df["a"]]
df['c'] = np.select(conditions, choices, default=df["b"])
print(df)
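To make the difference concrete, here is a minimal sketch that evaluates both versions of the third condition on the question's frame. The names `broken` and `fixed` are illustrative only, and the last step (a row-wise minimum) is an assumed alternative rather than something either poster used; it should give the same result because the three conditions together pick the smaller non-missing value.

```python
import numpy as np
import pandas as pd

# Same constructor as in the question
df = pd.DataFrame(np.array([[np.nan, 1717], [1749, 1750], [1704, np.nan]]),
                  columns=['a', 'b'])

# Original third condition: float values in b compared with a boolean mask
broken = df["b"] == df["b"].isna()
# Corrected third condition: the boolean mask on its own
fixed = df["b"].isna()

print(broken.tolist())  # [False, False, False]
print(fixed.tolist())   # [False, False, True]

conditions = [df["b"] <= df["a"], df["b"] > df["a"], fixed]
choices = [df["b"], df["a"], df["a"]]
df["c"] = np.select(conditions, choices, default=df["b"])

# Assumed alternative: min() skips NaN by default, so it picks the
# smaller of a and b, or whichever one is present
df["c_min"] = df[["a", "b"]].min(axis=1)

print(df["c"].tolist())      # [1717.0, 1749.0, 1704.0]
print(df["c_min"].tolist())  # [1717.0, 1749.0, 1704.0]
```

The printed values follow from the DataFrame constructor as written in the question.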
CodePudding user response:
This seems to work:
df = pd.DataFrame(np.array([[np.nan, 1717], [1749, 1750], [1704, np.nan]]),
columns=['a', 'b'])
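df['c'] = df.a
for i in range(len(df)):
    # NaN never compares equal to anything, so test with pd.isna() instead of == np.nan
    if pd.isna(df.a.iloc[i]):
        # Write through df.loc to avoid chained assignment
        df.loc[df.index[i], 'c'] = df.b.iloc[i]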
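The same fallback can probably be written without a loop. This is a minimal sketch, assuming the intent is "take a, and use b where a is missing"; it also shows that NaN compares unequal to everything, itself included, which is why missing values are tested with pd.isna().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[np.nan, 1717], [1749, 1750], [1704, np.nan]]),
                  columns=['a', 'b'])

# NaN is not equal to anything, including itself
print(np.nan == np.nan)  # False
print(pd.isna(np.nan))   # True

# Take a, and fall back to b on the rows where a is NaN
df['c'] = df['a'].fillna(df['b'])
print(df['c'].tolist())  # [1717.0, 1749.0, 1704.0]
```

Note that this treats a as the default and b as the fallback, which matches the loop above but is the reverse of the default described in the question.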