How to choose row from other column in dataframe if row from the default column is NaN?

Time:04-04

My DataFrame has three columns. The third column is derived from the first two. The default column is column 2, but if column 2 is NaN, I want column 3 to be filled from column 1. I added the third line to conditions, but it does not seem to work.

This is the DataFrame:

df = pd.DataFrame(np.array([[np.nan, 1717], [1749, 1750], [1704, np.nan]]),
                   columns=['a', 'b'])

This is my code:

import numpy as np
import pandas as pd
conditions = [
    (df["b"] <= df["a"]), 
    df["b"] > df["a"],
    df["b"] == df["b"].isna()]

choices = [df["b"], df["a"], df["a"]]

df['c'] = np.select(conditions, choices, default=df["b"])
print(df)

This is my output:

        a       b       c
0     NaN  1749.0  1749.0
1  1717.0  1750.0  1717.0
2  1704.0     NaN     NaN

But I want c to be filled if a or b is filled. So this is the output I want:

        a       b       c
0     NaN  1749.0  1749.0
1  1717.0  1750.0  1717.0
2  1704.0     NaN  1704.0

CodePudding user response:

You only need a small change to your third condition. df["b"].isna() already returns a boolean Series, so df["b"] == df["b"].isna() compares the numbers in df["b"] against those True/False values. With your data that comparison is False in every row, so the condition never fires.
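To see the difference concretely, here is a minimal sketch that evaluates both expressions on a standalone Series. The Series b and the names mask and broken are made up for illustration; the values are the ones from column b in the question's DataFrame.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df["b"] from the question
b = pd.Series([1717.0, 1750.0, np.nan])

mask = b.isna()     # boolean Series marking the missing entries
broken = b == mask  # compares each number with True/False instead

print(mask.tolist())    # [False, False, True]
print(broken.tolist())  # [False, False, False]
```

Only mask flags the NaN row, which is what np.select needs as a condition.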

Remove the df["b"] == part so that the third condition is just df["b"].isna().

import numpy as np
import pandas as pd
conditions = [
    (df["b"] <= df["a"]), 
    df["b"] > df["a"],
    df["b"].isna()]

choices = [df["b"], df["a"], df["a"]]

df['c'] = np.select(conditions, choices, default=df["b"])
print(df)
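If the intent behind the conditions is "take the smaller of a and b, and fall back to whichever one is present" (an assumption read off the conditions and choices, not something the question states outright), a row-wise minimum may be a shorter way to get the same column. This sketch rebuilds the question's DataFrame from a dict so that it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1749, 1704], "b": [1717, 1750, np.nan]})

# Row-wise minimum; skipna=True (the default) ignores a NaN
# as long as the other value in the row is present
df["c"] = df[["a", "b"]].min(axis=1)
print(df)
```

A row where both a and b are NaN would still give NaN in c.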

CodePudding user response:

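A loop-based version also works, as long as NaN is detected with pd.isna (np.nan == np.nan is always False, so an == comparison never matches):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[np.nan, 1717], [1749, 1750], [1704, np.nan]]),
               columns=['a', 'b'])

# Start from column a, then fall back to b wherever a is missing
df['c'] = df.a

for i in range(len(df)):
    if pd.isna(df.a.iloc[i]):  # NaN never equals itself, so use pd.isna
        df.loc[i, 'c'] = df.b.iloc[i]  # .loc avoids chained assignment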
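The same "use a, fall back to b" idea can probably be written without a loop. The sketch below uses Series.fillna with another Series as the fill values and also shows why an equality test against np.nan cannot find missing values. It assumes the goal is simply "a where present, otherwise b", which is what the loop above aims for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1749, 1704], "b": [1717, 1750, np.nan]})

# NaN is not equal to anything, including itself
print(np.nan == np.nan)  # False

# Take a, and fill its missing entries from b (aligned on the index)
df["c"] = df["a"].fillna(df["b"])
print(df)
```

Note that this prefers a over b, so on other data it can differ from the np.select answer, which picks the smaller of the two values when both are present.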