Home > Back-end >  Fill in NA column values with values from another row based on condition
Fill in NA column values with values from another row based on condition

Time:06-15

I want to replace the missing values of one row with column values of another row based on a condition. The real problem has many more columns with NA values. In this example, I want to fill na values for row 4 with values from row 0 for columns A and B, as the value 'e' maps to 'a' for column C.

df = pd.DataFrame({'A': [0, 1, np.nan, 3, np.nan],
                'B': [5, 6, np.nan, 8, np.nan],
                'C': ['a', 'b', 'c', 'd', 'e']})

df
Out[21]: 
      A    B  C
 0  0.0  5.0  a
 1  1.0  6.0  b
 2  NaN  NaN  c
 3  3.0  8.0  d
 4  NaN  NaN  e

I have tried this:

df.loc[df.C == 'e', ['A', 'B']] = df.loc[df.C == 'a', ['A', 'B']]

Is it possible to use a nested np.where statement instead?

CodePudding user response:

Your code fails due to index alignement. As the indices are different (0 vs 4), NaN are assigned.

Use the underlying numpy array to bypass index alignement:

df.loc[df.C == 'e', ['A', 'B']] = df.loc[df.C == 'a', ['A', 'B']].values

NB. You must have the same size on both sides of the equal sign. Output:

     A    B  C
0  0.0  5.0  a
1  1.0  6.0  b
2  NaN  NaN  c
3  3.0  8.0  d
4  0.0  5.0  e
  • Related