I want to replace the missing values of one row with column values of another row based on a condition. The real problem has many more columns with NA values. In this example, I want to fill na values for row 4 with values from row 0 for columns A and B, as the value 'e' maps to 'a' for column C.
df = pd.DataFrame({'A': [0, 1, np.nan, 3, np.nan],
'B': [5, 6, np.nan, 8, np.nan],
'C': ['a', 'b', 'c', 'd', 'e']})
df
Out[21]:
A B C
0 0.0 5.0 a
1 1.0 6.0 b
2 NaN NaN c
3 3.0 8.0 d
4 NaN NaN e
I have tried this:
df.loc[df.C == 'e', ['A', 'B']] = df.loc[df.C == 'a', ['A', 'B']]
Is it possible to use a nested np.where statement instead?
CodePudding user response:
Your code fails due to index alignement. As the indices are different (0 vs 4), NaN are assigned.
Use the underlying numpy array to bypass index alignement:
df.loc[df.C == 'e', ['A', 'B']] = df.loc[df.C == 'a', ['A', 'B']].values
NB. You must have the same size on both sides of the equal sign. Output:
A B C
0 0.0 5.0 a
1 1.0 6.0 b
2 NaN NaN c
3 3.0 8.0 d
4 0.0 5.0 e