Consider I have a data frame :
>>> data
c0 c1 c2 _c1 _c2
0 0 1 2 18.0 19.0
1 3 4 5 NaN NaN
2 6 7 8 20.0 21.0
3 9 10 11 NaN NaN
4 12 13 14 NaN NaN
5 15 16 17 NaN NaN
I want to update the values in the c1 and c2 columns with the values in the _c1 and _c2 columns whenever those latter values are not NaN
. Why won't the following work, and what is the correct way to do this?
>>> data.loc[~(data._c1.isna()),['c1','c2']]=data.loc[~(data._c1.isna()),['_c1','_c2']]
>>> data
c0 c1 c2 _c1 _c2
0 0 NaN NaN 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 NaN NaN 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
For completeness's sake I want the result to look like
>>> data.loc[~(data._c1.isna()),['c1','c2']]=data.loc[~(data._c1.isna()),['_c1','_c2']]
>>> data
c0 c1 c2 _c1 _c2
0 0 18.0 19.0. 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 20.0 21.0 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
CodePudding user response:
I recommend update
after rename
df.update(df[['_c1','_c2']].rename(columns={'_c1':'c1','_c2':'c2'}))
df
Out[266]:
c0 c1 c2 _c1 _c2
0 0 18.0 19.0 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 20.0 21.0 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
CodePudding user response:
You can use np.where
:
df[['c1', 'c2']] = np.where(df[['_c1', '_c2']].notna(),
df[['_c1', '_c2']],
df[['c1', 'c2']])
print(df)
# Output:
c0 c1 c2 _c1 _c2
0 0 18.0 19.0 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 20.0 21.0 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN