I have an original dataframe df0 with a number of values. Based on this dataframe I have a second dataframe, df1, where some of the original values are NaN.
import pandas as pd
df0 = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]})
df1 = pd.DataFrame({'col1': [1,2,None,4,5,6,None,8,None,10,11,None,13,None,None]})
I need a df2 to be the inverse of df1 in terms of rows with NaN. Every row which is NaN in df1 should get its original value back from df0, and rows which are not NaN must become NaN, such that I get the following dataframe:
df2 = pd.DataFrame({'col1': [None,None,3,None,None,None,7,None,9,None,None,12,None,14,15]})
What is the best way to go about this if it were a larger dataset?
CodePudding user response:
masking all columns
If you need to mask all columns, use mask with notna, or where with isna:
df2 = df0.mask(df1['col1'].notna())
# or
df2 = df0.where(df1['col1'].isna())
output:
    col1
0    NaN
1    NaN
2    3.0
3    NaN
4    NaN
5    NaN
6    7.0
7    NaN
8    9.0
9    NaN
10   NaN
11  12.0
12   NaN
13  14.0
14  15.0
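As a rough sketch of the same idea written frame-to-frame, which should also cover several columns at once: this assumes df0 and df1 share the same index and column labels, which holds for the example data but is an assumption for any larger dataset.

```python
import pandas as pd

df0 = pd.DataFrame({'col1': list(range(1, 16))})
df1 = pd.DataFrame({'col1': [1, 2, None, 4, 5, 6, None, 8, None, 10, 11, None, 13, None, None]})

# Keep df0's values only where df1 is NaN; every other cell becomes NaN.
# A boolean DataFrame as the condition is applied cell by cell after
# aligning on index and column labels.
df2 = df0.where(df1.isna())

print(df2)
```

Because the condition is a whole DataFrame, no column name has to be spelled out, so the same line can be reused when more columns are added to both frames.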
masking only "col1"
If you only need to replace col1 and leave any other columns intact, use assign together with Series.mask instead:
df2 = df0.assign(col1=df0['col1'].mask(df1['col1'].notna()))
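To illustrate what "leave other columns intact" means, here is a small sketch with shortened data. The column named other is hypothetical and exists only to show that it passes through unchanged.

```python
import pandas as pd

# 'other' is a made-up extra column, not part of the original question.
df0 = pd.DataFrame({'col1': [1, 2, 3, 4], 'other': ['a', 'b', 'c', 'd']})
df1 = pd.DataFrame({'col1': [1, None, 3, None]})

# assign returns a new DataFrame: col1 is replaced by the masked Series,
# while 'other' is carried over as-is and df0 itself is not modified.
df2 = df0.assign(col1=df0['col1'].mask(df1['col1'].notna()))

print(df2)
```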
CodePudding user response:
Use Series.where with Series.isna to replace one column based on another DataFrame. The only requirement is that both DataFrames have the same index:
df0['col1'] = df0['col1'].where(df1['col1'].isna())
print(df0)
    col1
0    NaN
1    NaN
2    3.0
3    NaN
4    NaN
5    NaN
6    7.0
7    NaN
8    9.0
9    NaN
10   NaN
11  12.0
12   NaN
13  14.0
14  15.0
Alternative with DataFrame.loc and Series.notna:
import numpy as np
df0.loc[df1['col1'].notna(), 'col1'] = np.nan
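Note that the snippets in this answer overwrite df0. If the original values are still needed afterwards, one possible variant is to work on a float copy, sketched below. The cast to float64 is my own addition: NaN cannot be stored in an integer column, and newer pandas releases may warn about or reject the implicit upcast during assignment.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'col1': list(range(1, 16))})
df1 = pd.DataFrame({'col1': [1, 2, None, 4, 5, 6, None, 8, None, 10, 11, None, 13, None, None]})

# astype returns a new DataFrame, so df0 keeps its original integers.
df2 = df0.astype({'col1': 'float64'})

# Blank out every row where df1 still has a value.
df2.loc[df1['col1'].notna(), 'col1'] = np.nan

print(df2)
```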