I have a dataframe such as:
df = pd.read_excel(io = "☆ TOTAL_1320 - mydata.xlsx", sheet_name = 'Sheet1', index_col = 0)
df2 = df
EXCLUDE PERSONCONCERNED PC_DK PC_SPOUSE PC_PARENT PC_GRANDPARENT PC_CHILD PC_SIBSHIP PC_RELATIVE PC_LOVER ... WARNEMOTION2 TIME10 WARNEMOTION3 TIME11 WARNEMOTION4 WARNEMOTION4DTL TIME12 WARNSIGN_DTL EVENT_DTL EVENT_DTL_2
14 NaN 1 NaN 1.0 2.0 2.0 2.0 2.0 2.0 2.0 ... 2.0 NaN 1.0 88.0 2.0 NaN NaN NaN *** *** ** **/_x000D_\n*** ****** ** ***** ... NaN
68 NaN 1 NaN 1.0 2.0 2.0 2.0 2.0 2.0
and I made a condition like this:
condition = (df2.PERSONCONCERNED==1) & ((df2.PC_SPOUSE.isnull()==True)
& (df2.PC_PARENT.isnull()==True) & (df2.PC_GRANDPARENT.isnull()==True)
& (df2.PC_CHILD.isnull()==True) & (df2.PC_SIBSHIP.isnull()==True)
& (df2.PC_RELATIVE.isnull()==True) & (df2.PC_LOVER.isnull()==True)
& (df2.PC_FRIEND.isnull()==True) & (df2.PC_STRANGER.isnull()==True)
& (df2.PC_ETC.isnull()==True))
when I do df2[condition]['PC_DK']
I get this:
9089 1.0
20082 NaN
20087 NaN
20090 NaN
20096 NaN
30957 NaN
34196 NaN
81293 NaN
104381 88.0
Name: PC_DK, dtype: float64
I want to replace NaN
values and 1
value to 88
in df2
(which is the original dataframe)
I could replace them in new dataframe by doing
df_condition = df2[condition]['PC_DK']
df_condition = df_condition.replace(np.nan, 88)
df_condition = df_condition.replace(1,88)
but I really don't know how to do it with original dataframe.
CodePudding user response:
Before getting started, first thing is isnull()
itself returns Boolean value so no need to compare it against True
, second thing is as you want to check for null
in multiple columns combined by &
, you can consolidate these in to isnull()
and all()
passing axis parameter. So, the reduced code will look something like this:
# cols is list of columns i.e. cols = ['PC_SPOUSE', 'PC_PARENT', ..., etc.]
condition = df2.PERSONCONCERNED==1 & df2[cols].isnull().all(axis=1)
Now to replace, you can directly use the mask created above, then replace the values using loc
, and as you want to add more condition to replace the values, you can create the extra mask and again combine with &
:
extra_mask = df2['PC_DK'].isna() | df2['PC_DK'].eq(1)
df2[condition & extra_mask, 'PC_DK'] = 88