Home > OS >  How can I repalce values only if conditioin is True?
How can I repalce values only if conditioin is True?

Time:09-26

I have a dataframe such as:

df = pd.read_excel(io = "☆ TOTAL_1320 - mydata.xlsx", sheet_name = 'Sheet1', index_col = 0)
df2 = df

    EXCLUDE PERSONCONCERNED PC_DK   PC_SPOUSE   PC_PARENT   PC_GRANDPARENT  PC_CHILD    PC_SIBSHIP  PC_RELATIVE PC_LOVER    ... WARNEMOTION2    TIME10  WARNEMOTION3    TIME11  WARNEMOTION4    WARNEMOTION4DTL TIME12  WARNSIGN_DTL    EVENT_DTL   EVENT_DTL_2
14  NaN 1   NaN 1.0 2.0 2.0 2.0 2.0 2.0 2.0 ... 2.0 NaN 1.0 88.0    2.0 NaN NaN NaN *** *** ** **/_x000D_\n*** ****** ** ***** ...  NaN
68  NaN 1   NaN 1.0 2.0 2.0 2.0 2.0 2.0

and I made a condition like this:

condition = (df2.PERSONCONCERNED==1) & ((df2.PC_SPOUSE.isnull()==True)
                                    & (df2.PC_PARENT.isnull()==True) & (df2.PC_GRANDPARENT.isnull()==True) 
                                    & (df2.PC_CHILD.isnull()==True) & (df2.PC_SIBSHIP.isnull()==True) 
                                    & (df2.PC_RELATIVE.isnull()==True) & (df2.PC_LOVER.isnull()==True) 
                                    & (df2.PC_FRIEND.isnull()==True) & (df2.PC_STRANGER.isnull()==True) 
                                    & (df2.PC_ETC.isnull()==True))

when I do df2[condition]['PC_DK'] I get this:

9089       1.0
20082      NaN
20087      NaN
20090      NaN
20096      NaN
30957      NaN
34196      NaN
81293      NaN
104381    88.0
Name: PC_DK, dtype: float64

I want to replace NaN values and 1 value to 88 in df2(which is the original dataframe)

I could replace them in new dataframe by doing

df_condition = df2[condition]['PC_DK']

df_condition = df_condition.replace(np.nan, 88)
df_condition = df_condition.replace(1,88) 

but I really don't know how to do it with original dataframe.

CodePudding user response:

Before getting started, first thing is isnull() itself returns Boolean value so no need to compare it against True , second thing is as you want to check for null in multiple columns combined by & , you can consolidate these in to isnull() and all() passing axis parameter. So, the reduced code will look something like this:

# cols is list of columns i.e. cols = ['PC_SPOUSE', 'PC_PARENT', ..., etc.]

condition = df2.PERSONCONCERNED==1 & df2[cols].isnull().all(axis=1)

Now to replace, you can directly use the mask created above, then replace the values using loc, and as you want to add more condition to replace the values, you can create the extra mask and again combine with &:

extra_mask = df2['PC_DK'].isna() | df2['PC_DK'].eq(1)
df2[condition & extra_mask, 'PC_DK'] = 88
  • Related