Pandas turn last N columns into NA based on another dataframe

Time:10-05

I have the following dataframes:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={'col1': ['a', 'd', 'g', 'j'],
                        'col2': ['b', 'c', 'i', np.nan],
                        'col3': ['c', 'f', 'i', np.nan],
                        'col4': ['x', np.nan, np.nan, np.nan]},
                index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))

      col1 col2 col3 col4
index
ind1     a    b    c    x
ind2     d    c    f  NaN
ind3     g    i    i  NaN
ind4     j  NaN  NaN  NaN

df2 = pd.Series(data=[True, False, True, False],
                index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4']))

ind1     True
ind2    False
ind3     True
ind4    False
dtype: bool

How do I set the last 2 columns of df1 to NA in every row where df2 is True?

In this case, ind1 and ind3 are True in df2, so only those rows of df1 should change:

      col1 col2 col3 col4
index
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN

CodePudding user response:

A possible solution, based on pandas.DataFrame.mask:

df1[['col3', 'col4']] = df1[['col3', 'col4']].mask(df2)

Output:

      col1 col2 col3 col4
index                    
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN
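
The line above hard-codes 'col3' and 'col4'. As a minimal sketch of the same mask idea for an arbitrary N, the columns can be picked by position instead. The helper name mask_last_n and the variable name flags are hypothetical and not part of the original answer, and the sketch assumes the boolean Series shares its index labels with the dataframe.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"col1": ["a", "d", "g", "j"],
     "col2": ["b", "c", "i", np.nan],
     "col3": ["c", "f", "i", np.nan],
     "col4": ["x", np.nan, np.nan, np.nan]},
    index=pd.Index(["ind1", "ind2", "ind3", "ind4"], name="index"),
)
flags = pd.Series([True, False, True, False],
                  index=["ind1", "ind2", "ind3", "ind4"])


def mask_last_n(df, flags, n):
    """Return a copy of df whose last n columns are NaN where flags is True.

    Hypothetical helper: same mask call as the answer above, with the
    column labels taken from the end of df.columns.
    """
    out = df.copy()
    last_cols = list(out.columns[-n:])  # labels of the last n columns
    # mask() replaces values with NaN in the rows where flags is True
    out[last_cols] = out[last_cols].mask(flags)
    return out


result = mask_last_n(df1, flags, 2)
print(result)
```

Working on a copy leaves df1 untouched, which makes it easy to compare the result with the original.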

CodePudding user response:

You can use boolean indexing:

N = 2
df1.iloc[df2, -N:] = np.nan

NB: what you call df2 is actually a Series, so a name such as s or ser might be more appropriate.

Output:

      col1 col2 col3 col4
index                    
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN
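
A self-contained sketch of two variants of this approach, under the assumption that the flags live in a Series named ser (a hypothetical rename following the note above). The first converts the Series to a plain NumPy boolean array before passing it to iloc, so the rows are matched purely by position and the code does not depend on how iloc treats a labelled boolean Series. The second uses loc, which matches the boolean Series to rows by index label.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"col1": ["a", "d", "g", "j"],
     "col2": ["b", "c", "i", np.nan],
     "col3": ["c", "f", "i", np.nan],
     "col4": ["x", np.nan, np.nan, np.nan]},
    index=pd.Index(["ind1", "ind2", "ind3", "ind4"], name="index"),
)
ser = pd.Series([True, False, True, False],
                index=["ind1", "ind2", "ind3", "ind4"])

N = 2

# Positional variant: rows come from a plain boolean array, columns from a
# negative slice. This assumes ser is in the same row order as df1.
by_position = df1.copy()
by_position.iloc[ser.to_numpy(), -N:] = np.nan

# Label-based variant: loc matches ser to the rows of df1 by index label;
# the last N column labels are looked up explicitly.
by_label = df1.copy()
by_label.loc[ser, by_label.columns[-N:]] = np.nan

print(by_position)
print(by_label)
```

The positional variant is the shorter of the two, while the label-based one is probably the safer choice if the two objects might not be sorted the same way.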