Setting the last n non-NaN values per group to NaN

Time:08-25

I have a DataFrame with (several) grouping variables and (several) value variables. My goal is to set the last n non-NaN values per group to NaN. So let's take a simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
df

Out[1]:
   id  value
0   1    1.0
1   1    2.0
2   1    NaN
3   2    9.0
4   2    8.0

The desired result for n=1 would look like the following:

Out[53]:
   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN

CodePudding user response:

You can reverse the notna flags and take a cumulative sum after groupby to get, for each row, how many non-NaN values remain up to the end of its group:

# keep a value only if more than 1 (= n) non-NaN values remain to the end of its group
df['value'] = df['value'].where(df['value'].notna().iloc[::-1].groupby(df['id']).cumsum() > 1)
df
Out[86]: 
   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
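
For reference, here is a small sketch of the intermediate pieces on the example frame; the imports, the N variable and the `remaining` name are illustrative additions, not part of the original answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
N = 1

# non-NaN flags, reversed so the running sum counts from the end of the frame
flags = df['value'].notna().iloc[::-1]

# per-id running count of non-NaN values, from the bottom of each group up:
# index 0 -> 2, 1 -> 1, 2 -> 0, 3 -> 2, 4 -> 1
remaining = flags.groupby(df['id']).cumsum()

# a row survives only if more than N non-NaN values remain from it onwards,
# i.e. everything except the last N non-NaN values of each group
df['value'] = df['value'].where(remaining > N)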

CodePudding user response:

Use groupby().cumcount() together with the per-group size:

N = 1
# group only the non-NaN rows
groups = df.loc[df['value'].notna()].groupby('id')
enum = groups.cumcount()                     # 0-based position among non-NaN rows
sizes = groups['value'].transform('size')    # number of non-NaN rows per group

# keep everything except the last N non-NaN rows of each group
df['value'] = df['value'].where(enum < sizes - N)

Output:

   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
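
Since the question mentions several value columns, the same cumcount/size comparison can be wrapped in a small helper; blank_last_n and its arguments below are hypothetical names, just a sketch of how the idea generalises:

import numpy as np
import pandas as pd

def blank_last_n(df, group_col, value_col, n):
    """Set the last n non-NaN entries of value_col to NaN within each group."""
    notna = df[value_col].notna()
    groups = df.loc[notna].groupby(group_col)
    enum = groups.cumcount()                     # 0-based position among non-NaN rows
    sizes = groups[value_col].transform('size')  # non-NaN count per group
    out = df.copy()
    out[value_col] = out[value_col].where(enum < sizes - n)
    return out

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
print(blank_last_n(df, 'id', 'value', n=1))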

CodePudding user response:

One option: create a reversed cumcount on the non-NA values:

N = 1
# rows whose position, counted backwards among the non-NaN rows of their group, is < N
m = (df
     .loc[df['value'].notna()]
     .groupby('id')
     .cumcount(ascending=False)
     .lt(N)
)
df.loc[m[m].index, 'value'] = np.nan

Similar approach with boolean masking:

m = df['value'].notna()
# reversed per-group running count of non-NaN values; <= N marks the last N of each group
df['value'] = df['value'].mask(m[::-1].groupby(df['id']).cumsum().le(N))

Output:

   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
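
As a quick sanity check that the masking version generalises beyond n=1, running it with N=2 on the same example (imports assumed) blanks both non-NaN values of each group, since each group only has two:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})

N = 2
m = df['value'].notna()
df['value'] = df['value'].mask(m[::-1].groupby(df['id']).cumsum().le(N))

#    id  value
# 0   1    NaN
# 1   1    NaN
# 2   1    NaN
# 3   2    NaN
# 4   2    NaN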