Return values between index n and last non NaN index in pandas

Time:12-03

I'm trying to return the values of the last N non-NaN rows in a pandas DataFrame so that I can perform some operations on them. To make it clearer, say I want to compute the mean of the last 3 non-NaN rows. First, this is what my DataFrame looks like:

           a           b         c  
0   125384.0    0.823381  614109.0  
1   224996.0    0.926270  522097.0  
2   178755.0    0.865229  482718.0 
3   194345.0    0.959907  473543.0 
4   143193.0         NaN  450413.0 
5        NaN         NaN       NaN 

For column a, for example, I want the mean of 178755.0, 194345.0 and 143193.0 (the last 3 rows before the first NaN). Expected result: 172097.7

So far I have tried 2 approaches. First, following what was proposed in this post, I used pd.Series.last_valid_index as the second bound of the iloc slice:

df.iloc[-3: df.apply(pd.Series.last_valid_index)].mean()

Which gave me the following error:

TypeError: cannot do positional indexing on RangeIndex with these indexers

I thought the problem was that I was retrieving an index, so I tried retrieving the value itself instead, using what was proposed in this answer:

df.iloc[-3: df.apply(lambda x: x[x.notnull()].values[-1])]

This again gave me the same error:

TypeError: cannot do positional indexing on RangeIndex with these indexers

Some insight into what the solution might be would be very much appreciated. I want to do this for all columns in the dataframe, hence using df.apply. Thanks in advance!

CodePudding user response:

Try this:

df.apply(lambda x: x[x.notnull()].iloc[-3:].mean(), axis=0)
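For context, here is a minimal, self-contained sketch of this approach on the data from the question. It also shows the likely reason both attempts above failed: df.apply(pd.Series.last_valid_index) returns a Series with one label per column rather than a single integer, so it cannot serve as the stop of an iloc slice. The helper name mean_of_last_valid and its parameter n are illustrative, not part of the original answer, and dropna() is used as an equivalent of x[x.notnull()].

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame shown in the question.
df = pd.DataFrame({
    "a": [125384.0, 224996.0, 178755.0, 194345.0, 143193.0, np.nan],
    "b": [0.823381, 0.926270, 0.865229, 0.959907, np.nan, np.nan],
    "c": [614109.0, 522097.0, 482718.0, 473543.0, 450413.0, np.nan],
})

# One last-valid label per column: a Series, not a scalar,
# which is why it cannot be used as a slice bound in df.iloc[...].
last_valid = df.apply(pd.Series.last_valid_index)

def mean_of_last_valid(col, n=3):
    """Mean of the last n non-NaN values of one column (hypothetical helper)."""
    return col.dropna().iloc[-n:].mean()

# apply() passes each column to the helper in turn.
result = df.apply(mean_of_last_valid)
print(result)
```

Because the NaN values are dropped per column before slicing, each column can end at a different row (b ends one row earlier than a and c here).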

CodePudding user response:

Let us try stack:

df.stack().groupby(level=1).apply(lambda x : x.tail(3).mean())
Out[11]: 
a    172097.666667
b         0.917135
c    468891.333333
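A runnable sketch of the same stack-based idea follows. It assumes the DataFrame from the question. Whether stack() drops NaN entries on its own appears to depend on the pandas version, so this variant adds an explicit dropna() (not in the original answer) to avoid relying on that behavior.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame shown in the question.
df = pd.DataFrame({
    "a": [125384.0, 224996.0, 178755.0, 194345.0, 143193.0, np.nan],
    "b": [0.823381, 0.926270, 0.865229, 0.959907, np.nan, np.nan],
    "c": [614109.0, 522097.0, 482718.0, 473543.0, 450413.0, np.nan],
})

# stack() turns the frame into one long Series indexed by (row, column).
# The explicit dropna() removes NaN entries regardless of stack's own behavior.
stacked = df.stack().dropna()

# Level 1 of the index holds the column names, so grouping on it
# collects each column's remaining values in their original row order.
means = stacked.groupby(level=1).apply(lambda s: s.tail(3).mean())
print(means)
```

The result is a Series indexed by column name, matching the output shown above.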