I'm trying to get the values of the last N non-NaN rows of each column in a pandas DataFrame so that I can perform some operations on them. For example, say I want to compute the mean of the last 3 non-NaN values. This is what my DataFrame looks like:
          a         b         c
0  125384.0  0.823381  614109.0
1  224996.0  0.926270  522097.0
2  178755.0  0.865229  482718.0
3  194345.0  0.959907  473543.0
4  143193.0       NaN  450413.0
5       NaN       NaN       NaN
For column a, for example, I want the mean of 178755.0, 194345.0 and 143193.0 (the last 3 rows before the first NaN). Expected result: 172097.7
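For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example frame from the printout above and hand-checks the expected value for column a. The variable name `expected_a` is just for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the printed values in the question.
df = pd.DataFrame({
    "a": [125384.0, 224996.0, 178755.0, 194345.0, 143193.0, np.nan],
    "b": [0.823381, 0.926270, 0.865229, 0.959907, np.nan, np.nan],
    "c": [614109.0, 522097.0, 482718.0, 473543.0, 450413.0, np.nan],
})

# Expected value for column a: the mean of its last 3 non-NaN entries.
expected_a = (178755.0 + 194345.0 + 143193.0) / 3
print(round(expected_a, 1))  # 172097.7
```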
So far I have tried two approaches. First, following what was proposed in this post, I used pd.Series.last_valid_index as the second input to iloc:
df.iloc[-3: df.apply(pd.Series.last_valid_index)].mean()
This gave me the following error:
TypeError: cannot do positional indexing on RangeIndex with these indexers
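A likely explanation: `df.apply(pd.Series.last_valid_index)` returns one index label per column, i.e. a whole Series, while `iloc` needs a single integer as a slice bound. The sketch below shows that Series and then keeps the `last_valid_index` idea by slicing each column separately inside `apply`. One caveat: this takes the last 3 rows up to the last valid label, which matches the question's data but would include interior NaNs if a column had gaps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [125384.0, 224996.0, 178755.0, 194345.0, 143193.0, np.nan],
    "b": [0.823381, 0.926270, 0.865229, 0.959907, np.nan, np.nan],
    "c": [614109.0, 522097.0, 482718.0, 473543.0, 450413.0, np.nan],
})

# One label per column, i.e. a Series, not a single integer usable by iloc.
last_labels = df.apply(pd.Series.last_valid_index)
print(last_labels.tolist())  # [4, 3, 4]

# Slice each column up to its own last valid label (.loc, because
# last_valid_index returns a label, not a position), then average the last 3 rows.
means = df.apply(lambda s: s.loc[:s.last_valid_index()].tail(3).mean())
print(means)
```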
I thought the problem was that I was retrieving an index, so I tried retrieving the value itself instead, using what was proposed in this answer:
df.iloc[-3: df.apply(lambda x: x[x.notnull()].values[-1])]
This again gave me the same error:
TypeError: cannot do positional indexing on RangeIndex with these indexers
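The second attempt probably fails for a similar reason: `x[x.notnull()].values[-1]` returns the last non-NaN data value of each column (so `apply` yields a Series of floats), which is neither a single integer nor a position. A minimal sketch, assuming the goal is the mean of the last 3 non-NaN values: the smallest change that seems to work is to take the last three values inside the lambda instead of using the result as an `iloc` bound.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [125384.0, 224996.0, 178755.0, 194345.0, 143193.0, np.nan],
    "b": [0.823381, 0.926270, 0.865229, 0.959907, np.nan, np.nan],
    "c": [614109.0, 522097.0, 482718.0, 473543.0, 450413.0, np.nan],
})

# What the original lambda returns: data values, not positions.
last_values = df.apply(lambda x: x[x.notnull()].values[-1])
print(last_values.tolist())  # [143193.0, 0.959907, 450413.0]

# Slice the last three non-null values per column and average them.
means = df.apply(lambda x: x[x.notnull()].values[-3:].mean())
print(means)
```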
Any insight into what the solution might be would be much appreciated. I want to do this for all columns in the DataFrame, hence the use of df.apply. Thanks in advance!
Answer:
Filter out the NaNs in each column first, then take the last 3 remaining values with iloc:
df.apply(lambda x: x[x.notnull()].iloc[-3:].mean(), axis=0)
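Since the question asks about the last N rows in general, the same idea can be wrapped in a small helper that takes N as a parameter. `mean_last_n_valid` is a hypothetical name used only for this sketch; `dropna().tail(n)` is equivalent to `x[x.notnull()].iloc[-n:]`.

```python
import numpy as np
import pandas as pd


def mean_last_n_valid(frame: pd.DataFrame, n: int = 3) -> pd.Series:
    """Hypothetical helper: mean of the last n non-NaN values in each column."""
    # dropna() removes NaNs column by column; tail(n) keeps the last n survivors.
    return frame.apply(lambda col: col.dropna().tail(n).mean())


df = pd.DataFrame({
    "a": [125384.0, 224996.0, 178755.0, 194345.0, 143193.0, np.nan],
    "b": [0.823381, 0.926270, 0.865229, 0.959907, np.nan, np.nan],
    "c": [614109.0, 522097.0, 482718.0, 473543.0, 450413.0, np.nan],
})

print(mean_last_n_valid(df, 3))
print(mean_last_n_valid(df, 2))
```

If a column has fewer than n non-NaN values, `tail(n)` simply returns what is there, so the mean is taken over fewer values rather than raising.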
Answer:
Let us try stack:
df.stack().groupby(level=1).apply(lambda x : x.tail(3).mean())
Out[11]:
a 172097.666667
b 0.917135
c 468891.333333
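A variant of the stack approach that avoids the per-group lambda, sketched under one assumption: the NaN handling of `DataFrame.stack()` has been changing across recent pandas releases (older defaults drop NaNs, the newer implementation keeps them), so an explicit `dropna()` is added to make the result independent of the version. `groupby(level=1).tail(3)` keeps the last 3 entries for each column label, and a second groupby averages them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [125384.0, 224996.0, 178755.0, 194345.0, 143193.0, np.nan],
    "b": [0.823381, 0.926270, 0.865229, 0.959907, np.nan, np.nan],
    "c": [614109.0, 522097.0, 482718.0, 473543.0, 450413.0, np.nan],
})

# Long Series indexed by (row, column); drop NaNs explicitly.
stacked = df.stack().dropna()

# Keep the last 3 values for each column (level 1), then average per column.
means = stacked.groupby(level=1).tail(3).groupby(level=1).mean()
print(means)
```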