Count preceding non NaN values in pandas

Time:03-18

I have a DataFrame that looks like the following:

     a   b   c
0   NaN  8  NaN
1   NaN  7  NaN
2   NaN  5  NaN
3   7.0  3  NaN
4   3.0  5  NaN
5   5.0  4  NaN
6   7.0  1  NaN
7   8.0  9  3.0
8   NaN  5  5.0
9   NaN  6  4.0
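To reproduce the example, the frame above can be built as follows. This is a minimal sketch: the values are copied from the table, and `np.nan` stands in for each NaN entry.

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan marks the missing entries
df = pd.DataFrame({
    "a": [np.nan, np.nan, np.nan, 7.0, 3.0, 5.0, 7.0, 8.0, np.nan, np.nan],
    "b": [8, 7, 5, 3, 5, 4, 1, 9, 5, 6],
    "c": [np.nan] * 7 + [3.0, 5.0, 4.0],
})
print(df)
```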

What I want to create is a new DataFrame where each value is the number of non-NaN values in the same column, counted from the top down to and including that row. The resulting DataFrame would look like this:

     a   b   c
0   0    1  0
1   0    2  0
2   0    3  0
3   1    4  0
4   2    5  0
5   3    6  0
6   4    7  0
7   5    8  1
8   5    9  2
9   5   10  3
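To make the rule explicit: each cell counts the non-NaN values from the top of its column down to and including the current row. A plain-Python sketch for a single column (the helper name `running_non_nan_count` is hypothetical, used only for illustration):

```python
import math

def running_non_nan_count(values):
    """Hypothetical helper: count non-NaN entries seen so far, including the current one."""
    counts = []
    seen = 0
    for v in values:
        if not math.isnan(v):
            seen += 1
        counts.append(seen)
    return counts

# Column "a" from the question
column_a = [math.nan, math.nan, math.nan, 7.0, 3.0, 5.0, 7.0, 8.0, math.nan, math.nan]
print(running_non_nan_count(column_a))  # [0, 0, 0, 1, 2, 3, 4, 5, 5, 5]
```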

I have achieved it with the following loop:

out = df.copy()
for i in range(len(df)):
    # count the non-NaN values in rows 0..i (inclusive)
    out.iloc[i] = df.iloc[0:i + 1].notna().sum()

However, I can only do this one column at a time. My real DataFrame contains thousands of columns, so iterating over them is far too slow. What can I do? Maybe the solution involves the pandas .apply() function.
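For comparison, an .apply()-based version of this idea is possible: `DataFrame.apply` calls the function once per column (axis=0 by default), and each call is itself vectorized over the column. This is a sketch, not a recommendation; with thousands of columns, the per-column Python call overhead typically makes it slower than a single operation on the whole frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, np.nan, np.nan, 7.0, 3.0, 5.0, 7.0, 8.0, np.nan, np.nan],
    "b": [8, 7, 5, 3, 5, 4, 1, 9, 5, 6],
    "c": [np.nan] * 7 + [3.0, 5.0, 4.0],
})

# One function call per column; each call receives that column as a Series
out = df.apply(lambda col: col.notna().cumsum())
print(out)
```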

Answer:

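There's no need for apply. The whole frame can be handled at once, and far more efficiently, by chaining notna and cumsum: notna marks each non-NaN value as True, and cumsum adds those Trues up down each column, which gives the running count: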

out = df.notna().cumsum()

Output:

   a   b  c
0  0   1  0
1  0   2  0
2  0   3  0
3  1   4  0
4  2   5  0
5  3   6  0
6  4   7  0
7  5   8  1
8  5   9  2
9  5  10  3
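A self-contained sketch of why this works, splitting the chain into its two steps and checking the result against the expected table from the question (the names `mask` and `expected` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, np.nan, np.nan, 7.0, 3.0, 5.0, 7.0, 8.0, np.nan, np.nan],
    "b": [8, 7, 5, 3, 5, 4, 1, 9, 5, 6],
    "c": [np.nan] * 7 + [3.0, 5.0, 4.0],
})

# Step 1: boolean mask, True wherever a value is present
mask = df.notna()

# Step 2: cumulative sum down each column; True counts as 1 and False as 0,
# so each cell holds the number of non-NaN values up to and including that row
out = mask.cumsum()

# Expected table from the question
expected = pd.DataFrame({
    "a": [0, 0, 0, 1, 2, 3, 4, 5, 5, 5],
    "b": list(range(1, 11)),
    "c": [0] * 7 + [1, 2, 3],
})
# check_dtype=False so the comparison does not depend on the exact integer dtype
pd.testing.assert_frame_equal(out, expected, check_dtype=False)
print(out)
```

Because both steps operate on the entire frame, the number of columns does not add any Python-level looping.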

Answer:

Use notna with cumsum:

out = df.notna().cumsum()
Output:
   a   b  c
0  0   1  0
1  0   2  0
2  0   3  0
3  1   4  0
4  2   5  0
5  3   6  0
6  4   7  0
7  5   8  1
8  5   9  2
9  5  10  3
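Note that the expected output in the question counts the current row too. If "preceding" is meant literally, i.e. only the non-NaN values strictly before each row, one option (a sketch, not part of either answer above) is to shift the inclusive counts down by one row and fill the first row with 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, np.nan, np.nan, 7.0, 3.0, 5.0, 7.0, 8.0, np.nan, np.nan],
    "b": [8, 7, 5, 3, 5, 4, 1, 9, 5, 6],
    "c": [np.nan] * 7 + [3.0, 5.0, 4.0],
})

# Inclusive running count, as in the answers
inclusive = df.notna().cumsum()

# Shift every column down one row; the first row has no predecessors, so it gets 0
out_exclusive = inclusive.shift(fill_value=0)
print(out_exclusive)
```

Subtracting the mask instead, `df.notna().cumsum() - df.notna()`, gives the same strictly-preceding counts.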