Home > database >  In Pandas, how can I perform the .diff() method to numerical values only in a column that also conta
In Pandas, how can I perform the .diff() method to numerical values only in a column that also conta

Time:06-10

I have a Pandas dataset and I would like to calculate the difference of a column element compared with another element of the same column. In order to do so, the most intuitive method to apply is .diff()

So far, so good. The problem is that my column contains nan values without a specific order pattern, like the following example with a column named col:

  | col |
  |-----|
0 |  1  |
1 | NaN |
2 |  3  |
3 |  4  |
4 | NaN |
5 | NaN |
6 | 10  |
7 | NaN |
8 | 13  |

What I would like to do is to apply the .diff() method only to the preceding numerical values of the column, such that the expected answer is:

  | col |
  |-----|
0 | NaN |
1 | NaN |
2 |  2  |
3 |  1  |
4 | NaN |
5 | NaN |
6 |  6  |
7 | NaN |
8 |  3  |

Had it been a periodic order of the nan values, I could have used the periods parameter of the .diff() method, as explained here. However, given that the nan values appear in a random order, I was wondering how this could be done?

CodePudding user response:

You'll need to dropna and set up a temporary variable, and reindex like this:

import numpy as np

df = pd.DataFrame({"col": [1, np.nan, 3, 4, np.nan, np.nan, 10, np.nan, 13]})
idx = df.index  # create index from original data
tmp = df.dropna()  # drop nan rows
tmp.diff().reindex(idx)  # reindex to original index
>>>
  | col |
  |-----|
0 | NaN |
1 | NaN |
2 |  2  |
3 |  1  |
4 | NaN |
5 | NaN |
6 |  6  |
7 | NaN |
8 |  3  |
  • Related