I have a Pandas dataset and I would like to calculate the difference of a column element compared with another element of the same column. In order to do so, the most intuitive method to apply is .diff()
So far, so good. The problem is that my column contains nan
values without a specific order pattern, like the following example with a column named col
:
| col |
|-----|
0 | 1 |
1 | NaN |
2 | 3 |
3 | 4 |
4 | NaN |
5 | NaN |
6 | 10 |
7 | NaN |
8 | 13 |
What I would like to do is to apply the .diff()
method only to the preceding numerical values of the column, such that the expected answer is:
| col |
|-----|
0 | NaN |
1 | NaN |
2 | 2 |
3 | 1 |
4 | NaN |
5 | NaN |
6 | 6 |
7 | NaN |
8 | 3 |
Had it been a periodic order of the nan
values, I could have used the periods
parameter of the .diff()
method, as explained here. However, given that the nan
values appear in a random order, I was wondering how this could be done?
CodePudding user response:
You'll need to dropna
and set up a temporary variable, and reindex
like this:
import numpy as np
df = pd.DataFrame({"col": [1, np.nan, 3, 4, np.nan, np.nan, 10, np.nan, 13]})
idx = df.index # create index from original data
tmp = df.dropna() # drop nan rows
tmp.diff().reindex(idx) # reindex to original index
>>>
| col |
|-----|
0 | NaN |
1 | NaN |
2 | 2 |
3 | 1 |
4 | NaN |
5 | NaN |
6 | 6 |
7 | NaN |
8 | 3 |