Calculate the time difference with text-CodePudding

I want to calculate TimeStamp difference but the content has text NaN ,Can it be done?

Source data

         TimeStamp 
    81        1285  
    82        1285  
    83         NaN  
    84        1287  
    85        1287  
    86        1287  
    87         NaN  
    88        1294  
    89        1294  
    90        1294

expected

         TimeStamp    diff
    81        1285     NaN
    82        1285       0
    83         NaN     NaN
    84        1287     NaN
    85        1287       0
    86        1287       0
    87         NaN     NaN
    88        1294     NaN
    89        1294       0
    90        1294       0

I have another question add ID column ,calculate the time difference based on the ID category

Source data

         TimeStamp  ID 
    81        1285   0
    82        1285   1
    83         NaN   0
    84        1287   1
    85        1287   0
    86        1287   1
    87         NaN   0
    88        1294   1
    89        1294   0
    90        1294   1

expected

         TimeStamp  ID   diff
    81        1285   0    NaN
    82        1285   1    NaN
    83         NaN   0    NaN
    84        1287   1      2
    85        1287   0    NaN
    86        1287   1      0
    87         NaN   0    NaN
    88        1294   1      7
    89        1294   0    NaN
    90        1294   1      0

CodePudding user response：

If your column or NaNs are strings, simply convert to float:

df['diff'] = df['TimeStamp'].astype(float).diff()

Output:

    TimeStamp  diff
81       1285   NaN
82       1285   0.0
83        NaN   NaN
84       1287   NaN
85       1287   0.0
86       1287   0.0
87        NaN   NaN
88       1294   NaN
89       1294   0.0
90       1294   0.0

CodePudding user response：

The reason you end up with NaNs in the resultant series is that subtraction with an undefined value is likewise undefined. It's up to you to decide how you want to handle that.

If you want to replace all the NaNs in the original series with 0, you can do that

>>> s.fillna(0).diff()
81       NaN
82       0.0
83   -1285.0
84    1287.0
85       0.0
86       0.0
87   -1287.0
88    1294.0
89       0.0
90       0.0
Name: TimeStamp, dtype: float64

fillna returns a new series where all the NaNs are replaced with 0.

CodePudding user response：

Let us try groupby

df['new'] = df.groupby(df.TimeStamp.isna().cumsum())['TimeStamp'].diff()
df
    TimeStamp  new
81     1285.0  NaN
82     1285.0  0.0
83        NaN  NaN
84     1287.0  NaN
85     1287.0  0.0
86     1287.0  0.0
87        NaN  NaN
88     1294.0  NaN
89     1294.0  0.0
90     1294.0  0.0