I want to calculate TimeStamp difference but the content has text NaN ,Can it be done?
Source data
TimeStamp
81 1285
82 1285
83 NaN
84 1287
85 1287
86 1287
87 NaN
88 1294
89 1294
90 1294
expected
TimeStamp diff
81 1285 NaN
82 1285 0
83 NaN NaN
84 1287 NaN
85 1287 0
86 1287 0
87 NaN NaN
88 1294 NaN
89 1294 0
90 1294 0
I have another question add ID column ,calculate the time difference based on the ID category
Source data
TimeStamp ID
81 1285 0
82 1285 1
83 NaN 0
84 1287 1
85 1287 0
86 1287 1
87 NaN 0
88 1294 1
89 1294 0
90 1294 1
expected
TimeStamp ID diff
81 1285 0 NaN
82 1285 1 NaN
83 NaN 0 NaN
84 1287 1 2
85 1287 0 NaN
86 1287 1 0
87 NaN 0 NaN
88 1294 1 7
89 1294 0 NaN
90 1294 1 0
CodePudding user response:
If your column or NaNs are strings, simply convert to float:
df['diff'] = df['TimeStamp'].astype(float).diff()
Output:
TimeStamp diff
81 1285 NaN
82 1285 0.0
83 NaN NaN
84 1287 NaN
85 1287 0.0
86 1287 0.0
87 NaN NaN
88 1294 NaN
89 1294 0.0
90 1294 0.0
CodePudding user response:
The reason you end up with NaNs in the resultant series is that subtraction with an undefined value is likewise undefined. It's up to you to decide how you want to handle that.
If you want to replace all the NaNs in the original series with 0
, you can do that
>>> s.fillna(0).diff()
81 NaN
82 0.0
83 -1285.0
84 1287.0
85 0.0
86 0.0
87 -1287.0
88 1294.0
89 0.0
90 0.0
Name: TimeStamp, dtype: float64
fillna
returns a new series where all the NaNs are replaced with 0
.
CodePudding user response:
Let us try groupby
df['new'] = df.groupby(df.TimeStamp.isna().cumsum())['TimeStamp'].diff()
df
TimeStamp new
81 1285.0 NaN
82 1285.0 0.0
83 NaN NaN
84 1287.0 NaN
85 1287.0 0.0
86 1287.0 0.0
87 NaN NaN
88 1294.0 NaN
89 1294.0 0.0
90 1294.0 0.0