Let's say I have the following pandas DataFrame:
data
series time_idx value
0 0 0 -0.000000
1 0 1 0.018844
2 0 2 0.028694
3 0 3 0.050784
4 0 4 0.067037
... ... ... ...
3995 9 395 0.973978
3996 9 396 0.944002
3997 9 397 1.001089
3998 9 398 1.132001
3999 9 399 1.169244
4000 rows × 3 columns
I want to test whether, for each series (0..9), the time indexes increase by 1 from row to row and, if not, where the difference is.
I thought about sorting the dataframe by series and by time_idx and then comparing time_idx to the index mod 400, but that's not a nice solution. Any suggestions?
Thanks
CodePudding user response:
The following is based on my understanding of your question; see if it answers it. I use the string 'True' instead of the Boolean True because otherwise the column converts it to the numeric 1.0.
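import numpy as np

# True where the row belongs to the same series as the previous row
df['IncOne'] = (df.series == df.series.shift())
# Within a series: 'True' if time_idx increased by 1, otherwise the actual step;
# the first row of each series is left empty
df['IncOne'] = (
    np.where(df.IncOne,
             np.where(df.time_idx.eq(df.time_idx.shift() + 1),
                      'True', df.time_idx - df.time_idx.shift()),
             ''))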
|  | series | time_idx | value | IncOne |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 |  |
| 1 | 0 | 1 | 0.018844 | True |
| 2 | 0 | 2 | 0.028694 | True |
| 3 | 0 | 3 | 0.050784 | True |
| 4 | 0 | 4 | 0.067037 | True |
| 5 | 0 | 6 | 0 | 2.0 |
| 6 | 0 | 7 | 0.018844 | True |
| 7 | 0 | 8 | 0.028694 | True |
| 8 | 0 | 9 | 0.050784 | True |
| 9 | 0 | 12 | 0.067037 | 3.0 |
| 10 | 0 | 13 | 1 | True |
| 11 | 9 | 395 | 0.973978 |  |
| 12 | 9 | 396 | 0.944002 | True |
| 13 | 9 | 397 | 1.00109 | True |
| 14 | 9 | 398 | 1.132 | True |
| 15 | 9 | 399 | 1.16924 | True |
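If mixing strings and numbers in one column is a concern, a possible variant of the same shift-based idea keeps the step numeric and leaves NaN at the start of each series. This is only a sketch: the sample data and the column names step and ok are made up for illustration, and it assumes the rows are already ordered by series and time_idx.

```python
import pandas as pd

# Hypothetical sample data: series 0 skips time_idx 3, series 9 skips 397 and 398.
df = pd.DataFrame({
    "series":   [0, 0, 0, 0, 9, 9, 9],
    "time_idx": [0, 1, 2, 4, 395, 396, 399],
})

# True where the row belongs to the same series as the previous row.
same_series = df["series"].eq(df["series"].shift())

# Step from the previous row, kept only within a series (NaN at each series start).
df["step"] = df["time_idx"].diff().where(same_series)

# A row is fine if it starts a series or if the step is exactly 1.
df["ok"] = df["step"].isna() | df["step"].eq(1)

# Rows where the increment is not 1.
print(df[~df["ok"]])
```

Because step stays numeric, it can be filtered or aggregated directly, for example with df["step"].max().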
CodePudding user response:
Assuming that the dataframe is df, you can try this:
df["diff"] = df.groupby(by="series")["time_idx"].diff().fillna(1) != 1
It will create a new column "diff" with boolean values. A True value indicates that the difference between the time_idx value in the current row and the one preceding it is not equal to 1. The first row of each series has no predecessor, so its difference is NaN; fillna(1) turns that into 1, which makes the row False. Only differences between rows belonging to the same series can give a True value.
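To also see where the differences are, the same groupby/diff idea can be extended a little. The following is a minimal sketch on made-up sample data; the names step, gaps and bad_series are only illustrative, and it assumes the rows are already ordered by time_idx within each series.

```python
import pandas as pd

# Hypothetical sample data: series 0 skips time_idx 2, series 1 has no gaps.
df = pd.DataFrame({
    "series":   [0, 0, 0, 0, 1, 1, 1],
    "time_idx": [0, 1, 3, 4, 0, 1, 2],
    "value":    [0.0, 0.1, 0.2, 0.3, 0.0, 0.1, 0.2],
})

# Per-series difference to the previous row (NaN on the first row of each series).
step = df.groupby("series")["time_idx"].diff()

# Same flag as in the answer: True where the step is not 1.
df["diff"] = step.fillna(1) != 1

# Rows where a gap occurs, together with the size of the step.
gaps = df.loc[df["diff"], ["series", "time_idx"]].assign(step=step[df["diff"]])
print(gaps)

# One boolean per series: does it contain any irregular step?
bad_series = df.groupby("series")["diff"].any()
print(bad_series)
```

Here gaps lists the offending rows, and bad_series answers the per-series yes/no question directly.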