How to verify that each element in a pandas column increases strictly by 1 from row to row

Time:10-27

Let's say I have the following pandas DataFrame:

data

      series    time_idx    value
0        0          0      -0.000000
1        0          1       0.018844
2        0          2       0.028694
3        0          3       0.050784
4        0          4       0.067037
...      ...        ...     ...
3995     9          395     0.973978
3996     9          396     0.944002
3997     9          397     1.001089
3998     9          398     1.132001
3999     9          399     1.169244
4000 rows × 3 columns

I want to test whether, for each series (0..9), time_idx increases by exactly 1 from row to row, and if not, where the gap is.

I thought about sorting the dataframe by series and by time_idx and then comparing time_idx to the row index mod 400, but it's not a nice solution. Any suggestions?
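The sort-then-compare idea can be made independent of the "400 rows per series, starting at 0" assumption. A minimal sketch, assuming a small made-up frame: after sorting, a row's expected time_idx is the first time_idx of its series plus the row's position within that series (`groupby(...).cumcount()`). The names `start`, `expected` and `ok` are illustrative only.

```python
import pandas as pd

# Hypothetical frame: series 0 has a gap (2 -> 4), series 1 is contiguous
df = pd.DataFrame({
    "series":   [0, 0, 0, 0, 1, 1, 1],
    "time_idx": [0, 1, 2, 4, 5, 6, 7],
    "value":    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# Put each series in time order
df = df.sort_values(["series", "time_idx"]).reset_index(drop=True)

# Expected time_idx if every step were +1: first value of the series + position in the series
start = df.groupby("series")["time_idx"].transform("first")
expected = start + df.groupby("series").cumcount()

df["ok"] = df["time_idx"] == expected

# Per-series verdict: True only if every row matches its expected value
per_series = df.groupby("series")["ok"].all()
print(per_series)
```

Once a series has a gap, every later row in that series also fails the comparison, so this tells you whether a series is contiguous and where its first gap starts, but it does not isolate each individual gap.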

Thanks

Answer:

The following is based on my understanding of your question; see if it answers it. I use the string 'True' instead of the Boolean True because np.where would otherwise convert it to the numeric 1.0 when it is mixed with the numeric differences.

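import numpy as np

# True where the row belongs to the same series as the row above
df['IncOne'] = (df.series == df.series.shift())
df['IncOne'] = (
    np.where(df.IncOne,
        # Same series: 'True' if time_idx grew by exactly 1, otherwise the actual step
        np.where(df.time_idx.eq(df.time_idx.shift() + 1),
                 'True', df.time_idx - df.time_idx.shift()),
        ''))  # first row of a series has nothing to compare with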
    series  time_idx     value  IncOne
0        0         0         0
1        0         1  0.018844    True
2        0         2  0.028694    True
3        0         3  0.050784    True
4        0         4  0.067037    True
5        0         6         0     2.0
6        0         7  0.018844    True
7        0         8  0.028694    True
8        0         9  0.050784    True
9        0        12  0.067037     3.0
10       0        13         1    True
11       9       395  0.973978
12       9       396  0.944002    True
13       9       397   1.00109    True
14       9       398     1.132    True
15       9       399   1.16924    True
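If you would rather not mix the string 'True' with float steps, one possible variant keeps the step numeric and the flag a real Boolean. This is a sketch, not the answerer's code: the frame rebuilds only the series and time_idx columns of the example output, and the `step` column name is made up.

```python
import pandas as pd

# Hypothetical data mirroring the series/time_idx columns of the example output
df = pd.DataFrame({
    "series":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 9, 9, 9, 9],
    "time_idx": [0, 1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 395, 396, 397, 398, 399],
})

# Same test as the answer: is this row in the same series as the row above?
same_series = df.series == df.series.shift()

# Numeric step from the previous row; NaN wherever a new series starts
df["step"] = (df.time_idx - df.time_idx.shift()).where(same_series)

# Real Booleans: True if the step is exactly 1; series starts (NaN step) count as OK
df["IncOne"] = df["step"].eq(1) | df["step"].isna()

# Rows where the sequence jumps, together with the size of the jump
print(df[~df["IncOne"]])
```

Keeping the step as a float column means gaps stay usable for arithmetic and filtering, while `IncOne` remains a plain bool column.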

Answer:

Assuming the dataframe is df, you can try this:

df["diff"] = df.groupby(by="series")["time_idx"].diff().fillna(1) != 1

This creates a new column "diff" with Boolean values. True means that the difference between the time_idx of the current row and that of the preceding row is not 1. Because the difference is computed within each series, the first row of each series gets NaN, which fillna(1) turns into a non-flagged value, so only consecutive rows of the same series can produce True.
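To see where each gap is, the Boolean column can be used to filter a small report. A minimal sketch on a made-up frame; the `from_idx`, `to_idx` and `step` names are hypothetical. It assumes rows are already ordered by time_idx within each series, since `diff()` compares rows in their current order (sort by `["series", "time_idx"]` first if they are not).

```python
import pandas as pd

# Hypothetical frame: series 0 jumps 1 -> 3, series 1 jumps 11 -> 15
df = pd.DataFrame({
    "series":   [0, 0, 0, 0, 1, 1, 1],
    "time_idx": [0, 1, 3, 4, 10, 11, 15],
})

# The answer's one-liner: True where the step within a series is not exactly 1
df["diff"] = df.groupby(by="series")["time_idx"].diff().fillna(1) != 1

# Previous time_idx within the same series (NaN at each series start)
prev = df.groupby("series")["time_idx"].shift()

# One row per gap: series, value before the gap, value after it, and the step size
gaps = pd.DataFrame({
    "series": df["series"],
    "from_idx": prev,
    "to_idx": df["time_idx"],
    "step": df["time_idx"] - prev,
})[df["diff"]]
print(gaps)
```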
