Home > OS >  How to identify consecutive same values pandas
How to identify consecutive same values pandas

Time:10-03

I have a data fame like that :

Timestamp Value
2021-04-21 14:22:00 0
2021-04-21 14:23:00 0
2021-04-21 14:24:00 0
2021-04-21 14:25:00 1
2021-04-21 14:26:00 1
2021-04-21 14:27:00 0
2021-04-21 14:28:00 1
2021-04-21 14:29:00 1
2021-04-21 14:30:00 1

I wan to get the consecutive same values like that :

Begin Date Another header Consecutive values
2021-04-21 14:25:00 2021-04-21 14:26:00 2
2021-04-21 14:28:00 2021-04-21 14:30:00 3

I tried some solutions with numpy shift, or by using for / while loop, but it don't work ... Also, I saw some topics in stack overflow indeed.

Thanks !

CodePudding user response:

You can use a custom groupby.agg:

# identify rows with value=1
m = df['Value'].eq(1)

# filter, groupby consecutive values
out = (df[m].groupby((~m).cumsum())
             # aggregate with first/min, last/max date, and count
            .agg(**{'Begin Date': ('Timestamp', 'first'),    # or min
                    'Another header': ('Timestamp', 'last'), # or max
                    'Consecutive values': ('Timestamp', 'count')})
      )

output:

                Begin Date       Another header  Consecutive values
Value                                                              
3      2021-04-21 14:25:00  2021-04-21 14:26:00                   2
4      2021-04-21 14:28:00  2021-04-21 14:30:00                   3

CodePudding user response:

You can first group the values by using df.diff(1) and .cumsum():

df['group'] = (df.Value.diff(1) != 0).cumsum()

Now we can use pd.groupby() on this column to create your new dataframe:

out = pd.DataFrame({'Begin Date' : df.groupby('group').Timestamp.first(), 
                    'Another header' : df.groupby('group').Timestamp.last(),
                    'Consecutive values' : df.groupby('group').size(),
                    'Value' : df.groupby('group').Value.first()}).reset_index(drop=True)

Output:

             Begin Date        Another header  Consecutive values  Value
0  2021-04-21 14:22:00   2021-04-21 14:24:00                    3      0
1  2021-04-21 14:25:00   2021-04-21 14:26:00                    2      1
2  2021-04-21 14:27:00   2021-04-21 14:27:00                    1      0
3  2021-04-21 14:28:00   2021-04-21 14:30:00                    3      1

I deliberately show here the complete result (including consecutive zeroes). If you wish to delete those you can filter by doing out[out.Value != 0]

  • Related