I have a data fame like that :
Timestamp | Value |
---|---|
2021-04-21 14:22:00 | 0 |
2021-04-21 14:23:00 | 0 |
2021-04-21 14:24:00 | 0 |
2021-04-21 14:25:00 | 1 |
2021-04-21 14:26:00 | 1 |
2021-04-21 14:27:00 | 0 |
2021-04-21 14:28:00 | 1 |
2021-04-21 14:29:00 | 1 |
2021-04-21 14:30:00 | 1 |
I wan to get the consecutive same values like that :
Begin Date | Another header | Consecutive values |
---|---|---|
2021-04-21 14:25:00 | 2021-04-21 14:26:00 | 2 |
2021-04-21 14:28:00 | 2021-04-21 14:30:00 | 3 |
I tried some solutions with numpy shift, or by using for / while loop, but it don't work ... Also, I saw some topics in stack overflow indeed.
Thanks !
CodePudding user response:
You can use a custom groupby.agg
:
# identify rows with value=1
m = df['Value'].eq(1)
# filter, groupby consecutive values
out = (df[m].groupby((~m).cumsum())
# aggregate with first/min, last/max date, and count
.agg(**{'Begin Date': ('Timestamp', 'first'), # or min
'Another header': ('Timestamp', 'last'), # or max
'Consecutive values': ('Timestamp', 'count')})
)
output:
Begin Date Another header Consecutive values
Value
3 2021-04-21 14:25:00 2021-04-21 14:26:00 2
4 2021-04-21 14:28:00 2021-04-21 14:30:00 3
CodePudding user response:
You can first group the values by using df.diff(1)
and .cumsum()
:
df['group'] = (df.Value.diff(1) != 0).cumsum()
Now we can use pd.groupby()
on this column to create your new dataframe:
out = pd.DataFrame({'Begin Date' : df.groupby('group').Timestamp.first(),
'Another header' : df.groupby('group').Timestamp.last(),
'Consecutive values' : df.groupby('group').size(),
'Value' : df.groupby('group').Value.first()}).reset_index(drop=True)
Output:
Begin Date Another header Consecutive values Value
0 2021-04-21 14:22:00 2021-04-21 14:24:00 3 0
1 2021-04-21 14:25:00 2021-04-21 14:26:00 2 1
2 2021-04-21 14:27:00 2021-04-21 14:27:00 1 0
3 2021-04-21 14:28:00 2021-04-21 14:30:00 3 1
I deliberately show here the complete result (including consecutive zeroes). If you wish to delete those you can filter by doing out[out.Value != 0]