A column in my dataframe contains [1,1,2,3,4,7,8,8,15,19,20,21]. I want to get the longest run of consecutive values in this column: [1,2,3,4]. How can I calculate it?
CodePudding user response:
You can group consecutive values by testing where the difference from the previous value is not 1 and taking the cumulative sum of that test, then get the size of each group with GroupBy.transform,
and finally filter the original column col by the maximal size.
The output contains every run of consecutive values that has the maximal length:
s = df['col'].groupby(df['col'].diff().ne(1).cumsum()).transform('size')
out = df.loc[s.eq(s.max()), 'col']
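To see how the grouping works, here is a small sketch that runs the same steps on the data from the question. The DataFrame construction and the intermediate names group_id and sizes are assumptions added for illustration; only the column name col comes from the answer.

```python
import pandas as pd

# Sample data from the question; building the DataFrame this way is an assumption.
df = pd.DataFrame({'col': [1, 1, 2, 3, 4, 7, 8, 8, 15, 19, 20, 21]})

# A new group starts wherever the step from the previous value is not exactly 1
# (the first row has a NaN difference, so it also starts a group).
group_id = df['col'].diff().ne(1).cumsum()

# For every row, the length of the run it belongs to.
sizes = df['col'].groupby(group_id).transform('size')

# Keep only the rows whose run has the maximal length.
out = df.loc[sizes.eq(sizes.max()), 'col']

print(group_id.tolist())  # [1, 2, 2, 2, 2, 3, 3, 4, 5, 6, 6, 6]
print(out.tolist())       # [1, 2, 3, 4]
```

Note that the repeated 1 at the start and the repeated 8 each break a run, because their difference is 0 rather than 1.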
If you need only the first run with the maximal length, use Series.value_counts with Series.idxmax:
s = df['col'].diff().ne(1).cumsum()
out = df.loc[s.eq(s.value_counts().idxmax()), 'col']
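A minimal sketch of this variant on the question's data follows. The names counts and longest are hypothetical helpers added for readability. The sample data has a single longest run, so the result matches the first approach; how value_counts orders runs of equal length is not checked here.

```python
import pandas as pd

# Sample data from the question; building the DataFrame this way is an assumption.
df = pd.DataFrame({'col': [1, 1, 2, 3, 4, 7, 8, 8, 15, 19, 20, 21]})

# Label each run of consecutive values.
s = df['col'].diff().ne(1).cumsum()

counts = s.value_counts()   # run label -> run length
longest = counts.idxmax()   # label of a run with the maximal length

# Select only the rows that carry that single label.
out = df.loc[s.eq(longest), 'col']

print(longest)       # 2
print(out.tolist())  # [1, 2, 3, 4]
```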