Convert rows of consecutive numbers in a column into two columns with start and end information

I have a data frame like the one below, with a score for every index i.

I would like to collapse the information in the first column whenever consecutive rows have an identical score. The expected result would look like this:

start  end  score
5      7     3.0
8      9    11.0
15     15   10.0
30     32    1.0
10     11    8.0
20     22    1.0

CodePudding user response：

You can group by runs of consecutive identical scores and aggregate to find the endpoints. This also works for single-row groups, where the start and end are the same.

df.groupby(df["score"].ne(df["score"].shift()).cumsum()).agg(
    start=("i", "first"), end=("i", "last"), score=("score", "first")
)


       start  end  score
score
1          5    7    3.0
2          8    9   11.0
3         15   15   10.0
4         30   32    1.0
5         10   11    8.0
6         20   22    1.0
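
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption: the question's original data is not shown, so the rows are reconstructed from the expected output. The extra `index_jumped` condition is also an addition that is not part of the answer above. It starts a new group when `i` skips, which matters only if two runs with the same score happen to sit next to each other.

```python
import pandas as pd

# Hypothetical input, reconstructed from the expected output in the question.
df = pd.DataFrame({
    "i": [5, 6, 7, 8, 9, 15, 30, 31, 32, 10, 11, 20, 21, 22],
    "score": [3.0, 3.0, 3.0, 11.0, 11.0, 10.0, 1.0, 1.0, 1.0, 8.0, 8.0, 1.0, 1.0, 1.0],
})

# A new run starts whenever the score differs from the previous row ...
score_changed = df["score"].ne(df["score"].shift())
# ... or (assumption beyond the answer above) whenever i is not previous i + 1.
index_jumped = df["i"].diff().ne(1)

# The cumulative sum of the "new run" flags gives each run its own integer id.
run_id = (score_changed | index_jumped).cumsum()

result = (
    df.groupby(run_id)
    .agg(start=("i", "first"), end=("i", "last"), score=("score", "first"))
    .reset_index(drop=True)
)
print(result)
```

Dropping `index_jumped` from `run_id` gives the same grouping as the one-liner above.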

CodePudding user response：

You don't need a magic trick here. A plain for loop will do, with the least headaches I would say.
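
A loop version could look roughly like the sketch below. The function name `collapse_runs` and the sample rows are hypothetical (the rows are reconstructed from the expected output in the question), and the check that `i` continues the previous run is an assumption about what "consecutive" should mean here.

```python
def collapse_runs(indices, scores):
    """Collapse consecutive rows with an identical score into (start, end, score)."""
    runs = []
    for i, score in zip(indices, scores):
        # Extend the current run if the score repeats and i directly follows its end.
        if runs and runs[-1][2] == score and runs[-1][1] + 1 == i:
            runs[-1][1] = i
        else:
            # Otherwise open a new run that starts and ends at this row.
            runs.append([i, i, score])
    return [tuple(run) for run in runs]


# Hypothetical sample data, reconstructed from the expected output.
indices = [5, 6, 7, 8, 9, 15, 30, 31, 32, 10, 11, 20, 21, 22]
scores = [3.0, 3.0, 3.0, 11.0, 11.0, 10.0, 1.0, 1.0, 1.0, 8.0, 8.0, 1.0, 1.0, 1.0]

rows = collapse_runs(indices, scores)
for start, end, score in rows:
    print(start, end, score)
```

With a data frame you would call it as `collapse_runs(df["i"], df["score"])` and, if needed, wrap the result in `pd.DataFrame(rows, columns=["start", "end", "score"])`.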