How to merge the next row's start with the current row's end in Python

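For example, suppose I have a DataFrame that looks like this:

  id  start  end Category
0  A      1    2    Cat_1
1  A      2    3    Cat_1
2  A      3    4    Cat_2
3  B      4    5    Cat_3
4  B      5    6    Cat_3
5  C      6    7    Cat_3
6  D      7    8    Cat_3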

For rows that share the same id and Category, I would like to keep only the first start and the last end, and drop the values in between. For example, rows 0 and 1 both have id A and Category Cat_1, so they should collapse into a single row with start 1 and end 3. The expected output would look like this:

  id  start  end Category
0  A      1    3    Cat_1
1  A      3    4    Cat_2
2  B      4    6    Cat_3
3  C      6    7    Cat_3
4  D      7    8    Cat_3

Feel free to use the following code to explore:

import pandas as pd
data = {'id':  ['A','A','A', 'B', 'B', 'C' , 'D'],
        'start': [1,2,3,4,5,6,7],
        'end': [2,3,4,5,6,7,8],
        'Category':['Cat_1', 'Cat_1', 'Cat_2' , 'Cat_3', 'Cat_3', 'Cat_3', 'Cat_3']
        }

df = pd.DataFrame(data)

CodePudding user response:

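You could use groupby with agg, taking the first "start" and the last "end" within each (id, Category) group. This relies on the rows of each group already being ordered by start, as they are in the sample data: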

out = df.groupby(['id','Category'], as_index=False).agg({'start':'first', 'end':'last'})

Output:

  id Category  start  end
0  A    Cat_1      1    3
1  A    Cat_2      3    4
2  B    Cat_3      4    6
3  C    Cat_3      6    7
4  D    Cat_3      7    8
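
One caveat: grouping on id and Category alone collapses every row of a group into one, even if the intervals do not actually touch. If that could happen in your data, a possible variation is to start a new run whenever a row's start differs from the previous row's end within the same group. The sketch below is only an illustration under that assumption; the helper name merge_chained_rows and the temporary 'run' column are made up for this example and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'C', 'D'],
    'start': [1, 2, 3, 4, 5, 6, 7],
    'end': [2, 3, 4, 5, 6, 7, 8],
    'Category': ['Cat_1', 'Cat_1', 'Cat_2', 'Cat_3', 'Cat_3', 'Cat_3', 'Cat_3'],
})

def merge_chained_rows(frame):
    """Hypothetical helper: merge rows only while start == previous end."""
    keys = ['id', 'Category']
    # Order each group by start so "previous row" is well defined.
    frame = frame.sort_values(keys + ['start']).reset_index(drop=True)
    # End value of the previous row in the same (id, Category) group;
    # NaN for the first row of each group.
    prev_end = frame.groupby(keys)['end'].shift()
    # A new run begins at the first row of a group (NaN never equals start)
    # or wherever the chain is broken.
    frame['run'] = frame['start'].ne(prev_end).cumsum()
    merged = (frame.groupby(keys + ['run'], as_index=False)
                   .agg(start=('start', 'first'), end=('end', 'last')))
    return merged.drop(columns='run')

out = merge_chained_rows(df)
print(out)
```

On the sample data every group chains without gaps, so this should give the same rows as the plain groupby above. With a gap, for instance rows (1, 2), (2, 3) and (5, 6) all under id A and Cat_1, it would keep (1, 3) and (5, 6) as two separate rows instead of merging them into (1, 6).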