How do I merge near rows based on column values in pandas?-CodePudding

Imagine that I have a table like this

Start (day)	End (day)	Duration (days)	Status	Count
1	2	1	1	3
2	4	2	2	4
4	5	1	3	2
5	8	3	2	1
8	9	1	3	6
9	15	6	3	1
15	16	1	3	2

I want to merge near rows that has the same status. I want to let the start day being the start day of the first row of the sequence and the end day being the one of the last row. I also need to have the duration and the count summed, so in this example, I would want this result with the three last rows merged.

Start (day)	End (day)	Duration (days)	Status	Count
1	2	1	1	3
2	4	2	2	4
4	5	1	3	2
5	8	3	2	1
8	16	8	3	9

How should I do this?

Please, help me.

CodePudding user response：

One way using cumsum trick with pandas.DataFrame.groupby.agg:

m = df["Status"].ne(df["Status"].shift()).cumsum()

new_df = df.groupby(m, as_index=False).agg({"Start (day)": "first", 
                                            "End (day)": "last",
                                            "Duration (days)": "sum", 
                                            "Status": "first", 
                                            "Count": "sum"})

Output:

   Start (day)  End (day)  Duration (days)  Status  Count
0            1          2                1       1      3
1            2          4                2       2      4
2            4          5                1       3      2
3            5          8                3       2      1
4            8         16                8       3      9