How do I merge near rows based on column values in pandas?

Time:10-05

Imagine that I have a table like this:

Start (day)  End (day)  Duration (days)  Status  Count
          1          2                1       1      3
          2          4                2       2      4
          4          5                1       3      2
          5          8                3       2      1
          8          9                1       3      6
          9         15                6       3      1
         15         16                1       3      2

I want to merge adjacent rows that have the same status. The merged row should take its start day from the first row of the sequence and its end day from the last row, with the duration and the count summed. In this example, that means the last three rows are merged, giving this result:

Start (day)  End (day)  Duration (days)  Status  Count
          1          2                1       1      3
          2          4                2       2      4
          4          5                1       3      2
          5          8                3       2      1
          8         16                8       3      9

How should I do this?

Please, help me.

CodePudding user response:

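One way is to use the cumsum trick with pandas.DataFrame.groupby.agg: flag every row whose Status differs from the row before it, take the cumulative sum of those flags so that each run of consecutive equal statuses gets its own id, then group by that id: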

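# Flag rows whose Status differs from the previous row, then cumsum the flags
# so that each run of consecutive equal statuses gets its own group id.
m = df["Status"].ne(df["Status"].shift()).cumsum()

# Aggregate each run: first start, last end, summed duration and count.
new_df = df.groupby(m, as_index=False).agg({"Start (day)": "first",
                                            "End (day)": "last",
                                            "Duration (days)": "sum",
                                            "Status": "first",
                                            "Count": "sum"})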

Output:

   Start (day)  End (day)  Duration (days)  Status  Count
0            1          2                1       1      3
1            2          4                2       2      4
2            4          5                1       3      2
3            5          8                3       2      1
4            8         16                8       3      9
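
To run this end to end, here is a minimal self-contained sketch that rebuilds the sample table from the question and splits the trick into its intermediate steps. The names changed, run_id and merged are chosen for this example only. As a precaution (an assumption on my part, not something the answer requires), the group key is renamed so it does not share the name "Status" with a column, and reset_index(drop=True) is used in place of as_index=False.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Start (day)":     [1, 2, 4, 5, 8, 9, 15],
    "End (day)":       [2, 4, 5, 8, 9, 15, 16],
    "Duration (days)": [1, 2, 1, 3, 1, 6, 1],
    "Status":          [1, 2, 3, 2, 3, 3, 3],
    "Count":           [3, 4, 2, 1, 6, 1, 2],
})

# True wherever Status differs from the row above. The first row is always
# True because shift() leaves NaN there.
changed = df["Status"].ne(df["Status"].shift())

# Running total of the flags: one integer id per run of equal statuses.
# "run_id" is a hypothetical name picked to avoid clashing with "Status".
run_id = changed.cumsum().rename("run_id")
print(run_id.tolist())  # [1, 2, 3, 4, 5, 5, 5]

# Collapse each run into a single row.
merged = (
    df.groupby(run_id)
      .agg({
          "Start (day)": "first",
          "End (day)": "last",
          "Duration (days)": "sum",
          "Status": "first",
          "Count": "sum",
      })
      .reset_index(drop=True)
)
print(merged)
```

Note that the two Status 2 rows (starting on days 2 and 5) stay separate because they are not adjacent; a plain groupby("Status") would have merged them.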