Modify value in column based on conditions from another columns pandas

Time:07-30

I have this example df:

start_date   end_date id
22-07-2022 28-07-2022  A
22-07-2022 28-07-2022  B
22-07-2022 17-08-2022  A

I need to obtain this result:

start_date   end_date id
22-07-2022 28-07-2022  A
22-07-2022 28-07-2022  B
29-07-2022 17-08-2022  A

To put it in words, I need to take the earliest end_date among the rows with the same id, add 1 day, and write it into the other row's start_date, so the date ranges don't overlap. There will always be at most 2 rows per id, and the start_date will be the same for both of them.

Right now I'm trying to use a for loop adding conditions but I guess there's a more Pythonic way to do this using pandas functions. Any hint?

Thanks

Answer:

The following code defines the same data you provide programmatically and does the type of transformation you seek.

import pandas as pd
from datetime import timedelta

data = {
    "start_date": ["22-07-2022", "22-07-2022", "22-07-2022"],
    "end_date": ["28-07-2022", "28-07-2022", "17-08-2022"],
    "id": ["A", "B", "A"],
}

df = (
    pd.DataFrame(data)
    .assign(start_date=lambda x: pd.to_datetime(x.start_date, format="%d-%m-%Y"))
    .assign(end_date=lambda x: pd.to_datetime(x.end_date, format="%d-%m-%Y"))
    .sort_values("end_date")
)


df["new_start_date"] = (
    df.groupby("id").end_date.shift() + timedelta(days=1)
).combine_first(df.start_date)

In the last statement, we group by id and shift each group's sorted end_date down by one row, which leaves a missing value (NaT) in the first row of each group. We then add one day so the value points to the day after the previous end date. Finally, combine_first coalesces this shifted column with start_date, so the missing values are filled with the original start_date. The result is stored in new_start_date.
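To make the three steps easier to follow, here is a minimal sketch that splits the chained expression into intermediate variables and then writes the result back in the layout the question asked for. The names prev_end, next_day and result are only illustrative, and pd.Timedelta is used in place of datetime.timedelta as an equivalent choice. Overwriting start_date, restoring the original row order and converting back to dd-mm-YYYY strings are assumptions about the desired final output, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "start_date": ["22-07-2022", "22-07-2022", "22-07-2022"],
        "end_date": ["28-07-2022", "28-07-2022", "17-08-2022"],
        "id": ["A", "B", "A"],
    }
)

# Parse the day-first strings into real datetimes.
for col in ("start_date", "end_date"):
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

# Sort so that, within each id, the earlier end_date comes first.
df = df.sort_values("end_date")

# Step 1: previous end_date within the same id (NaT for the first row of each id).
prev_end = df.groupby("id")["end_date"].shift()

# Step 2: move to the day after the previous end_date; NaT stays NaT.
next_day = prev_end + pd.Timedelta(days=1)

# Step 3: keep the shifted value where it exists, otherwise the original start_date.
df["start_date"] = next_day.combine_first(df["start_date"])

# Assumption: restore the original row order and the dd-mm-YYYY string format.
result = df.sort_index()
for col in ("start_date", "end_date"):
    result[col] = result[col].dt.strftime("%d-%m-%Y")

print(result)
```

Because the question guarantees at most two rows per id, a single shift is enough here. With more rows per id, the same pattern would chain each row to the end_date of the row before it in the sorted order.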
