Incrementally add time to column in dataframe based on first value of the column-CodePudding

I have a scenario where I have to increment the timestamp of a column in my dataframe.

The dataframe consists of a a column which has a set of identical area IDs along with one "waterDuration" column.

I want to successively add this duration to the timestamp given in first row for each type of area and update the rest of the rows for each area ID incrementally.

This is how my dataframe looks like.

First timestamp for each areaId is given, I want to add whatever the duration is given next to it to the initial value and update and increment for the rest such as : -

These are all the columns of my dataframe :-

scheduleId          int64
scheduleName       object
areaId             object
deviceId           object
stationDeviceId    object
evStatus           object
waterDuration      object
noOfCyles          object
startTime1         object
startTime2         object
startTime3         object
startTime4         object
waterPlanning      object
lastUpdatedTime    object
dtype: object

I want all these columns and their values intact in the df along with the updated values in startTime1.

The value of waterDuration can change so I'd prefer not to use it directly in the solution. Any help would be great!!

CodePudding user response：

So here's your dataframe:

data = {
    "areaID": ["125659657", "125659657", "125659657", "125659657", "9876913", "9876913", "9876913", "9876913"],
    "waterDuration": ["15", "15", "15", "15", "10", "10", "10", "10"],
    "startTime1": ["00:04:00", "00:00:00", "00:00:00", "00:00:00", "00:34:00", "00:00:00", "00:00:00", "00:00:00"]
}

df = pd.DataFrame(data)

You will need to change the dtypes:

df.waterDuration = df.waterDuration.astype(int)

In order to get the output you want, create a function to apply to the dataframe:

def add_from_last_row(row):
    # If first row, nothing to do
    # row.name corresponds to the DataFrame index
    if row.name == 0:
        return row.startTime1
    # If prev. row is not the same area, do nothing
    if row.areaID != df.loc[row.name-1, 'areaID']:
        return row.startTime1

    # Get the min index in order to get the original startTime
    min_index = df[df.areaID == row.areaID].index.min()
    # Here we get the original startTime, cast to datetime
    default_time = pd.to_datetime(df.loc[min_index, 'startTime1'], format="%H:%M:%S")
    # Sum all durations from min_index 1 to current row index
    seconds_to_add = df.loc[min_index 1:row.name, 'waterDuration'].sum()
    # Calculate the offset in seconds
    offset = pd.DateOffset(seconds=int(seconds_to_add))

    # return the last 8 character ie. hh:mm:ss
    # otherwise it would be YYYY-MM-DD hh:mm:ss
    return str(default_time   offset)[-8:]

Then apply it:

df.apply(lambda x: add_from_last_row(x), axis=1)

And the result:

0    00:04:00
1    00:04:15
2    00:04:30
3    00:04:45
4    00:34:00
5    00:34:10
6    00:34:20
7    00:34:30
dtype: object

Hope it helps !