Splitting call data to 15 minute intervals in pandas

I am new to Python and pandas, and even though I researched intervals a lot, I couldn't find a solution to my problem. I hope someone can help.

Here is a sample of my DF

import pandas as pd

df = pd.DataFrame(
    data=[['Mel Gibson', 'German', '2021-9-23 14:22:38', 301],
          ['Jim Carrey', 'German', '2021-9-23 14:27:39', 1041],
          ['Mel Gibson', 'German', '2021-9-24 13:33:22', 12]],
    columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)
df['Interval Start'] = pd.to_datetime(df['Interval Start'])

What I want to do is split status_duration (in seconds) into 15-minute intervals and group the result per specialist and per day.

My desired output should be like below:

df = pd.DataFrame(
    data=[['Mel Gibson', 'German', '2021-9-23 14:15:00', 301],
          ['Jim Carrey', 'German', '2021-9-23 14:15:00', 141],
          ['Jim Carrey', 'German', '2021-9-23 14:30:00', 900],
          ['Mel Gibson', 'German', '2021-9-24 13:30:00', 12]],
    columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)

So basically I need to split the seconds in status_duration into 15-minute intervals until there is no remaining duration.

Edit:

My original data is like this:

df = pd.DataFrame(
    data=[['Mel Gibson', 'German', '2021-9-23 14:22:38', 301],
          ['Mel Gibson', 'German', '2021-9-23 14:27:40', 4678],
          ['Mel Gibson', 'German', '2021-9-24 13:33:22', 12]],
    columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)
df['Interval Start'] = pd.to_datetime(df['Interval Start'])

The code from Henry gives me output for only the first row; the second row is skipped.

Also, if a call started at 10:35:00, the duration assigned to that interval (10:30-10:45) cannot exceed 600 seconds, because only 10 minutes of the interval are left after the start time.
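To make that constraint concrete, here is a small sketch of the arithmetic. `seconds_left_in_bucket` is just a name made up for this example, and it assumes fixed-width buckets aligned to the clock (which is what `floor("15min")` gives).

```python
import pandas as pd

def seconds_left_in_bucket(ts, freq="15min"):
    """Seconds between ts and the end of the freq-sized bucket containing it."""
    ts = pd.Timestamp(ts)
    # End of the bucket = start of the bucket (floor) plus one bucket width
    bucket_end = ts.floor(freq) + pd.Timedelta(freq)
    return int((bucket_end - ts).total_seconds())

# A call starting at 10:35:00 sits in the 10:30-10:45 bucket
left = seconds_left_in_bucket("2021-09-23 10:35:00")
print(left)
```

For the 10:35:00 start this gives 600, so at most 600 seconds of the call can be counted in that first interval; the rest has to spill into the following ones.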

CodePudding user response：

One way is to make use of the quotient and remainder of status_duration, explode the result and finally add up the time by seconds:

ref = (df.groupby(["specialist", "Language", pd.Grouper(key="Interval Start", freq="D")], as_index=False)
         .agg(status_duration=("status_duration", lambda d: [*([900]*(d.iat[0]//900)), d.iat[0]%900]),
              Interval=("Interval Start", "first"))
         .explode("status_duration"))

# Floor the start to its 15-minute bucket, then shift each exploded piece by 900 s
ref["Interval"] = ref["Interval"].dt.floor("15min") + pd.to_timedelta(ref.groupby(ref.index).cumcount()*900, unit="sec")

print(ref)

   specialist Language status_duration            Interval
0  Jim Carrey   German             900 2021-09-23 14:15:00
0  Jim Carrey   German             141 2021-09-23 14:30:00
1  Mel Gibson   German             301 2021-09-23 14:15:00
2  Mel Gibson   German              12 2021-09-24 13:30:00

CodePudding user response：

You can use the dt.floor() function for the rounding:

df['Interval Start'] = df['Interval Start'].dt.floor("15min")

Result:

   specialist Language      Interval Start  status_duration
0  Mel Gibson   German 2021-09-23 14:15:00              301
1  Jim Carrey   German 2021-09-23 14:15:00             1041
2  Mel Gibson   German 2021-09-24 13:30:00               12
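
Flooring alone only relabels the start time: the full 1041 seconds stay in the 14:15 bucket. A minimal sketch that also splits the duration across buckets, limiting the first piece to the time left in its bucket, might look like the following. `split_into_intervals` is a hypothetical helper name, it assumes positive durations in whole seconds, and summing pieces that land in the same bucket for the same specialist is an assumption about the grouping you want.

```python
import pandas as pd

def split_into_intervals(df, freq="15min"):
    """Split each row's status_duration across consecutive freq-sized buckets."""
    step = pd.Timedelta(freq)
    rows = []
    for rec in df.to_dict("records"):
        start = rec["Interval Start"]
        end = start + pd.Timedelta(seconds=int(rec["status_duration"]))
        bucket = start.floor(freq)
        while bucket < end:
            # Overlap of the call [start, end) with the bucket [bucket, bucket + step)
            overlap = min(end, bucket + step) - max(start, bucket)
            rows.append({**rec, "Interval Start": bucket,
                         "status_duration": int(overlap.total_seconds())})
            bucket += step
    out = pd.DataFrame(rows, columns=df.columns)
    # Assumption: pieces in the same bucket for the same specialist are summed
    keys = ["specialist", "Language", "Interval Start"]
    return out.groupby(keys, as_index=False, sort=False)["status_duration"].sum()

df = pd.DataFrame(
    data=[['Mel Gibson', 'German', '2021-9-23 14:22:38', 301],
          ['Jim Carrey', 'German', '2021-9-23 14:27:39', 1041],
          ['Mel Gibson', 'German', '2021-9-24 13:33:22', 12]],
    columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)
df['Interval Start'] = pd.to_datetime(df['Interval Start'])

result = split_into_intervals(df)
print(result)
```

On the sample from the question this yields 301 and 141 seconds in the two 14:15 rows, 900 in the 14:30 row and 12 in the 13:30 row. It loops row by row, so it handles every row rather than only the first one per group, but it may be slow on very large frames.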