Home > Software design >  Splitting call data to 15 minute intervals in pandas
Splitting call data to 15 minute intervals in pandas

Time:09-29

I am new in python and pandas and even though I researched a lot about intervals, I couldnt find any solution for my problem, I hope someone can help

Here is a sample of my DF

df = pd.DataFrame(
    data=[['Mel Gibson', 'German', '2021-9-23 14:22:38', 301 ],
          ['Jim Carrey', 'German', '2021-9-23 14:27:39', 1041 ],
          ['Mel Gibson', 'German','2021-9-24 13:33:22',12]],
    columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)
df['Interval Start'] = pd.to_datetime(df['Interval Start'])

What I want to do is transform status duration to 15 minute intervals and group them per specialist and per day.

My desired output should be like below:

df = pd.DataFrame(
    data=[['Mel Gibson', 'German', '2021-9-23 14:15:00', 301 ],
          ['Jim Carrey', 'German', '2021-9-23 14:15:00', 141 ],
          ['Jim Carrey', 'German', '2021-9-23 14:30:00', 900 ],
          ['Mel Gibson', 'German','2021-9-24 13:30:00',12]],
    columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)

So basically I need to split the seconds in status duration in 15 minute intervals until there is no remaining duration.

Edit:

My original data is like this:

    df = pd.DataFrame(
            data=[['Mel Gibson', 'German', '2021-9-23 14:22:38', 301 ],
                  ['Mel Gibson', 'German', '2021-9-23 14:27:40', 4678 ],
                  ['Mel Gibson', 'German','2021-9-24 13:33:22',12]],
            columns=['specialist', 'Language', 'Interval Start', 'status_duration']
        )
        df['Interval Start'] = pd.to_datetime(df['Interval Start'])

The code from Henry gives me output for only the first row, second row is skipped.

Also lets say if a call has started at 10:35:00, this interval's(10:30-10:45) can not exceed 600 seconds as there are only 10 minutes left from the start time.

CodePudding user response:

One way is to make use of the quotient and remainder of status_duration, explode the result and finally add up the time by seconds:

ref = (df.groupby(["specialist", "Language", pd.Grouper(key="Interval Start", freq="D")], as_index=False)
         .agg(status_duration=("status_duration", lambda d: [*([900]*(d.iat[0]//900)), d.iat[0]%900]),
              Interval=("Interval Start", "first"))
         .explode("status_duration"))

ref["Interval"] = ref["Interval"].dt.floor("15min") pd.to_timedelta(ref.groupby(ref.index).cumcount()*900, unit="sec")

print (ref)

   specialist Language status_duration            Interval
0  Jim Carrey   German             900 2021-09-23 14:15:00
0  Jim Carrey   German             141 2021-09-23 14:30:00
1  Mel Gibson   German             301 2021-09-23 14:15:00
2  Mel Gibson   German              12 2021-09-24 13:30:00

CodePudding user response:

You can use the dt.floor() function for the rounding:

df['Interval Start'] = df['Interval Start'].dt.floor("15min")

Result:

   specialist Language      Interval Start  status_duration
0  Mel Gibson   German 2021-09-23 14:15:00              301
1  Jim Carrey   German 2021-09-23 14:15:00             1041
2  Mel Gibson   German 2021-09-24 13:30:00               12
  • Related