Home > Software design >  Pandas: Calculate time in minutes between 2 columns, excluding weekends, public holidays and taking
Pandas: Calculate time in minutes between 2 columns, excluding weekends, public holidays and taking

Time:03-25

I have the below issue and I feel I'm just a few steps away from solving it, but I'm not experienced enough just yet. I've used business-duration for this. I've looked through other similar answers to this and tried many methods, but this is the closest I have gotten (Using this answer). I'm using Anaconda and Spyder, which is the only method I have on my work laptop at the moment. I can't install some of the custom Business days functions into anaconda.

I have a large dataset (~200k rows) which I need to solve this for:

import pandas as pd
import business_duration as bd
import datetime as dt
import holidays as pyholidays

#Specify Business Working hours (8am - 5pm)
Bus_start_time = dt.time(8,00,0)
Bus_end_time = dt.time(17,0,0)

holidaylist = pyholidays.ZA()
unit='min'

list = [[10, '2022-01-01 07:00:00', '2022-01-08 15:00:00'], [11, '2022-01-02 18:00:00', '2022-01-10 15:30:00'],
[12, '2022-01-01 09:15:00', '2022-01-08 12:00:00'], [13, '2022-01-07 13:00:00', '2022-01-23 17:00:00']]

df = pd.DataFrame(list, columns =['ID', 'Start', 'End'])
print(df)

Which gives:

   ID      Start                  End 
0  10  2022-01-01 07:00:00  2022-01-08 15:00:00 
1  11  2022-01-02 18:00:00  2022-01-10 15:30:00 
2  12  2022-01-01 09:15:00  2022-01-08 12:00:00 
3  13  2022-01-07 13:00:00  2022-01-23 17:00:00

The next step works in testing single dates:

startdate = pd.to_datetime('2022-01-01 00:00:00')
enddate = pd.to_datetime('2022-01-14 23:00:00')

df['TimeAdj'] = bd.businessDuration(startdate,enddate,Bus_start_time,Bus_end_time,holidaylist=holidaylist,unit=unit)
print(df)

Which results in:

   ID        Start                  End            TimeAdj 
0  10  2022-01-01 07:00:00  2022-01-08 15:00:00    5400.0 
1  11  2022-01-02 18:00:00  2022-01-10 15:30:00    5400.0 
2  12  2022-01-01 09:15:00  2022-01-08 12:00:00    5400.0 
3  13  2022-01-07 13:00:00  2022-01-23 17:00:00    5400.0

For some reason I have float values showing up, but I can fix that later. Next, I need to have this calculation run per row in the dataframe.

I tried replacing the df columns in start date and end date, but got an error:

startdate = df['Start']
enddate = df['End']

print(bd.businessDuration(startdate,enddate,Bus_start_time,Bus_end_time,holidaylist=holidaylist,unit=unit))`

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I then checked the documentation for business-duration, and adjusted to the below:

from itertools import repeat

df['TimeAdj'] = list(map(bd.businessDuration,startdate,enddate,repeat(Bus_start_time),repeat(Bus_end_time),repeat(holidaylist),repeat(unit)))

AttributeError: 'str' object has no attribute 'date'

I'm hoping to end with the correct values in each row of the TimeAdj column (example figures added).

   ID        Start                  End           TimeAdj 
0  10  2022-01-01 07:00:00  2022-01-08 15:00:00    2300 
1  11  2022-01-02 18:00:00  2022-01-10 15:30:00    2830 
2  12  2022-01-01 09:15:00  2022-01-08 12:00:00    2115 
3  13  2022-01-07 13:00:00  2022-01-23 17:00:00    4800

What do I need to adjust on this?

CodePudding user response:

Use:

from functools import partial

# Convert strings to datetime
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])

# Get holidays list
years = range(df['Start'].min().year, df['End'].max().year 1)
holidaylist = pyholidays.ZA(years=years).keys()

# Create a partial function as a shortcut
bduration = partial(bd.businessDuration,
                    starttime=Bus_start_time, endtime=Bus_end_time,
                    holidaylist=holidaylist, unit=unit)

# Compute business duration
df['TimeAdj'] = df.apply(lambda x: bduration(x['Start'], x['End']), axis=1)

Output:

>>> df
   ID               Start                 End  TimeAdj
0  10 2022-01-01 07:00:00 2022-01-08 15:00:00   2700.0
1  11 2022-01-02 18:00:00 2022-01-10 15:30:00   3150.0
2  12 2022-01-01 09:15:00 2022-01-08 12:00:00   2700.0
3  13 2022-01-07 13:00:00 2022-01-23 17:00:00   5640.0
  • Related