Home > Software engineering >  Pandas loc() with custom date function
Pandas loc() with custom date function

Time:12-14

So I have a dataframe like this: df =

index Date Value
1 12/1/2022 100
2 12/2/2022 200
3 12/5/2022 500
4 12/6/2022 150

I'm trying to create a variable that is the sum of values where the date is within 7 business days of a given date (12/12/2022 in this example).

I was able to do 7 days (not only business days) using this function:

var = df.loc[(df['Date'] - date(2022,12,12)).dt.days.abs() < 7, 'Value'].sum()

Now I have a function that finds the number of business days between two dates:

def date_diff(date_1, date_2):
    dates = []
    if date_2 > date_1:
        delta = (date_2 - date_1).days
        start_date = date_1
    else:
        delta = (date_1 - date_2).days
        start_date = date_2
    for i in range(delta   1):
        date_ = start_date   timedelta(i)
        if check(date_): # checks if its a business day
            dates.append(date_)
    return len(dates)

So this function won't take a Pandas Series and I don't know what to change to make it work. I fiddled with different syntaxes I've used like passing the Series with .dt or .values but nothing has worked. I'm assuming there is an easy solution I'm just overlooking, so any help would be greatly appreciated.

CodePudding user response:

You can mask the weekends out adding a second condition to the mask:

var = df.loc[((df['Date'] - date(2022,12,12)).dt.days.abs() < 7) & (df['Date'].apply(lambda x: x.weekday() not in [5,6])), 'Value'].sum()

CodePudding user response:

Here's a way to do what your question asks:

import pandas as pd
df = pd.DataFrame({
'Date':['12/1/2022','12/2/2022','12/5/2022','12/6/2022'],
'Value':[100,200,500,150]
}, index=range(1, 5))
df.Date = pd.to_datetime(df.Date)
print(df)

def addBusDays(dt, n):
    new = dt
    incr = 1 if n >= 0 else -1
    mag = n if n >= 0 else -n
    busDays = 0
    while busDays < mag:
        new  = pd.to_timedelta(incr, unit='D')
        if len(pd.bdate_range(new, new)): # checks if its a business day
            busDays  = 1
    return new

given = pd.to_datetime('12/12/2022')
var = df.loc[(addBusDays(given, -7) < df['Date']) & (df['Date'] < addBusDays(given, 7)), 'Value'].sum()
print(f'\n{var=}')

Output:

        Date  Value
1 2022-12-01    100
2 2022-12-02    200
3 2022-12-05    500
4 2022-12-06    150

var=850

Explanation:

  • find the bounds of the date range within 7 business days of the given date
  • use loc to filter only those rows with Date values that are within the bounds
  • use sum to aggregate the Value column for qualifying rows.

Note:

  • The particular implementation of the addBusDays function can be changed for example, see the answers to this SO question for alternative approaches). The key concept in my answer to OP's question is that you can pre-calculate the range of qualifying dates and use vectorized pandas comparison operators to filter the rows.
  • Related