So I have a dataframe like this: df =
index | Date | Value |
---|---|---|
1 | 12/1/2022 | 100 |
2 | 12/2/2022 | 200 |
3 | 12/5/2022 | 500 |
4 | 12/6/2022 | 150 |
I'm trying to create a variable that is the sum of values where the date is within 7 business days of a given date (12/12/2022 in this example).
I was able to do 7 days (not only business days) using this function:
var = df.loc[(df['Date'] - date(2022,12,12)).dt.days.abs() < 7, 'Value'].sum()
Now I have a function that finds the number of business days between two dates:
def date_diff(date_1, date_2):
dates = []
if date_2 > date_1:
delta = (date_2 - date_1).days
start_date = date_1
else:
delta = (date_1 - date_2).days
start_date = date_2
for i in range(delta 1):
date_ = start_date timedelta(i)
if check(date_): # checks if its a business day
dates.append(date_)
return len(dates)
So this function won't take a Pandas Series and I don't know what to change to make it work. I fiddled with different syntaxes I've used like passing the Series with .dt or .values but nothing has worked. I'm assuming there is an easy solution I'm just overlooking, so any help would be greatly appreciated.
CodePudding user response:
You can mask the weekends out adding a second condition to the mask:
var = df.loc[((df['Date'] - date(2022,12,12)).dt.days.abs() < 7) & (df['Date'].apply(lambda x: x.weekday() not in [5,6])), 'Value'].sum()
CodePudding user response:
Here's a way to do what your question asks:
import pandas as pd
df = pd.DataFrame({
'Date':['12/1/2022','12/2/2022','12/5/2022','12/6/2022'],
'Value':[100,200,500,150]
}, index=range(1, 5))
df.Date = pd.to_datetime(df.Date)
print(df)
def addBusDays(dt, n):
new = dt
incr = 1 if n >= 0 else -1
mag = n if n >= 0 else -n
busDays = 0
while busDays < mag:
new = pd.to_timedelta(incr, unit='D')
if len(pd.bdate_range(new, new)): # checks if its a business day
busDays = 1
return new
given = pd.to_datetime('12/12/2022')
var = df.loc[(addBusDays(given, -7) < df['Date']) & (df['Date'] < addBusDays(given, 7)), 'Value'].sum()
print(f'\n{var=}')
Output:
Date Value
1 2022-12-01 100
2 2022-12-02 200
3 2022-12-05 500
4 2022-12-06 150
var=850
Explanation:
- find the bounds of the date range within 7 business days of the given date
- use
loc
to filter only those rows withDate
values that are within the bounds - use
sum
to aggregate theValue
column for qualifying rows.
Note:
- The particular implementation of the
addBusDays
function can be changed for example, see the answers to this SO question for alternative approaches). The key concept in my answer to OP's question is that you can pre-calculate the range of qualifying dates and use vectorized pandas comparison operators to filter the rows.