I have a dataset like the one below:
import pandas as pd
from workdays import workday, networkdays
path = r'C:\Users\user\Documents\GitHub\learning\abc\1\test_labtat\lab.xlsx'
df = pd.read_excel(path)
    start date    End date        HT     D
0   2022-02-08         NaT  indirect    BL
1   2022-01-20         NaT    direct  None
2   2022-01-23         NaT    direct  None
3   2022-01-23         NaT    direct  None
4   2022-02-07         NaT    direct  None
5   2022-02-07         NaT    direct  None
6   2022-02-09         NaT    direct  None
7   2022-02-09         NaT    direct  None
8   2022-02-10         NaT    direct  None
9   2022-02-11  2022-02-13    direct  None
10  2022-02-16         NaT    direct  None
11  2022-02-16         NaT    direct  None
12  2022-02-16         NaT    direct  None
13  2022-01-15  2022-01-21    direct  None
14  2022-01-17  2022-01-17    direct  None
I wrote the following code to calculate networkdays for the rows that have a date value in the 'End date' column:
# If column 'D' equals 'BL', skip that row; apply only to the remaining cells in 'D' where 'End date' and 'start date' are not null
df.loc[df['D']=='BL', 'D'] = df.apply(lambda x: networkdays(x['start date'],x['End date']) if not pd.isnull(x['End date']) else x['End date'],axis=1)
However, I get an error and I don't understand where it comes from. Could you please take a look?
My expected output is:
    start date    End date        HT     D
0   2022-02-08         NaT  indirect    BL
1   2022-01-20         NaT    direct  None
2   2022-01-23         NaT    direct  None
3   2022-01-23         NaT    direct  None
4   2022-02-07         NaT    direct  None
5   2022-02-07         NaT    direct  None
6   2022-02-09         NaT    direct  None
7   2022-02-09         NaT    direct  None
8   2022-02-10         NaT    direct  None
9   2022-02-11  2022-02-13    direct     3
10  2022-02-16         NaT    direct  None
11  2022-02-16         NaT    direct  None
12  2022-02-16         NaT    direct  None
13  2022-01-15  2022-01-21    direct     5
14  2022-01-17  2022-01-17    direct     1
CodePudding user response:
I believe the problem comes from how you call the apply function.
By default, apply works on columns [1], but you can change that using the axis parameter.
Something like this might give you the expected result:
df['days'] = df.apply(
    lambda x: networkdays(x['start date'], x['End date'])
    if not pd.isnull(x['End date'])
    else "can not call",
    axis=1,  # use axis=1 to work with rows instead of columns
)
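To make the row-wise logic easier to check by hand, here is a minimal standard-library sketch of the same idea. It is only an illustration under some assumptions: count_weekdays is a hypothetical helper (not the workdays library's networkdays) that counts Monday to Friday inclusively and ignores holidays, the rows are a few samples copied from the question, and None stands in for NaT.

```python
from datetime import date, timedelta


def count_weekdays(start, end):
    """Hypothetical helper: count Monday-Friday days from start to end, inclusive.

    Assumption: no holiday calendar, weekends are Saturday and Sunday.
    """
    total = 0
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0-4 are Monday-Friday
            total += 1
        day += timedelta(days=1)
    return total


# A few rows copied from the question; None stands in for NaT / missing values.
rows = [
    {"start": date(2022, 2, 8), "end": None, "D": "BL"},
    {"start": date(2022, 1, 20), "end": None, "D": None},
    {"start": date(2022, 2, 11), "end": date(2022, 2, 13), "D": None},
    {"start": date(2022, 1, 15), "end": date(2022, 1, 21), "D": None},
    {"start": date(2022, 1, 17), "end": date(2022, 1, 17), "D": None},
]

for row in rows:
    # Leave 'BL' rows and rows without an end date untouched.
    if row["D"] != "BL" and row["end"] is not None:
        row["D"] = count_weekdays(row["start"], row["end"])

results = [row["D"] for row in rows]
print(results)
```

Under this weekday-only assumption, the 2022-02-11 to 2022-02-13 row counts as 1 rather than 3, because 12 and 13 February 2022 fall on a weekend. If you expect 3 there, you probably want a plain calendar-day difference for that row instead.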