I'm trying to add a third column to my dataset that indicates whether each start_date is a business day (a weekday that is not a holiday).
I have the following dataset:
start_date | count |
---|---|
2018-10-01 | 1043 |
2018-10-02 | 1062 |
2018-10-03 | 1068 |
2018-10-04 | 1003 |
2018-10-05 | 1122 |
2021-12-27 | 1053 |
I used the code below to generate the third column:
from bdateutil import isbday
import holidays
df1['business_day']=isbday(df1["start_date"], holidays=holidays.US())
But I'm getting the following error:
TypeError: Can't convert <class 'pandas.core.series.Series'> to date.
I've already tried the following snippets to adjust the start_date format, but I still can't get it to work.
df1['start_date'] = pd.to_datetime(df1['start_date']).dt.date
df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1188 entries, 0 to 1187
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 start_date 1188 non-null object
1 count 1188 non-null int64
dtypes: int64(1), object(1)
and this one:
df1['start_date'] = pd.to_datetime(df1['start_date'])
df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1188 entries, 0 to 1187
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 start_date 1188 non-null datetime64[ns]
1 count 1188 non-null int64
dtypes: datetime64[ns](1), int64(1)
And I still get the same error.
CodePudding user response:
The error message you received is giving you a big clue.
TypeError: Can't convert <class 'pandas.core.series.Series'> to date.
It tells you that the function raising the error, which you should be able to find in the stack trace, expects certain data types, and pandas.core.series.Series is not one of them. My guess is that isbday (imported via from bdateutil import isbday) expects a single value, such as a date object or perhaps a string representation of a date. It is worth noting that bdateutil has not been updated since 2014 and, as far as I can tell, is not available on PyPI, which suggests the package may no longer be maintained.
If you are able, load everything in an IPython session and evaluate df1["start_date"] at the REPL. It returns a Series object, not a single value. To operate on each value in the Series, you need to call the function once per element. There are a few ways to do this. One option is calling apply() with an anonymous function (also called a lambda function) and assigning the output of that function to a new column.
df1['business_day'] = df1["start_date"].apply(lambda x: isbday(x, holidays=holidays.US()))
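To see why apply() makes the difference without installing bdateutil, here is a minimal sketch. The is_weekday function is a hypothetical stand-in that only mimics the "one date in, one bool out" shape of isbday; it ignores holidays entirely. The sample frame uses two dates from the question plus one Saturday added for illustration.

```python
import datetime as dt

import pandas as pd


def is_weekday(d):
    """Hypothetical stand-in for isbday: accepts ONE date, ignores holidays."""
    if not isinstance(d, (dt.date, dt.datetime)):
        raise TypeError(f"Can't convert {type(d)} to date.")
    return d.weekday() < 5  # Monday is 0, Friday is 4


df1 = pd.DataFrame({
    "start_date": pd.to_datetime(["2018-10-05", "2018-10-06", "2021-12-27"]),
})

# Passing the whole column fails, much like the call in the question.
try:
    is_weekday(df1["start_date"])
except TypeError as exc:
    print(exc)

# apply() calls the function once per element, so each call sees one Timestamp.
df1["business_day"] = df1["start_date"].apply(is_weekday)
print(df1)
```

Each element of a datetime64 column reaches the function as a pandas Timestamp, which is a datetime subclass, so a function written for single dates works unchanged.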
You could also iterate over the rows of the DataFrame, build a new Series object, and merge or concatenate it onto the DataFrame. One thing you do not want to do is modify the object you are iterating over; doing so is bad practice. I hope that helps solve your problem and gives you a better grasp of working with pandas DataFrame objects.
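If calling a Python function once per row turns out to be slow, one alternative that avoids bdateutil altogether is NumPy's np.is_busday, which works on a whole array at once. This is a different technique from apply(), not a drop-in replacement for isbday. The holiday list below is hand-written for illustration (it contains only Columbus Day 2018); a real list would have to come from somewhere else, for example the holidays package.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "start_date": pd.to_datetime(
        ["2018-10-05", "2018-10-06", "2018-10-08", "2021-12-27"]
    ),
})

# Illustrative holiday list (an assumption, not a complete US calendar).
us_holidays = np.array(["2018-10-08"], dtype="datetime64[D]")

# np.is_busday works on day-precision dates, so cast the column first.
days = df1["start_date"].to_numpy().astype("datetime64[D]")

# The default weekmask treats Monday through Friday as business days.
df1["business_day"] = np.is_busday(days, holidays=us_holidays)
print(df1)
```

The result is a plain boolean column, so it can be filtered or aggregated like any other.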
CodePudding user response:
isbday takes a single date as a parameter, not a Series.
from bdateutil import isbday
import holidays
df1['business_day'] = df1["start_date"].apply(lambda x: isbday(x, holidays=holidays.US()))