I have a dataframe that contains 3 datetime columns
ItemUid HireStart DCompleteDate OffHire
14055 2021-01-01 2021-12-17 2021-01-09
14065 2021-08-12 2021-12-17 2021-11-17
14534 2018-12-21 NaT NaT
11639 NaT NaT NaT
43268 2020-09-07 2020-09-03 2020-11-03
36723 2021-01-03 Nat 2021-01-10
I am trying to return a dataframe that returns the items that were on hire between a user inputted date range.
Ie: if the user inputs: start date = '2021-01-02' & end date = '2021-01-08' the expected result would be:
ItemUid HireStart DCompleteDate OffHire
14055 2021-01-01 2021-01-23 2021-01-09
14534 2018-12-21 NaT NaT
36723 2021-01-03 Nat 2021-01-10
My code :)
def date_range(df):
start_date = input("Enter start date dd/mm/yyyy: ")
end_date = input("Enter end date dd/mm/yyyy: ")
df = df[(df['OffHire'] <= end_date) &
((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]
return df
result = df_hire.apply(date_range, axis=1)
This is currently getting an error:
TypeError Traceback (most recent call last)
<ipython-input-60-6d4d17020cba> in <module>()
9 return df
10
---> 11 result = df_hire.apply(date_range, axis=1)
4 frames
<ipython-input-60-6d4d17020cba> in date_range(df)
3 end_date = input("Enter end date dd/mm/yyyy: ")
4
----> 5 df = df[(df['OffHire'] <= end_date) &
6 ((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]
7
TypeError: '<=' not supported between instances of 'Timestamp' and 'str'
I could probably fix the error, however the implementation of how to apply the function has me stuck!
Any help would be greatly appreciated and would be another lesson for me!
Thanks in advance
CodePudding user response:
IIUC, you want something like this:
#convert the date columns to datetime
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])
df["OffHire"] = pd.to_datetime(df["OffHire"])
#convert inputs to datetime
start_date = pd.to_datetime(start_date, format="%d/%m/%Y")
end_date = pd.to_datetime(end_date, format="%d/%m/%Y")
#select the required rows
output = df[df["HireStart"].le(end_date)&df["DCompleteDate"].fillna(start_date).ge(start_date)]
CodePudding user response:
I think the best way is to use HireStart
as index and exploit pandas slicing for datetime index. Something like:
df.set_index('HireStart')['2021-01-02':'2021-01-08']