Home > other >  Pandas filter df by date range and condition
Pandas filter df by date range and condition

Time:12-22

I have a dataframe that contains 3 datetime columns

              ItemUid   HireStart    DCompleteDate       OffHire
              14055     2021-01-01       2021-12-17      2021-01-09
              14065     2021-08-12       2021-12-17      2021-11-17
              14534     2018-12-21             NaT             NaT
              11639           NaT              NaT             NaT
              43268     2020-09-07       2020-09-03      2020-11-03
              36723     2021-01-03             Nat       2021-01-10
             

I am trying to return a dataframe that returns the items that were on hire between a user inputted date range.

Ie: if the user inputs: start date = '2021-01-02' & end date = '2021-01-08' the expected result would be:

          ItemUid   HireStart    DCompleteDate       OffHire
          14055     2021-01-01       2021-01-23      2021-01-09
          14534     2018-12-21             NaT             NaT
          36723     2021-01-03             Nat       2021-01-10
             

My code :)

def date_range(df):
    start_date = input("Enter start date dd/mm/yyyy: ")
    end_date = input("Enter end date dd/mm/yyyy: ")

    df = df[(df['OffHire'] <= end_date) & 
             ((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]
    
    return df

result = df_hire.apply(date_range, axis=1)

This is currently getting an error:

TypeError                                 Traceback (most recent call last)
<ipython-input-60-6d4d17020cba> in <module>()
      9     return df
     10 
---> 11 result = df_hire.apply(date_range, axis=1)

4 frames
<ipython-input-60-6d4d17020cba> in date_range(df)
      3     end_date = input("Enter end date dd/mm/yyyy: ")
      4 
----> 5     df = df[(df['OffHire'] <= end_date) & 
      6              ((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]
      7 

TypeError: '<=' not supported between instances of 'Timestamp' and 'str'

I could probably fix the error, however the implementation of how to apply the function has me stuck!

Any help would be greatly appreciated and would be another lesson for me!

Thanks in advance

CodePudding user response:

IIUC, you want something like this:

#convert the date columns to datetime
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])
df["OffHire"] = pd.to_datetime(df["OffHire"])

#convert inputs to datetime
start_date = pd.to_datetime(start_date, format="%d/%m/%Y")
end_date = pd.to_datetime(end_date, format="%d/%m/%Y")

#select the required rows
output = df[df["HireStart"].le(end_date)&df["DCompleteDate"].fillna(start_date).ge(start_date)]

CodePudding user response:

I think the best way is to use HireStart as index and exploit pandas slicing for datetime index. Something like:

df.set_index('HireStart')['2021-01-02':'2021-01-08']
  • Related