Home > Mobile >  Filtering DataFrame by dates
Filtering DataFrame by dates

Time:06-23

I know this seems like a bit of a redundant question, but I'm having trouble filtering my dataframe by dates.

Context: I have a dataframe of all orders between 2020 and 2022. From this dataframe, I need to filter out dates when the cars were not "branded". Filtering by my method produces an empty dataframe, and I think the issue arises from the dates being the wrong format. Basically, I think I'm having trouble converting 'pandas._libs.tslibs.timestamps.Timestamp' into 'datetime.datetime'.

My code so far:

cars = ['AAA111', 'BBB222', 'CCC333']
branded = ['2020-11-19', '2019-04-30', '2019-09-10']
debranded = ['2022-06-21', '2020-03-01', '2020-03-01']

orders = pd.read_csv('filtered_orders.csv', index_col=0)
orders = orders.rename({'Created Time': 'time'}, axis='columns')

branded_time = [datetime.strptime(x, "%Y-%m-%d") for x in branded]
debranded_time = [datetime.strptime(x, "%Y-%m-%d") for x in debranded]
time_list = orders['time'].tolist()

time_list_formated =[]
for x in time_list:
    time_list_formated.append(datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

orders['times'] = time_list_formated

filtered_orders = pd.DataFrame()
for x in cars:
    for y in branded_time:
        for z in debranded_time:
            filtered_orders.append(orders[(orders['Car'] == x) & (orders['times'] >= y) & (orders['times'] <= z)])

branded_time and debranded_time variables are in datetime.datetime format, and orders['times'] is pandas._libs.tslibs.timestamps.Timestamp.

I would greatly appreciate any help you could give me on this.

Edit:

My dataset looks a bit like so:

Created Time        Ride price  Car
2022-06-20 16:09:53             AAA111
2019-01-03 15:37:20     x1      BBB222
2019-06-30 16:09:51             BBB222
2021-06-21 15:37:10     x2      BBB222
2022-05-03 16:09:52             BBB222
2019-01-03 15:37:20     x3      CCC333
2019-06-30 16:09:51     x4      CCC333
2021-06-21 15:37:10     x5      CCC333
2022-05-03 16:09:52     x6      CCC333

After running the script, I would like to have something that looks like this:

Created Time        Ride price  Car
2022-06-20 16:09:53             AAA111
2019-06-30 16:09:51             BBB222

CodePudding user response:

I think this will get you what you need:

import pandas as pd
import numpy as np

cars = ['AAA111', 'BBB222', 'CCC333']
branded = ['2020-11-19', '2019-04-30', '2019-09-10']
debranded = ['2022-06-21', '2020-03-01', '2020-03-01']
df_cars = pd.DataFrame({
    'Cars' : cars,
    'Branded' : branded,
    'Debranded' : debranded
})
df_merge = pd.merge(df1, df_cars, how = 'left', left_on = 'Car', right_on = 'Cars')
df_merge['Created Time'] = pd.to_datetime(df_merge['Created Time'], infer_datetime_format=True)
df_merge['Branded'] = pd.to_datetime(df_merge['Branded'], infer_datetime_format=True)
df_merge['Debranded'] = pd.to_datetime(df_merge['Debranded'], infer_datetime_format=True)
df_merge = df_merge.loc[np.where(df_merge['Created Time'].between(df_merge['Branded'], df_merge['Debranded']), True, False)]
df_merge
  • Related