I know this seems like a bit of a redundant question, but I'm having trouble filtering my dataframe by dates.
Context: I have a dataframe of all orders between 2020 and 2022. From this dataframe, I need to filter out dates when the cars were not "branded". Filtering by my method produces an empty dataframe, and I think the issue arises from the dates being the wrong format. Basically, I think I'm having trouble converting 'pandas._libs.tslibs.timestamps.Timestamp' into 'datetime.datetime'.
My code so far:
cars = ['AAA111', 'BBB222', 'CCC333']
branded = ['2020-11-19', '2019-04-30', '2019-09-10']
debranded = ['2022-06-21', '2020-03-01', '2020-03-01']
orders = pd.read_csv('filtered_orders.csv', index_col=0)
orders = orders.rename({'Created Time': 'time'}, axis='columns')
branded_time = [datetime.strptime(x, "%Y-%m-%d") for x in branded]
debranded_time = [datetime.strptime(x, "%Y-%m-%d") for x in debranded]
time_list = orders['time'].tolist()
time_list_formated =[]
for x in time_list:
time_list_formated.append(datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
orders['times'] = time_list_formated
filtered_orders = pd.DataFrame()
for x in cars:
for y in branded_time:
for z in debranded_time:
filtered_orders.append(orders[(orders['Car'] == x) & (orders['times'] >= y) & (orders['times'] <= z)])
branded_time and debranded_time variables are in datetime.datetime format, and orders['times'] is pandas._libs.tslibs.timestamps.Timestamp.
I would greatly appreciate any help you could give me on this.
Edit:
My dataset looks a bit like so:
Created Time Ride price Car
2022-06-20 16:09:53 AAA111
2019-01-03 15:37:20 x1 BBB222
2019-06-30 16:09:51 BBB222
2021-06-21 15:37:10 x2 BBB222
2022-05-03 16:09:52 BBB222
2019-01-03 15:37:20 x3 CCC333
2019-06-30 16:09:51 x4 CCC333
2021-06-21 15:37:10 x5 CCC333
2022-05-03 16:09:52 x6 CCC333
After running the script, I would like to have something that looks like this:
Created Time Ride price Car
2022-06-20 16:09:53 AAA111
2019-06-30 16:09:51 BBB222
CodePudding user response:
I think this will get you what you need:
import pandas as pd
import numpy as np
cars = ['AAA111', 'BBB222', 'CCC333']
branded = ['2020-11-19', '2019-04-30', '2019-09-10']
debranded = ['2022-06-21', '2020-03-01', '2020-03-01']
df_cars = pd.DataFrame({
'Cars' : cars,
'Branded' : branded,
'Debranded' : debranded
})
df_merge = pd.merge(df1, df_cars, how = 'left', left_on = 'Car', right_on = 'Cars')
df_merge['Created Time'] = pd.to_datetime(df_merge['Created Time'], infer_datetime_format=True)
df_merge['Branded'] = pd.to_datetime(df_merge['Branded'], infer_datetime_format=True)
df_merge['Debranded'] = pd.to_datetime(df_merge['Debranded'], infer_datetime_format=True)
df_merge = df_merge.loc[np.where(df_merge['Created Time'].between(df_merge['Branded'], df_merge['Debranded']), True, False)]
df_merge