I have a DataFrame with id and date columns and would like to filter it on both. How can I do that when there are many ids and dates to filter on?
df = pd.DataFrame([
{'id': 'thing 1', 'date': '2016-01-01', 'quantity': 1 },
{'id': 'thing 1', 'date': '2016-02-01', 'quantity': 1 },
{'id': 'thing 1', 'date': '2016-09-01', 'quantity': 1 },
{'id': 'thing 1', 'date': '2016-10-01', 'quantity': 1 },
{'id': 'thing 2', 'date': '2017-01-01', 'quantity': 2 },
{'id': 'thing 2', 'date': '2017-02-01', 'quantity': 2 },
{'id': 'thing 2', 'date': '2017-02-11', 'quantity': 2 },
{'id': 'thing 2', 'date': '2017-09-01', 'quantity': 2 },
{'id': 'thing 2', 'date': '2017-10-01', 'quantity': 2 },
])
df.date = pd.to_datetime(df.date, format="%Y-%m-%d")
date_dict = {'thing1':'2016-02-01',
'thing2': '2017-09-01'}
With just two ids I could hardcode it like this:
df.loc[((df['id']=='thing 1') & (df['date']<='2016-02-01')) | ((df['id']=='thing 2') & (df['date']<='2017-09-01'))]
However, if I have thousands of different ids and thousands of dates, how can I do this efficiently?
Thank you, Sam
CodePudding user response:
You can create a Series from the dictionary, merge it to df, and query for the rows where the date is less than or equal to the date in your dictionary.
res = (
df.merge(pd.Series(date_dict, name='dt_max'),
left_on='id', right_index=True, how='left')
.query('date<=dt_max')[df.columns]
)
print(res)
#         id       date  quantity
# 0  thing 1 2016-01-01         1
# 1  thing 1 2016-02-01         1
# 4  thing 2 2017-01-01         2
# 5  thing 2 2017-02-01         2
# 6  thing 2 2017-02-11         2
# 7  thing 2 2017-09-01         2
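Note: make sure your dictionary keys match the id values exactly. The dictionary in the question has a typo: its keys are 'thing1' and 'thing2', while the ids in df are 'thing 1' and 'thing 2'.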
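As an alternative to the merge, the same filter can probably be written as a lookup with Series.map followed by a boolean mask. This is only a minimal sketch, not the method from the answer above. It assumes the dictionary keys have been corrected to match the ids, and it converts the cutoff strings to datetimes explicitly so the comparison does not depend on how pandas compares datetimes with strings. The names cutoff and missing_ids are made up for illustration.

```python
import pandas as pd

# Same data as in the question, built column-wise for brevity.
df = pd.DataFrame({
    "id": ["thing 1"] * 4 + ["thing 2"] * 5,
    "date": pd.to_datetime([
        "2016-01-01", "2016-02-01", "2016-09-01", "2016-10-01",
        "2017-01-01", "2017-02-01", "2017-02-11", "2017-09-01", "2017-10-01",
    ]),
    "quantity": [1] * 4 + [2] * 5,
})

# Assumption: keys corrected to match the id column exactly (note the space).
date_dict = {"thing 1": "2016-02-01", "thing 2": "2017-09-01"}

# Look up each row's cutoff date; ids missing from the dictionary become NaT.
cutoff = pd.to_datetime(df["id"].map(date_dict))

# Comparisons against NaT are False, so rows with unknown ids are dropped.
res = df[df["date"] <= cutoff]

# Hypothetical sanity check: ids in df that have no cutoff in the dictionary.
missing_ids = sorted(set(df["id"]) - set(date_dict))

print(res)
print("ids without a cutoff:", missing_ids)
```

With the uncorrected keys from the question, missing_ids would list both 'thing 1' and 'thing 2' and res would be empty, which makes the typo easy to spot.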