How to filter dataframe by dictionary that contains date?

Time:08-18

If I have a dataframe with id and date columns and want to filter on both, how can I do it when I have many ids and dates to filter by?

import pandas as pd

df = pd.DataFrame([
  {'id': 'thing 1', 'date': '2016-01-01', 'quantity': 1 },
  {'id': 'thing 1', 'date': '2016-02-01', 'quantity': 1 },
  {'id': 'thing 1', 'date': '2016-09-01', 'quantity': 1 },
  {'id': 'thing 1', 'date': '2016-10-01', 'quantity': 1 },
  {'id': 'thing 2', 'date': '2017-01-01', 'quantity': 2 },
  {'id': 'thing 2', 'date': '2017-02-01', 'quantity': 2 },
  {'id': 'thing 2', 'date': '2017-02-11', 'quantity': 2 },
  {'id': 'thing 2', 'date': '2017-09-01', 'quantity': 2 },
  {'id': 'thing 2', 'date': '2017-10-01', 'quantity': 2 },
])
df.date = pd.to_datetime(df.date, format="%Y-%m-%d")
date_dict = {'thing1':'2016-02-01',
             'thing2': '2017-09-01'}

With just two, I could hardcode it like this:

df.loc[((df['id']=='thing 1') & (df['date']<='2016-02-01')) | ((df['id']=='thing 2') & (df['date']<='2017-09-01'))]

However, if I have thousands of different ids and dates, how can I do it efficiently?

Thank you, Sam

CodePudding user response:

You can create a Series from the dictionary, merge it into df on id, and query for the rows whose date is on or before the matching date from your dictionary.

res = (
    df.merge(pd.Series(date_dict, name='dt_max'), 
             left_on='id', right_index=True, how='left')
      .query('date<=dt_max')[df.columns]
)
print(res)
#         id       date  quantity
# 0  thing 1 2016-01-01         1
# 1  thing 1 2016-02-01         1
# 4  thing 2 2017-01-01         2
# 5  thing 2 2017-02-01         2
# 6  thing 2 2017-02-11         2
# 7  thing 2 2017-09-01         2

Note: make sure your dictionary keys match the id values exactly (the keys in your example currently have a typo: 'thing1' instead of 'thing 1').
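
As an alternative to the merge, here is a minimal sketch that looks up each row's cutoff with Series.map instead. It is not the approach from the answer above, only a variant under some assumptions: the dictionary keys have been corrected to match the id values, the cutoff strings are converted to datetimes explicitly, and the names cutoff and missing_ids are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['thing 1'] * 4 + ['thing 2'] * 5,
    'date': pd.to_datetime([
        '2016-01-01', '2016-02-01', '2016-09-01', '2016-10-01',
        '2017-01-01', '2017-02-01', '2017-02-11', '2017-09-01', '2017-10-01',
    ]),
    'quantity': [1] * 4 + [2] * 5,
})

# Assumption: keys corrected to match the 'id' values exactly (note the space).
date_dict = {'thing 1': '2016-02-01', 'thing 2': '2017-09-01'}

# Look up each row's cutoff by id; ids missing from the dictionary become NaT.
cutoff = pd.to_datetime(df['id'].map(date_dict))

# A comparison against NaT is False, so rows without a cutoff are dropped.
res = df[df['date'] <= cutoff]

# Optional sanity check: ids present in df that have no dictionary entry.
missing_ids = sorted(set(df['id']) - set(date_dict))

print(res)
print(missing_ids)
```

If missing_ids is not empty, the dictionary keys and the id column disagree somewhere, which is the same kind of mismatch as the typo mentioned in the note.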
