Home > OS >  Filter data in pandas by a string date
Filter data in pandas by a string date

Time:05-19

I have a DataFrame that looks like this:

Date Parameter
2010-01-02 some value
2010-01-03 some value
2010-01-04 some value
... ...
2011-01-02 some value
2011-01-03 some value
2011-01-04 some value
... ...
2012-01-02 some value
2012-01-03 some value
2012-01-04 some value

Date column (%Y-%m-%d) is Dtype datetime64[ns] and Parameter column is Dtype float64

I also have a string variable 'limit_date'

I want to get a DF that has just the rows corresponding to the dates before limit_date.

I have used this two different approaches and they worked: df[df['Date']<'2011-01-01'] df.query('Date<20110101')

I get something that looks like this in both cases:

Date Parameter
2010-01-02 some value
2010-01-03 some value
2010-01-04 some value
... ...
2010-12-30 some value
2010-12-31 some value

However. If I want to use the string limit_date I run into problems:

  • When I use limit_date = '2011-01-01' AND df[df['Date']<limit_date] I get an empty DF

  • When I use limit_date = '20110101' AND df.query('Date<limit_date') I get the error :

         **UndefinedVariableError: name 'limit_date' is not defined**
    

Does anyone have any idea how to deal with this issue?

I need to use a string variable bc in my code this limit_date will change throughout the code.

CodePudding user response:

DataFrame.query expects you to reference column names; df.query('Date < limit_date') is trying to query rows where the value in the 'Date' column is less than the value in the 'limit_date' column. (How would it know 'Date' is a column but limit_date is a variable, given that you refer to them similarly?)

To distinguish between columns and an environment variable you will need to add @ before the environment variables.

df.query('Date < @limit_date')
#        Date
#0 2010-01-01
#1 2010-01-02
#2 2010-01-03
#3 2010-01-04

Your first attempt: df[df['Date'] < limit_date] is technically correct and should work. Likely, you applied a previous filter to the DataFrame and overwrote the result, so that now when you apply this filter it results in an empty DataFrame. Refresh the kernel and start from the beginning.


Sample Data

import pandas as pd

df = pd.DataFrame({'Date': pd.date_range('2010-01-01', freq='D', periods=20)})
limit_date = '2010-01-05'
  • Related