I have a DataFrame that looks like this:
Date | Parameter |
---|---|
2010-01-02 | some value |
2010-01-03 | some value |
2010-01-04 | some value |
... | ... |
2011-01-02 | some value |
2011-01-03 | some value |
2011-01-04 | some value |
... | ... |
2012-01-02 | some value |
2012-01-03 | some value |
2012-01-04 | some value |
Date column (%Y-%m-%d) is Dtype datetime64[ns] and Parameter column is Dtype float64
I also have a string variable 'limit_date'
I want to get a DF that has just the rows corresponding to the dates before limit_date.
I have used this two different approaches and they worked: df[df['Date']<'2011-01-01'] df.query('Date<20110101')
I get something that looks like this in both cases:
Date | Parameter |
---|---|
2010-01-02 | some value |
2010-01-03 | some value |
2010-01-04 | some value |
... | ... |
2010-12-30 | some value |
2010-12-31 | some value |
However. If I want to use the string limit_date I run into problems:
When I use limit_date = '2011-01-01' AND df[df['Date']<limit_date] I get an empty DF
When I use limit_date = '20110101' AND df.query('Date<limit_date') I get the error :
**UndefinedVariableError: name 'limit_date' is not defined**
Does anyone have any idea how to deal with this issue?
I need to use a string variable bc in my code this limit_date will change throughout the code.
CodePudding user response:
DataFrame.query
expects you to reference column names; df.query('Date < limit_date')
is trying to query rows where the value in the 'Date'
column is less than the value in the 'limit_date'
column. (How would it know 'Date'
is a column but limit_date
is a variable, given that you refer to them similarly?)
To distinguish between columns and an environment variable you will need to add @
before the environment variables.
df.query('Date < @limit_date')
# Date
#0 2010-01-01
#1 2010-01-02
#2 2010-01-03
#3 2010-01-04
Your first attempt: df[df['Date'] < limit_date]
is technically correct and should work. Likely, you applied a previous filter to the DataFrame and overwrote the result, so that now when you apply this filter it results in an empty DataFrame. Refresh the kernel and start from the beginning.
Sample Data
import pandas as pd
df = pd.DataFrame({'Date': pd.date_range('2010-01-01', freq='D', periods=20)})
limit_date = '2010-01-05'