I have a dataframe in pandas that originally had a parsed date column with values such as: 2020-05-11 02:23:00
.
From that I created a 'Date' column using df['Date'] = df.parsed_date_column.dt.date
producing values eg 2020-05-11
with datatype 'object'.
From there I wanted to only retain the rows from a certain date, i.e. df.loc[df['Date'] == '2021-06-26']
or more simply df[df['Date'] == '2021-06-26']
.
Whenever I execute this, it returns an empty dataframe with no rows and only the column names.
I have tried converting it to datetime64:pd.to_datetime(df['Date'], format='%Y-%m-%d')
,
specify the object as a string: combined_lanes[~combined_lanes['Date'].str.contains("2021-06-20")]
,
and defining the date seperately:
date = "2021-06-20"
df = df[df['Date'].str.lower() == date]
Nothing I've tried or researched works.
Note: this does not occur when I try other columns, including ones with the datatype 'object'.
CodePudding user response:
Let's take a sample dataframe
import pandas as pd
import numpy as np
from datetime import datetime
rand = np.random.RandomState(0)
df = pd.DataFrame({
'date': pd.date_range('2022-01-01 13:13:05', freq='d', periods=10),
'var': rand.randn(10),
})
df.head()
date var
0 2022-01-01 13:13:05 1.764052
1 2022-01-02 13:13:05 0.400157
2 2022-01-03 13:13:05 0.978738
3 2022-01-04 13:13:05 2.240893
4 2022-01-05 13:13:05 1.867558
To retain only the year month and day information, you can do this
df['date'] = df['date'].dt.date
df.head()
date var
0 2022-01-01 1.764052
1 2022-01-02 0.400157
2 2022-01-03 0.978738
3 2022-01-04 2.240893
4 2022-01-05 1.867558
Whenever you have some issue doing a conditional operation, it's a good idea to check the data type of the elements you are comparing. It may look the same to human eyes, but the code may interpret it differently. So,
print(type(df.loc[0,'date']))
<class 'datetime.date'>
Similarly,
flt_date = datetime.strptime('2022-01-08', "%Y-%m-%d").date()
print(type(flt_date))
<class 'datetime.date'>
Now that the data types match, you can do the comparison
df.loc[df['date']==flt_date]
Returns
date var
7 2022-01-08 -0.151357
CodePudding user response:
You have a datetime.date
dtype in you column, so you must compare to the same type.
You can use Timestamp.date
to generate it:
df.loc[df['date'] == pd.Timestamp('2021-06-20').date()]