Home > Enterprise >  .loc will not work specifically on a 'Date' column pandas
.loc will not work specifically on a 'Date' column pandas

Time:10-22

I have a dataframe in pandas that originally had a parsed date column with values such as: 2020-05-11 02:23:00.

From that I created a 'Date' column using df['Date'] = df.parsed_date_column.dt.date producing values eg 2020-05-11 with datatype 'object'.

From there I wanted to only retain the rows from a certain date, i.e. df.loc[df['Date'] == '2021-06-26'] or more simply df[df['Date'] == '2021-06-26'].
Whenever I execute this, it returns an empty dataframe with no rows and only the column names.

I have tried converting it to datetime64:pd.to_datetime(df['Date'], format='%Y-%m-%d'),
specify the object as a string: combined_lanes[~combined_lanes['Date'].str.contains("2021-06-20")],
and defining the date seperately:

date = "2021-06-20"
df = df[df['Date'].str.lower() == date]

Nothing I've tried or researched works.

Note: this does not occur when I try other columns, including ones with the datatype 'object'.

CodePudding user response:

Let's take a sample dataframe

import pandas as pd
import numpy as np
from datetime import datetime

rand = np.random.RandomState(0)
df = pd.DataFrame({
    'date': pd.date_range('2022-01-01 13:13:05', freq='d', periods=10),
    'var': rand.randn(10),
})
df.head()
                 date       var
0 2022-01-01 13:13:05  1.764052
1 2022-01-02 13:13:05  0.400157
2 2022-01-03 13:13:05  0.978738
3 2022-01-04 13:13:05  2.240893
4 2022-01-05 13:13:05  1.867558

To retain only the year month and day information, you can do this

df['date'] = df['date'].dt.date
df.head()
         date       var
0  2022-01-01  1.764052
1  2022-01-02  0.400157
2  2022-01-03  0.978738
3  2022-01-04  2.240893
4  2022-01-05  1.867558

Whenever you have some issue doing a conditional operation, it's a good idea to check the data type of the elements you are comparing. It may look the same to human eyes, but the code may interpret it differently. So,

print(type(df.loc[0,'date']))
<class 'datetime.date'>

Similarly,

flt_date = datetime.strptime('2022-01-08', "%Y-%m-%d").date()
print(type(flt_date))
<class 'datetime.date'>

Now that the data types match, you can do the comparison

df.loc[df['date']==flt_date]

Returns

         date       var
7  2022-01-08 -0.151357

CodePudding user response:

You have a datetime.date dtype in you column, so you must compare to the same type.

You can use Timestamp.date to generate it:

df.loc[df['date'] == pd.Timestamp('2021-06-20').date()]
  • Related