My CSV data looks like this:
Date Time
1/12/2019 12:04AM
1/12/2019 12:09AM
1/12/2019 12:14AM
and so on
I am trying to read this file using pandas in the following way:
import pandas as pd
import numpy as np
data = pd.read_csv('D 2019.csv',parse_dates=[['Date','Time']])
print(data['Date_Time'].dt.month)
When I access the year through the dt accessor, it prints correctly as 2019. But the day and the month come out completely wrong. The month starts at 1 and ends at 12, when it should be 12 throughout.
The day starts at 12 and ends at 31, when it should start at 1 and end at 31. The file has a total of 8867 entries. Where am I going wrong?
CodePudding user response:
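By default, ambiguous dates are read month-first (MM/DD), while yours are day-first (DD/MM). That would explain the output you see: 1/12/2019 is read as 12 January, whereas a value such as 31/12/2019 cannot be month-first and is read as 31 December.
The simplest solution is to set the dayfirst parameter of read_csv:
dayfirst: DD/MM format dates, international and European format (default False)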
data = pd.read_csv('D 2019.csv', parse_dates=[['Date', 'Time']], dayfirst=True)
# -------------
>>> data['Date_Time'].dt.month
# 0 12
# 1 12
# 2 12
# Name: Date_Time, dtype: int64
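To see the ambiguity in isolation, here is a minimal sketch that parses the single string 1/12/2019 both ways with pd.to_datetime. The variable names are illustrative and are not part of the original question.

```python
import pandas as pd

# "1/12/2019" is ambiguous: month-first reads it as 12 January,
# day-first reads it as 1 December.
month_first = pd.to_datetime("1/12/2019")               # default: dayfirst=False
day_first = pd.to_datetime("1/12/2019", dayfirst=True)  # treat the first field as the day

print(month_first.month, month_first.day)
print(day_first.month, day_first.day)
```

The same dayfirst flag is what read_csv forwards to its date parser, so the behaviour should match what you get when reading the file.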
CodePudding user response:
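Try passing an explicit format argument to pd.to_datetime:
df = pd.read_csv('D 2019.csv')
# Combine the two columns, then parse day-first with a 12-hour clock (%I pairs with %p)
df["Date_Time"] = pd.to_datetime(df["Date"] + " " + df["Time"], format='%d/%m/%Y %I:%M%p')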
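A note on the hour directive in such a format string: times like 12:04AM use a 12-hour clock, so %I is the directive that goes with %p. The sketch below uses only the standard library's datetime.strptime, whose directive codes pandas format strings share, to show the difference on one sample value. It illustrates the directives only and is not a claim about how pandas handles every case internally.

```python
from datetime import datetime

stamp = "1/12/2019 12:04AM"

# %I is the 12-hour clock directive, so %p (AM/PM) is taken into account.
with_12h = datetime.strptime(stamp, "%d/%m/%Y %I:%M%p")

# %H is the 24-hour clock directive; strptime then ignores %p for the hour.
with_24h = datetime.strptime(stamp, "%d/%m/%Y %H:%M%p")

print(with_12h)  # four minutes past midnight
print(with_24h)  # four minutes past noon
```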
CodePudding user response:
You need to check the data types of your dataframe and convert the "Date" column to datetime. Since your dates are day-first, pass dayfirst=True:
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)
Afterwards you can access the day, month, or year using:
dt.day
dt.month
dt.year
Note: Make sure you know which format the dates are in (D/M/Y or M/D/Y). For D/M/Y, pass dayfirst=True to pd.to_datetime.
Full Code
import pandas as pd
import numpy as np
data = pd.read_csv('D 2019.csv')
data["Date"] = pd.to_datetime(data["Date"], dayfirst=True)  # dates in the file are D/M/Y
print(data["Date"].dt.day)
print(data["Date"].dt.month)
print(data["Date"].dt.year)
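To check the result without the real file, here is a small self-contained sketch. It assumes the file is comma-separated with Date and Time columns, and uses an in-memory string built from the three sample rows in the question as a stand-in for 'D 2019.csv'. The names csv_text and df are illustrative.

```python
import io

import pandas as pd

# Assumption: comma-separated stand-in for 'D 2019.csv', using the question's sample rows.
csv_text = (
    "Date,Time\n"
    "1/12/2019,12:04AM\n"
    "1/12/2019,12:09AM\n"
    "1/12/2019,12:14AM\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Join the two text columns and parse them with an explicit day-first format.
df["Date_Time"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%d/%m/%Y %I:%M%p"
)

print(df["Date_Time"].dt.day.tolist())
print(df["Date_Time"].dt.month.tolist())
print(df["Date_Time"].dt.year.tolist())
```

With an explicit format there is nothing left to guess, so every row is parsed the same way regardless of whether the day is above or below 12.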