Parsing date in pandas.read_csv

Time:01-04

I am trying to read a CSV file whose first column contains dates in this format:

"Dec 30, 2021","1.1","1.2","1.3","1"

While I can define the types of the remaining columns using the dtype= argument, I do not know how to handle the date column.

I have tried the obvious np.datetime64 as its dtype, without success.

Is there any way to specify a format so that this date is parsed directly by the read_csv method?

CodePudding user response:

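Update

> What if I want to further specify the format for a, b, c and d? I used a simplified example. In my file, numbers are formatted like this: "2,345.55", and those are read as object by read_csv, not as float64 or int64 as in your example.

In that case you can pass a converters dictionary that maps each column name to a function that parses the raw string: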

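from datetime import datetime
import pandas as pd

# Each converter receives the raw cell text and returns the parsed value.
converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
    'Number': lambda x: float(x.replace(',', ''))
}
df = pd.read_csv('data.csv', converters=converters)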

Output:

>>> df
        Date   Number
0 2021-12-30  2345.55

>>> df.dtypes
Date      datetime64[ns]
Number           float64
dtype: object

# data.csv
Date,Number
"Dec 30, 2021","2,345.55"

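For the numbers alone, read_csv also has a thousands= parameter, which may be simpler than a per-column converter. Below is a minimal sketch, assuming the same two-column layout as the data.csv sample above; the io.StringIO buffer is only a stand-in for the real file, and the date is converted afterwards with pd.to_datetime so the snippet does not depend on a particular pandas version.

```python
import io

import pandas as pd

# Stand-in for data.csv (same content as the sample file above).
csv_text = 'Date,Number\n"Dec 30, 2021","2,345.55"\n'

# thousands=',' makes the parser ignore grouping commas inside numbers,
# so "2,345.55" is read as the float 2345.55 instead of a string.
df = pd.read_csv(io.StringIO(csv_text), thousands=',')

# Convert the date column separately, with an explicit format.
df['Date'] = pd.to_datetime(df['Date'], format='%b %d, %Y')

print(df.dtypes)
```

The converters approach is still the more flexible one if a column needs custom cleaning beyond removing thousands separators.
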
Old answer

If you have a particular format, you can pass a custom function to the date_parser parameter:

from datetime import datetime
import pandas as pd

custom_date_parser = lambda x: datetime.strptime(x, "%b %d, %Y")
df = pd.read_csv('data.csv', parse_dates=['Date'], date_parser=custom_date_parser)
print(df)

# Output
        Date    A    B    C  D
0 2021-12-30  1.1  1.2  1.3  1

Or let Pandas try to determine the format as suggested by @richardec.
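
Note that date_parser has been deprecated since pandas 2.0, where read_csv gained a date_format= parameter that takes the strptime format string directly. A minimal sketch, assuming pandas 2.0 or later and a header row named Date,A,B,C,D (the header and the io.StringIO stand-in for the file are assumptions for illustration):

```python
import io

import pandas as pd

# Stand-in for data.csv; the header row is assumed for this example.
csv_text = 'Date,A,B,C,D\n"Dec 30, 2021","1.1","1.2","1.3","1"\n'

# date_format replaces the date_parser callable (requires pandas >= 2.0).
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['Date'],
    date_format='%b %d, %Y',
)

print(df.dtypes)
```

On older pandas versions the date_parser form shown above is the one to use.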

CodePudding user response:

Just pass the list of columns that should be converted to dates to the parse_dates= argument of pd.read_csv:

>>> df = pd.read_csv('file.csv', parse_dates=['date'])
>>> df
        date    a    b    c  d
0 2021-12-30  1.1  1.2  1.3  1

>>> df.dtypes
date    datetime64[ns]
a              float64
b              float64
c              float64
d                int64
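
The sample line in the question has no header row, so the column names have to come from somewhere. Here is a small sketch of the same idea for a headerless file, assuming the names date, a, b, c, d are supplied by hand through names= (they are not in the original file) and using io.StringIO in place of file.csv:

```python
import io

import pandas as pd

# The exact line from the question, with no header row.
csv_text = '"Dec 30, 2021","1.1","1.2","1.3","1"\n'

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,                         # the file has no header line
    names=['date', 'a', 'b', 'c', 'd'],  # assumed column names
    parse_dates=['date'],                # let pandas infer the date format
)

print(df.dtypes)
```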

CodePudding user response:

You can use parse_dates:

df = pd.read_csv('data.csv', parse_dates=['date'])

In my experience, though, this is a frequent source of errors. I think it is better to read the column as text and convert it manually with an explicit date format. For example, in your case:

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%b %d, %Y')
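
One advantage of the explicit format is that mismatching values are easy to detect. The sketch below uses made-up sample values (not from the question's file) and errors='coerce', which turns anything that does not match the format into NaT instead of raising, so the offending rows can be inspected:

```python
import pandas as pd

# Hypothetical raw values; the second one does not match the format.
raw = pd.Series(['Dec 30, 2021', '30/12/2021', 'Jan 05, 2022'])

# Values that do not match '%b %d, %Y' become NaT instead of raising.
parsed = pd.to_datetime(raw, format='%b %d, %Y', errors='coerce')

# Rows that failed to parse, for inspection.
bad_rows = raw[parsed.isna()]
print(bad_rows)
```

Without errors='coerce', the same call raises a ValueError on the first value that does not match, which is often what you want while loading trusted data.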