I am currently trying to read a CSV with two columns that have date variables in them. The problem that I am seeing is that column DateA values are coming back as a datetime type, while DateB column values are a Pandas Timestamp type. Any ideas why this could be? I cant convert from Timestamp to date time individually either. I customized the read_csv to be as follows:
pd.read_csv("filename.csv", parse_dates=['DateA', 'DateB'], date_parser=self.date_function)
def date_function(self, date_list):
for i in range(len(date_list)):
#Date formats can be 1/10/21 or 1/10/21 12:45
formats = ['%m/%d/%y %H:%M', '%Y-%m-%d %H:%M:%S', '%m/%d/%y']
for format in formats:
try:
date_list[i] = dt.strptime(date_list[i], format)
break
except Exception as e:
print("Error parsing: {}. lets keep trying other formats!".format(e))
continue
return date_list
CSV: DateA,Summary,DateB
10/21/21 19:00,Some Summary,10/26/21
Padas version 1.3.4
CodePudding user response:
I don't understand where is your problem. I slightly modified your code because it's not reproducible:
import pandas as pd
from datetime import datetime as dt
def date_function(date_list):
for i in range(len(date_list)):
#Date formats can be 1/10/21 or 1/10/21 12:45
formats = ['%m/%d/%y %H:%M', '%Y-%m-%d %H:%M:%S', '%m/%d/%y']
for format in formats:
try:
date_list[i] = dt.strptime(date_list[i], format)
break
except Exception as e:
print("Error parsing: {}. lets keep trying other formats!".format(e))
continue
return date_list
df = pd.read_csv('filename.csv', parse_dates=['DateA', 'DateB'], date_parser=date_function)
# Output:
Error parsing: time data '10/26/21' does not match format '%m/%d/%y %H:%M'. lets keep trying other formats!
Error parsing: time data '10/26/21' does not match format '%Y-%m-%d %H:%M:%S'. lets keep trying other formats!
Result:
>>> df.dtypes
DateA datetime64[ns]
Summary object
DateB datetime64[ns]
dtype: object
>>> df.loc[0, 'DateA']
Timestamp('2021-10-21 19:00:00')
>>> df.loc[0, 'DateB']
Timestamp('2021-10-26 00:00:00')
But you can get the same result without your date_parser
function:
df = pd.read_csv('filename.csv', parse_dates=['DateA', 'DateB'])