Home > Software design >  Pandas converting one set of columns and other to timestamp on read csv
Pandas converting one set of columns and other to timestamp on read csv

Time:11-28

I am currently trying to read a CSV with two columns that have date variables in them. The problem that I am seeing is that column DateA values are coming back as a datetime type, while DateB column values are a Pandas Timestamp type. Any ideas why this could be? I cant convert from Timestamp to date time individually either. I customized the read_csv to be as follows:

pd.read_csv("filename.csv", parse_dates=['DateA', 'DateB'], date_parser=self.date_function)

def date_function(self, date_list):
    for i in range(len(date_list)):
       #Date formats can be 1/10/21 or 1/10/21 12:45
        formats = ['%m/%d/%y %H:%M', '%Y-%m-%d %H:%M:%S', '%m/%d/%y']
                        for format in formats:
                            try:
                                date_list[i] = dt.strptime(date_list[i], format)
                                break
                            except Exception as e:
                                print("Error parsing: {}. lets keep trying other formats!".format(e))
                        continue

    return date_list

CSV: DateA,Summary,DateB

10/21/21 19:00,Some Summary,10/26/21

Padas version 1.3.4

CodePudding user response:

I don't understand where is your problem. I slightly modified your code because it's not reproducible:

import pandas as pd
from datetime import datetime as dt

def date_function(date_list):
    for i in range(len(date_list)):
       #Date formats can be 1/10/21 or 1/10/21 12:45
       formats = ['%m/%d/%y %H:%M', '%Y-%m-%d %H:%M:%S', '%m/%d/%y']
       for format in formats:
           try:
               date_list[i] = dt.strptime(date_list[i], format)
               break
           except Exception as e:
               print("Error parsing: {}. lets keep trying other formats!".format(e))
               continue
    return date_list

df = pd.read_csv('filename.csv', parse_dates=['DateA', 'DateB'], date_parser=date_function)

# Output:
Error parsing: time data '10/26/21' does not match format '%m/%d/%y %H:%M'. lets keep trying other formats!
Error parsing: time data '10/26/21' does not match format '%Y-%m-%d %H:%M:%S'. lets keep trying other formats!

Result:

>>> df.dtypes
DateA      datetime64[ns]
Summary            object
DateB      datetime64[ns]
dtype: object

>>> df.loc[0, 'DateA']
Timestamp('2021-10-21 19:00:00')

>>> df.loc[0, 'DateB']
Timestamp('2021-10-26 00:00:00')

But you can get the same result without your date_parser function:

df = pd.read_csv('filename.csv', parse_dates=['DateA', 'DateB'])
  • Related