Convert to_datetime when days don't contain leading zero

Time:09-13

I'm trying to get the index of my DataFrame to be of type datetime. My CSV file contains separate Date and Time columns, which I combine on import:

df = pd.read_csv("example.csv", sep=";", decimal=",", parse_dates=[["Date", "Time"]])

It will look like this after the import:

Date_Time
0 1012020 00:00:00
1 1012020 00:15:00

The problem is the missing leading zero on the first 9 days of each month. Pandas to_datetime() needs a leading zero for the %d format option to work. When I use format="%d%m%Y%H:%M:%S", Python says "invalid syntax".

How can I convert this column to datetime?

CodePudding user response:

Use Series.str.zfill (as suggested by @FObersteiner in the comments) and apply pd.to_datetime afterwards:

import pandas as pd

# changing 2nd val to `'12012020 00:15:00'` to show that 
# only the 1st val is affected
data = {'Date_Time': {0: '1012020 00:00:00', 1: '12012020 00:15:00'}}

df = pd.DataFrame(data)

df['Date_Time'] = pd.to_datetime(df["Date_Time"].str.zfill(17), 
                                 format="%d%m%Y %H:%M:%S")

print(df)

            Date_Time
0 2020-01-01 00:00:00
1 2020-01-12 00:15:00

print(df['Date_Time'].dtype)
datetime64[ns]
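
Since the goal in the question was a datetime index, the same padding can also be applied to the Date column before it is joined with Time. The following is only a minimal sketch, not the asker's actual file: the inline CSV text and the Value column are invented for illustration, and reading Date and Time as strings via dtype is an assumption made here so that the digits are not parsed as integers.

```python
import io

import pandas as pd

# Hypothetical stand-in for example.csv; the "Value" column is invented.
csv_text = (
    "Date;Time;Value\n"
    "1012020;00:00:00;1,5\n"
    "12012020;00:15:00;2,5\n"
)

# Read Date and Time as strings so the date digits stay text.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    decimal=",",
    dtype={"Date": str, "Time": str},
)

# Pad the date to 8 digits (ddmmYYYY), then append the time.
stamp = df["Date"].str.zfill(8) + " " + df["Time"]

# Parse the combined string and use it as the index.
df.index = pd.to_datetime(stamp, format="%d%m%Y %H:%M:%S")
df.index.name = "Date_Time"
df = df.drop(columns=["Date", "Time"])

print(df.index.dtype)
```

Padding the date part alone with zfill(8) avoids having to count the characters of the combined string, which is what zfill(17) relies on.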

Another (admittedly unnecessarily complicated) option is to use a regex pattern that replaces every 7-digit "date" with its 8-digit equivalent:

df['Date_Time'] = pd.to_datetime(
    df['Date_Time'].replace(r'^(\d{7}\s)', r'0\1', regex=True),
    format="%d%m%Y %H:%M:%S")

Explanation of r'^(\d{7}\s)':

  • ^ asserts the position at the start of the string
  • \d{7}\s matches 7 digits followed by a whitespace character
  • The enclosing parentheses turn this into a capturing group

Explanation of r'0\1':

  • \1 refers back to the capturing group (the only group in the pattern), to which we prepend 0
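
To see what the pattern and the replacement do without pandas, here is a small sketch using Python's built-in re module on the two sample strings from above. It assumes that re.sub behaves the same way as the regex replacement in the pandas call, which should hold for a simple pattern like this one.

```python
import re

pattern = r'^(\d{7}\s)'   # 7 digits at the start, followed by whitespace
replacement = r'0\1'      # prepend "0" to whatever the group captured

samples = ['1012020 00:00:00', '12012020 00:15:00']

# Only the 7-digit date matches; the 8-digit one is left untouched,
# because its 8th character is a digit rather than whitespace.
padded = [re.sub(pattern, replacement, s) for s in samples]

print(padded)  # ['01012020 00:00:00', '12012020 00:15:00']
```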