I'm trying to get the index of my DataFrame to be of type datetime. My CSV file contains separate columns for dates and times, which I combine on import:
df = pd.read_csv("example.csv", sep=";", decimal=",", parse_dates=[["Date", "Time"]])
It will look like this after the import:
| | Date_Time |
|---|---|
| 0 | 1012020 00:00:00 |
| 1 | 1012020 00:15:00 |
The problem is the missing leading zero on the first 9 days of each month. Pandas to_datetime() needs a leading zero for the %d format option to work. When I use format="%d%m%Y%H:%M:%S", Python says "invalid syntax".
How can I convert this column to datetime?
Answer:
Use Series.str.zfill (as suggested by @FObersteiner in the comments) and apply pd.to_datetime afterwards:
import pandas as pd
# changing 2nd val to `'12012020 00:15:00'` to show that
# only the 1st val is affected
data = {'Date_Time': {0: '1012020 00:00:00', 1: '12012020 00:15:00'}}
df = pd.DataFrame(data)
df['Date_Time'] = pd.to_datetime(df["Date_Time"].str.zfill(17),
                                 format="%d%m%Y %H:%M:%S")
print(df)
Date_Time
0 2020-01-01 00:00:00
1 2020-01-12 00:15:00
print(df['Date_Time'].dtype)
datetime64[ns]
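The width of 17 passed to zfill is the length of a fully padded value such as '01012020 00:00:00' (8 date digits, one space, 8 time characters). Since the question asks for a datetime index rather than a datetime column, the remaining step could be sketched as follows. This is a minimal example that assumes the combined column is named Date_Time, as in the sample data above:

```python
import pandas as pd

# Sample data mirroring the question; the second value already has 8 date digits.
df = pd.DataFrame({"Date_Time": ["1012020 00:00:00", "12012020 00:15:00"]})

# A fully padded value has 8 date digits + 1 space + 8 time characters = 17.
padded = df["Date_Time"].str.zfill(17)

# Parse the padded strings, then move the column into the index.
df["Date_Time"] = pd.to_datetime(padded, format="%d%m%Y %H:%M:%S")
df = df.set_index("Date_Time")

print(type(df.index).__name__)
```

After set_index, the parsed values form the index, so time-based selection and resampling operate on them directly.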
Another (admittedly unnecessarily complicated) way would be to use a regex pattern that replaces every "date" with 7 digits by its 8-digit equivalent:
df['Date_Time'] = pd.to_datetime(
    df['Date_Time'].replace(r'^(\d{7}\s)', r'0\1', regex=True),
    format="%d%m%Y %H:%M:%S")
Explanation of r'^(\d{7}\s)':
- ^ asserts position at the start of the string
- \d{7}\s matches 7 digits followed by a whitespace character
- The enclosing parentheses turn this into a capturing group

Explanation of r'0\1':
- \1 refers back to the capturing group (1st of 1 group), to which we prepend 0
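To see the pattern and the replacement in isolation, here is a small sketch using Python's built-in re module instead of pandas. The two sample strings are the same ones used above. Only the 7-digit date gains a leading zero:

```python
import re

# Same pattern and replacement as in the pandas call above.
pattern = r'^(\d{7}\s)'
replacement = r'0\1'

samples = ['1012020 00:00:00', '12012020 00:15:00']

# The 8-digit date does not match: after 7 digits comes another digit, not whitespace.
fixed = [re.sub(pattern, replacement, s) for s in samples]
print(fixed)  # ['01012020 00:00:00', '12012020 00:15:00']
```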