In a CSV file, I have two date formats: 05/02/1980
and 2022-07-30T10:00:20.973235
I'd like to convert them to yyyy/mm/dd
dates types.
I've tried to_datetime
and parse_dates=
to no avail.
Any ideas?
import pandas as pd
from dateutil.relativedelta import *
from datetime import *
Retrieve data from the vaccination CSV
df = pd.read_csv("metabase_data/vaccinations/query.csv", usecols=[2,3,4,5,8,9,10])
df = df.rename(columns={'First Name': 'firstName', 'Last Name': 'lastName', 'Dob': 'dob', 'Reservation Number': 'reservation', 'Sec On D/ Last Vaccination Date': 'vaccinationDate', 'Vaccination Proof Uploaded Date Time' : 'dateTimeUploaded', 'Club Name':'voyageId'})
CSV extract
ID | Profile ID | First Name | Last Name | Dob | Reservation Number | Type | Fir St/ Last Vaccination Date | Sec On D/ Last Vaccination Date | Vaccination Proof Uploaded Date Time | Club Name |
---|---|---|---|---|---|---|---|---|---|---|
dfgfgh | ccbvcf | JASON | CHAMBERS | 05/02/1980 | 45544 | P | 23/04/2012 | 2022-07-30T10:00:20.973235 | rt1377 | |
dfgnhvn | hnghn | HEATHERE | JACOBSON | 28/05/1983 | 34646 | P | 07/01/2001 | 2022-07-30T09:57:45.614589 | rt1377 | |
vhnghgh | vnvn | JASMINE | JACOBSON | 28/08/2004 | 45656 | P | 30/04/2011 | 2022-07-30T09:56:29.322404 | rt1377 |
Thanks in advance.
CodePudding user response:
From your Dataframe :
ID Profile ID First Name Last Name Dob Reservation Number Type Fir St/ Last Vaccination Date Sec On D/ Last Vaccination Date Vaccination Proof Uploaded Date Time Club Name
0 dfgfgh ccbvcf JASON CHAMBERS 05/02/1980 45544 P NaN 23/04/2012 2022-07-30T10:00:20.973235 rt1377
1 dfgnhvn hnghn HEATHERE JACOBSON 28/05/1983 34646 P NaN 07/01/2001 2022-07-30T09:57:45.614589 rt1377
2 vhnghgh vnvn JASMINE JACOBSON 28/08/2004 45656 P NaN 30/04/2011 2022-07-30T09:56:29.322404 rt1377
We can use pd.to_datetime
to convert the date to Datetime with two differents formats to get the expected result :
df['Sec On D/ Last Vaccination Date'] = pd.to_datetime(df['Sec On D/ Last Vaccination Date'], format="%d/%m/%Y")
df['Vaccination Proof Uploaded Date Time'] = pd.to_datetime(df['Vaccination Proof Uploaded Date Time'], format="%Y-%m-%dT%H:%M:%S.%f")
Output :
ID Profile ID First Name Last Name Dob Reservation Number Type Fir St/ Last Vaccination Date Sec On D/ Last Vaccination Date Vaccination Proof Uploaded Date Time Club Name
0 dfgfgh ccbvcf JASON CHAMBERS 05/02/1980 45544 P NaN 2012-04-23 2022-07-30T10:00:20.973235 rt1377
1 dfgnhvn hnghn HEATHERE JACOBSON 28/05/1983 34646 P NaN 2001-01-07 2022-07-30T09:57:45.614589 rt1377
2 vhnghgh vnvn JASMINE JACOBSON 28/08/2004 45656 P NaN 2011-04-30 2022-07-30T09:56:29.322404 rt1377
To shorten the datetime with hours to the date only, we can use dt.date
like so :
df['Vaccination Proof Uploaded Date Time'] = pd.to_datetime(df['Vaccination Proof Uploaded Date Time'], format="%Y-%m-%dT%H:%M:%S.%f").dt.date
Output :
ID Profile ID First Name Last Name Dob Reservation Number Type Fir St/ Last Vaccination Date Sec On D/ Last Vaccination Date Vaccination Proof Uploaded Date Time Club Name
0 dfgfgh ccbvcf JASON CHAMBERS 05/02/1980 45544 P NaN 2012-04-23 2022-07-30 rt1377
1 dfgnhvn hnghn HEATHERE JACOBSON 28/05/1983 34646 P NaN 2001-01-07 2022-07-30 rt1377
2 vhnghgh vnvn JASMINE JACOBSON 28/08/2004 45656 P NaN 2011-04-30 2022-07-30 rt1377
CodePudding user response:
pd.to_datetime
can infer datetime type, you can use
date1 = pd.to_datetime('05/02/1980', dayfirst=True)
date2 = pd.to_datetime('2022-07-30T10:00:20.973235')
print(date1)
1980-02-05 00:00:00
print(date2)
2022-07-30 10:00:20.973235