Python: getting imported dates into a proper format to convert to datetime

Time:08-25

I have some sales data loaded into Python, but the date column is not in a format that I can convert to datetimes using pd.to_datetime. Note: the year 202 should be 2020 (the trailing zero has been dropped).

sales['Calendar Year/Month'].unique()
8.202 ,  9.202 , 10.202 , 11.202 , 12.202 ,  1.2021,
2.2021, 3.2021,  4.2021,  5.2021,  6.2021,  7.2021,
8.2021,  9.2021, 10.2021, 11.2021, 12.2021,  1.2022,
2.2022,  3.2022,  4.2022, 5.2022,  6.2022,  7.2022

Is there any way (maybe using regex) to convert the Calendar Year/Month column into a suitable format to use pd.to_datetime(sales['Calendar Year/Month'], format='%m.%Y')?

The rule I was thinking of is: add a leading 0 if there is only one character before the dot, and add a trailing 0 if there are only three characters after it.

What is the best and most pythonic way to achieve this?

CodePudding user response:

I think you need a day field as well to build a datetime/date object. Using the first day of the month as a dummy day, a possible solution might be the following (I haven't checked edge cases):

from math import floor
from datetime import date
values = [8.202 ,  9.202 , 10.202 , 11.202 , 12.202 ,  1.2021,
2.2021, 3.2021,  4.2021,  5.2021,  6.2021,  7.2021,
8.2021,  9.2021, 10.2021, 11.2021, 12.2021,  1.2022,
2.2022,  3.2022,  4.2022, 5.2022,  6.2022,  7.2022]
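# round() rather than int(): float arithmetic can yield 2019.999..., which int() would truncate to 2019
answer = [date(round(value % 1 * 10000), floor(value), 1) for value in values]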
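
The same arithmetic can also be applied to a whole pandas column at once. This is only a minimal sketch, assuming the column holds floats as in the question; the `raw` Series is a hypothetical stand-in for sales['Calendar Year/Month'], and the result is rounded because the fractional part of a float is not exact.

```python
import pandas as pd

# Hypothetical sample standing in for sales['Calendar Year/Month'] (floats).
raw = pd.Series([8.202, 12.202, 1.2021, 10.2021, 7.2022])

# The integer part is the month; the fractional part times 10000 is the year.
month = raw.astype(int)
# Round before casting, since float error can leave e.g. 2019.999... instead of 2020.
year = ((raw - month) * 10000).round().astype(int)

# pd.to_datetime accepts a DataFrame with year/month/day columns;
# the scalar day=1 is broadcast to every row.
dates = pd.to_datetime(pd.DataFrame({"year": year, "month": month, "day": 1}))
```

The resulting `dates` Series has dtype datetime64 and can be assigned straight back to the column.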

CodePudding user response:

Maybe not the most pythonic, but this works if your dates are in string format.

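import pandas as pd

def convert_date(date):
    # Split "8.202" into month "8" and year "202"
    month, year = date.split(".")
    # A 3-digit year has lost its trailing zero (202 -> 2020)
    if len(year) < 4:
        year = year + "0"
    return f"{month}.{year}"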

# Set to str type if not already
df["Calendar Year/Month"] = df["Calendar Year/Month"].astype(str)

# Apply custom function to change the date format
df["Calendar Year/Month"] = df["Calendar Year/Month"].apply(convert_date)

# Convert to datetime
df["Calendar Year/Month"] = pd.to_datetime(df["Calendar Year/Month"], format='%m.%Y')
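
If the column is large, the same padding rule can be written with pandas string methods instead of a per-row function. The following is a rough sketch rather than a tested drop-in: `raw` is a hypothetical sample standing in for the real column, and it assumes every value contains exactly one dot.

```python
import pandas as pd

# Hypothetical sample; in the question this would be sales["Calendar Year/Month"].
raw = pd.Series([8.202, 10.202, 1.2021, 12.2021, 7.2022])

# Split "8.202" into a month column ("8") and a year column ("202").
parts = raw.astype(str).str.split(".", expand=True)
month = parts[0].str.zfill(2)       # "8"   -> "08"
year = parts[1].str.ljust(4, "0")   # "202" -> "2020"

# Rebuild "MM.YYYY" strings and parse them with the format from the question.
dates = pd.to_datetime(month + "." + year, format="%m.%Y")
```

Padding the year with ljust up to four characters also covers a year that lost more than one trailing zero, which the single appended "0" would not.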