I have some sales data loaded into Python. Unfortunately, the date column is not in a format that I can convert to datetimes using pd.to_datetime. Note: the year 202 should be 2020.
sales['Calendar Year/Month'].unique()
8.202 , 9.202 , 10.202 , 11.202 , 12.202 , 1.2021,
2.2021, 3.2021, 4.2021, 5.2021, 6.2021, 7.2021,
8.2021, 9.2021, 10.2021, 11.2021, 12.2021, 1.2022,
2.2022, 3.2022, 4.2022, 5.2022, 6.2022, 7.2022
Is there any way (maybe using regex) to convert the Calendar Year/Month column into a format suitable for pd.to_datetime(sales['Calendar Year/Month'], format='%m.%Y')?
The rule I was thinking of was: add a leading 0 if there is only one character before the dot, and add a trailing 0 if there are only three characters after the dot.
What is the best and most pythonic way to achieve this?
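The two padding rules above map directly onto pandas' vectorized regex replacement (Series.str.replace with regex=True). This is a minimal sketch that assumes the column holds floats, as the unique() output suggests. The five sample values and the name dates are illustrative, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the real data; the values are floats,
# so August 2020 (8.2020) is stored and printed as 8.202
sales = pd.DataFrame({"Calendar Year/Month": [8.202, 12.202, 1.2021, 10.2021, 7.2022]})

s = sales["Calendar Year/Month"].astype(str)
# Rule 1: a single digit before the dot gets a leading 0 ("8.202" -> "08.202")
s = s.str.replace(r"^(\d)\.", r"0\1.", regex=True)
# Rule 2: exactly three digits after the dot get a trailing 0 ("08.202" -> "08.2020");
# \g<1> is used so the appended 0 is not read as part of a group number
s = s.str.replace(r"\.(\d{3})$", r".\g<1>0", regex=True)

dates = pd.to_datetime(s, format="%m.%Y")
print(dates)
```

Each parsed value falls on the first day of its month, since the format string supplies no day.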
Answer:
I think you need a day field as well to build a datetime.date object. Using the first day of the month as a dummy day, a possible solution might be the following (I haven't checked all edge cases):
from math import floor
from datetime import date
values = [8.202 , 9.202 , 10.202 , 11.202 , 12.202 , 1.2021,
2.2021, 3.2021, 4.2021, 5.2021, 6.2021, 7.2021,
8.2021, 9.2021, 10.2021, 11.2021, 12.2021, 1.2022,
2.2022, 3.2022, 4.2022, 5.2022, 6.2022, 7.2022]
# round() rather than int(): float error can leave e.g. 2019.9999... for 8.202
answer = [date(round(value % 1 * 10000), floor(value), 1) for value in values]
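Rounding matters here because the floats cannot hold these decimal values exactly. Scaling the fractional part by 10000 can land just below the intended year, and int() truncates instead of rounding. Here is a minimal check using 8.202 (August 2020) from the question's data:

```python
from datetime import date

value = 8.202  # August 2020; the float has already lost the year's trailing 0

# The stored fractional part is slightly less than 0.202, so the scaled
# value comes out just under 2020 and int() drops it to 2019
truncated = int(value % 1 * 10000)
# round() snaps to the nearest integer instead
rounded = round(value % 1 * 10000)

print(truncated, rounded)  # 2019 2020
print(date(rounded, int(value), 1))  # 2020-08-01
```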
Answer:
Maybe not the most pythonic way, but this works once your dates are in string format:
import pandas as pd

def convert_date(date):
    month, year = date.split(".")
    # Floats drop the year's trailing zero ("8.202"), so add it back
    if len(year) < 4:
        year = year + "0"
    return f"{month}.{year}"
# Set to str type if not already
df["Calendar Year/Month"] = df["Calendar Year/Month"].astype(str)
# Apply custom function to change the date format
df["Calendar Year/Month"] = df["Calendar Year/Month"].apply(convert_date)
# Convert to datetime
df["Calendar Year/Month"] = pd.to_datetime(df["Calendar Year/Month"], format='%m.%Y')
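If the column really is float64, a shorter route is to format every value with exactly four decimals. This restores the year's dropped trailing zero, and the result can then be parsed directly. It is a sketch that neither answer used, and the sample DataFrame below is a hypothetical stand-in for the real sales data.

```python
import pandas as pd

# Hypothetical stand-in for the real column; assumes it holds floats (float64)
sales = pd.DataFrame({"Calendar Year/Month": [8.202, 12.202, 1.2021, 10.2021, 7.2022]})

# Four fixed decimals turn 8.202 back into "8.2020" and leave "1.2021" as is
as_text = sales["Calendar Year/Month"].map("{:.4f}".format)

# %m also accepts single-digit months, so no leading zero is required
sales["Calendar Year/Month"] = pd.to_datetime(as_text, format="%m.%Y")
print(sales)
```

This only works on numeric values. If the column has already been converted to strings, the format call fails, and the string-based approaches above are the better fit.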