I have a datetime column that I want to 'stringify' using strftime. The problem is that I want the months to be zero-based, i.e. 0=January, 11=December.
What I've tried: after stringifying the column, I call str.replace on it, passing a regex and a callable that converts the month to a number, subtracts one from it, and converts it back to a string.
Why do I want it to be zero-based? Because this data is going to be consumed by Google Charts, which requires dates represented as strings to use zero-based months.
Here is the code. Is there a better solution?
import pandas as pd

month_regex = r",(0[1-9]|1[0-2])"
# month_regex targets the ",%m" part of the format below
fmt = "Date(%Y,%m,%d,%H,%M,%S)"
print(df["start"].dtype)  # float64 represents an epoch
# convert epoch to datetime and then to string with the given format
df["start"] = pd.to_datetime(df["start"] // 1000, unit="s").dt.strftime(fmt)
print(df["start"])  # Date(2022,05,24,00,00,00)
df["start"] = df["start"].str.replace(
    month_regex,
    lambda match: "," + str(int(match[0][1:]) - 1),
    n=1,  # first occurrence only
    regex=True)
print(df["start"])  # Date(2022,4,24,00,00,00)
CodePudding user response:
Simply use string formatting to achieve the same result.
s = pd.to_datetime(pd.Series(["2022-01-01"]))
# "%%d" survives strftime as a literal "%d"; the % operator then fills it with month - 1
s.apply(lambda x: x.strftime("Date(%Y,%%d,%d,%H,%M,%S)") % (x.month - 1))
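If you would rather avoid a per-row lambda, the same idea can probably be done with the vectorized .dt accessor. This is only a sketch: the sample DataFrame and its epoch values are made up for illustration, and it assumes the column holds epoch milliseconds as in the question.

```python
import pandas as pd

# Hypothetical sample data: epoch milliseconds stored as floats, as in the question.
df = pd.DataFrame({"start": [1653350400000.0, 1672531199000.0]})
ts = pd.to_datetime(df["start"] // 1000, unit="s")

# Let strftime format everything except the month, then splice in month - 1.
head = ts.dt.strftime("Date(%Y,")
tail = ts.dt.strftime(",%d,%H,%M,%S)")
df["start"] = head + (ts.dt.month - 1).astype(str) + tail

print(df["start"].tolist())
```

The month is not zero-padded here; if you need two digits, appending .str.zfill(2) to the month piece should do it.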
CodePudding user response:
I would make your regex only capable of matching the month part of the Date(...) string by using a lookbehind for Date followed by a (, 4 digits and a comma:
(?<=Date\(\d{4},)\d\d
Then you need only worry about replacing the match:
df['start'].str.replace(r'(?<=Date\(\d{4},)\d\d', lambda m: f'{int(m[0])-1:02d}', regex=True)
Note I've used an f-string to ensure the output month value has 2 digits (i.e. 04 instead of 4); if that isn't necessary, just use:
df['start'].str.replace(r'(?<=Date\(\d{4},)\d\d', lambda m: str(int(m[0])-1), regex=True)
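To try both variants without the original data, here is a small self-contained sketch. The sample strings are invented, but they have the same shape as the output of the strftime call in the question.

```python
import pandas as pd

# Hypothetical sample strings in the question's "Date(...)" format.
s = pd.Series([
    "Date(2022,05,24,00,00,00)",
    "Date(2022,12,31,23,59,59)",
    "Date(2022,01,01,00,00,00)",
])

# Fixed-width lookbehind: match only the two digits right after "Date(YYYY,".
month_after_year = r"(?<=Date\(\d{4},)\d\d"

padded = s.str.replace(month_after_year, lambda m: f"{int(m[0]) - 1:02d}", regex=True)
unpadded = s.str.replace(month_after_year, lambda m: str(int(m[0]) - 1), regex=True)

print(padded.tolist())
print(unpadded.tolist())
```

Because the lookbehind pins the match to the position right after the year, there is no need to limit the number of replacements; the day, hour, minute and second fields can never match.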