Convert a datetime column to zero-based months (0 is Jan, 11 is Dec)

Time:07-07

I have a datetime column that I want to 'stringify' using strftime. The problem is that I want the months to be zero-based, i.e. 0 = January and 11 = December.

What I've tried: after stringifying the column, I call str.replace on it with a regex and a callable that converts the month to a number, subtracts one from it, and converts it back to a string.

Why do I want it to be zero-based? Because this data is going to be consumed by Google Charts, which requires the month in a date represented as a string to be zero-based.

Here is the code. Is there a better solution?

month_regex = r",(0[1-9]|1[0-2])"
#                vvv  ->  month_regex
format = "Date(%Y,%m,%d,%H,%M,%S)"

print(df["start"].dtype)  # float64   represents an epoch

# convert epoch to datetime and then to string with the given format
df["start"] = pd.to_datetime(df["start"]//1000, unit="s").dt.strftime(format)

print(df["start"])  # Date(2022,05,24,00,00,00)

df["start"] = df["start"].str.replace(
    month_regex,
    lambda match: "," + str(int(match[0][1:]) - 1),
    n=1,  # first occurrence only
    regex=True)

print(df["start"])  # Date(2022,4,24,00,00,00)

CodePudding user response:

Simply use string formatting to achieve the same result.

df = pd.to_datetime(pd.Series(["2022-01-01"]))

# We extract the month as an integer and subtract one from it, then do string formatting.
# "%%i" makes strftime emit a literal "%i", which the % operator then fills in.
df.apply(lambda x: x.strftime("Date(%Y,%%i,%d,%H,%M,%S)") % (x.month - 1))
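
As a self-contained sketch of this idea (the sample timestamps and the helper name `to_gc_date` are made up for illustration), the month slot can be left as a placeholder during strftime and filled in afterwards. The placeholder is written as `%%i` here so that strftime emits a literal `%i` regardless of platform:

```python
import pandas as pd

def to_gc_date(ts: pd.Timestamp) -> str:
    """Format a timestamp as a Date(...) string with a zero-based month."""
    # "%%" is strftime's escape for a literal "%", so the month slot
    # survives as "%i" and is filled by the % operator below.
    template = ts.strftime("Date(%Y,%%i,%d,%H,%M,%S)")
    return template % (ts.month - 1)

# Hypothetical sample data, all in the same format
dates = pd.to_datetime(pd.Series([
    "2022-01-01 00:00:00",
    "2022-05-24 13:45:10",
    "2022-12-31 23:59:59",
]))

result = dates.apply(to_gc_date)
print(result.tolist())
# ['Date(2022,0,01,00,00,00)', 'Date(2022,4,24,13,45,10)', 'Date(2022,11,31,23,59,59)']
```

Because `apply` calls the function once per row, this is likely slower than a vectorized string operation on large columns.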

CodePudding user response:

I would make your regex only capable of matching the month part of the Date(...) string by using a lookbehind for Date followed by a (, 4 digits and a comma:

(?<=Date\(\d{4},)\d\d

Then you need only worry about replacing the match:

df['start'].str.replace(r'(?<=Date\(\d{4},)\d\d', lambda m: f'{int(m[0])-1:02d}', regex=True)

Note I've used an f-string to ensure the output month value has 2 digits (i.e. 04 instead of 4). If that isn't necessary, just use:

df['start'].str.replace(r'(?<=Date\(\d{4},)\d\d', lambda m: str(int(m[0])-1), regex=True)
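
To try the lookbehind approach end to end, here is a minimal sketch that starts from epoch milliseconds as in the question. The sample values and the names `date_format`, `month_pattern` and `zero_based` are assumptions for illustration, not part of the original code:

```python
import pandas as pd

# Hypothetical sample data: epoch timestamps in milliseconds, stored as floats
df = pd.DataFrame({"start": [1653350400000.0, 1640995200000.0, 1672531199000.0]})

date_format = "Date(%Y,%m,%d,%H,%M,%S)"
# Fixed-width lookbehind: only the two digits right after "Date(YYYY," match
month_pattern = r"(?<=Date\(\d{4},)\d\d"

# Epoch milliseconds -> seconds -> datetime -> formatted string (one-based month)
formatted = pd.to_datetime(df["start"] // 1000, unit="s").dt.strftime(date_format)

# Replace the matched month with month - 1 (no zero padding)
zero_based = formatted.str.replace(
    month_pattern,
    lambda m: str(int(m[0]) - 1),
    n=1,
    regex=True,
)

print(zero_based.tolist())
# ['Date(2022,4,24,00,00,00)', 'Date(2022,0,01,00,00,00)', 'Date(2022,11,31,23,59,59)']
```

Passing `regex=True` explicitly matters here, since pandas rejects a callable replacement when the pattern is treated as a literal string.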