Best way to convert UK(Unknown) string into date format? (eg. 2022-02-UK -> 2022-02-28)

Time:02-15

I have some date records that contain "Unknown" (UK) values for the year, month, or day.

So they look like UKUK-UK-UK, UKUK-01-UK, 2022-UK-UK, etc. The rule I want is: an unknown year becomes 1900, an unknown month becomes 12, and an unknown day becomes the last day of the month.

I can deal with the year and month. Since the data is in a DataFrame:

import pandas as pd
df = pd.DataFrame({"a":["1991-UK-01","UKUK-01-01"],"b":["UKUK-UK-UK","1988-02-UK"],"c":["2001-03-UK","2003-08-UK"],"d":["1",""]})

df['a'] = df['a'].str.replace("UKUK","1900",regex=True)
df['a'] = df['a'].str.replace("-UK-","-12-",regex=True)

It might not be an elegant way, but that's what I did. The problem is the 'day' part: if the day doesn't fit the month (e.g. 2021-09-31, 2021-02-31), pd.to_datetime raises an error.

I was thinking I might have to make a list of rules like [31,28,31,30...31], but then there is the 29th of February, which is actually a pain in my... hmm.
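For what it's worth, a hand-written lookup list may not be needed: the standard library's calendar.monthrange already accounts for leap years. A minimal sketch (the helper name last_day is just for illustration):

```python
import calendar

def last_day(year: int, month: int) -> int:
    # monthrange returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, month)[1]

feb_2022 = last_day(2022, 2)   # common year
feb_2024 = last_day(2024, 2)   # leap year
dec_1900 = last_day(1900, 12)  # the "unknown year, unknown month" default
print(feb_2022, feb_2024, dec_1900)
```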

My Plan Z is to convert them to the first day of the next month and then subtract one day. But before I do that, I want to ask whether there is a wiser way to do it.
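For reference, Plan Z could look roughly like this in pandas. It is only a sketch on made-up sample values, assuming the unknown day has already been replaced by 01:

```python
import pandas as pd

# Sample strings where the day is a placeholder "01" (illustrative data only)
s = pd.Series(["2022-02-01", "2024-02-01", "1900-12-01"])
first_of_month = pd.to_datetime(s, format="%Y-%m-%d")

# Jump to the first day of the next month, then step back one day
last_of_month = first_of_month + pd.DateOffset(months=1) - pd.Timedelta(days=1)
print(last_of_month)
```

This handles leap years without any lookup table, since the calendar arithmetic is done by pandas.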

Would there be a solution for this? Thanks as always.

CodePudding user response:

Just replace the unknown day with 01, then add a month-end offset.

df['a'] = df['a'].str.replace('(?<=-)UK$', '01', regex=True)
df['date'] = pd.to_datetime(df['a']) + pd.offsets.MonthEnd()
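
One caveat: adding the offset to every row also moves dates whose day was actually known (e.g. 1991-12-01 would become 1991-12-31). A possible way around this, sketched below with made-up sample rows and an illustrative day_unknown mask, is to remember which rows had an unknown day and only roll those forward:

```python
import pandas as pd

# Illustrative sample data; the column name "a" follows the question
df = pd.DataFrame({"a": ["1991-UK-01", "UKUK-01-01", "1988-02-UK",
                         "2024-02-UK", "UKUK-UK-UK"]})

# Remember which rows had an unknown day before overwriting it
day_unknown = df["a"].str.endswith("-UK")

cleaned = (
    df["a"]
    .str.replace(r"^UKUK", "1900", regex=True)        # unknown year  -> 1900
    .str.replace(r"(?<=-)UK(?=-)", "12", regex=True)  # unknown month -> 12
    .str.replace(r"(?<=-)UK$", "01", regex=True)      # unknown day   -> placeholder 01
)

dates = pd.to_datetime(cleaned, format="%Y-%m-%d")

# Keep known days as-is; roll only the placeholder days to the month end
df["date"] = dates.where(~day_unknown, dates + pd.offsets.MonthEnd())
print(df)
```

Rows with a known day keep that day, while rows that ended in -UK land on the last day of their month, including 29 February in leap years.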