I have some date records that have an "Unknown" value for the year, month, or day.
They look like UKUK-UK-UK, UKUK-01-UK, or 2022-UK-UK, etc. The rule I want is to turn an unknown year into 1900, an unknown month into 12, and an unknown day into the last day of the month.
I can deal with the year and month. Since the data is in a DataFrame:
import pandas as pd
df = pd.DataFrame({"a":["1991-UK-01","UKUK-01-01"],"b":["UKUK-UK-UK","1988-02-UK"],"c":["2001-03-UK","2003-08-UK"],"d":["1",""]})
df['a'] = df['a'].str.replace("UKUK","1900",regex=True)
df['a'] = df['a'].str.replace("-UK-","-12-",regex=True)
It might not be an elegant way, but I did something like that.
The problem is the 'day' part.
If the day doesn't exist in that month (e.g. 2021-09-31, 2021-02-31), pd.to_datetime raises an error.
I was thinking about making a list of days per month like [31,28,31,30...31], but then there is the 29th of February, which is actually a pain in my... hmm.
My Plan Z is to convert them into the first day of the next month, and then subtract one day.
But before I do that, I want to ask whether there is a wiser way to do it.
Would there be a solution for this? Thanks as always.
CodePudding user response:
Just replace the unknown day with 01, then add a month-end offset.
df['a'] = df['a'].str.replace('(?<=-)UK$', '01', regex=True)
df['date'] = pd.to_datetime(df['a']) + pd.offsets.MonthEnd()
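One caveat worth checking: adding MonthEnd() to every row also moves rows whose day was already known (1991-12-01 would become 1991-12-31), and a date that already sits on a month end jumps to the end of the following month. A minimal sketch of one way around that, assuming the strings always follow the YYYY-MM-DD layout from the question: remember which rows had an unknown day, then roll only those rows forward. The names day_unknown and cleaned and the extra sample rows are made up for illustration.

```python
import pandas as pd

# Sample values in the question's layout; the last two rows are illustrative additions.
df = pd.DataFrame({"a": ["1991-UK-01", "UKUK-01-01", "2020-02-UK", "UKUK-UK-UK"]})

# Remember which rows had an unknown day before the placeholder is overwritten.
day_unknown = df["a"].str.endswith("-UK")

cleaned = (
    df["a"]
    .str.replace(r"^UKUK", "1900", regex=True)    # unknown year  -> 1900
    .str.replace(r"-UK-", "-12-", regex=True)     # unknown month -> 12
    .str.replace(r"(?<=-)UK$", "01", regex=True)  # unknown day   -> 01 (temporary)
)

dates = pd.to_datetime(cleaned, format="%Y-%m-%d")

# MonthEnd(0) rolls forward to the end of the same month (leap years included).
# Series.where keeps the original value where the condition is True,
# so rows with a known day are left untouched.
df["date"] = dates.where(~day_unknown, dates + pd.offsets.MonthEnd(0))

print(df)
```

With this, 2020-02-UK ends up on the 29th without any hand-written days-per-month list, while 1991-UK-01 keeps its known day.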