I'm trying to extract the date into a new column from a column containing a string in a dataframe using Python.
The field always starts with [SOME_TEXT]_DD/MM/YYYY
SOME_TEXT is of variable length
The characters [ and ] may or may not appear again later in the string
I am able to extract the date where [SOME_TEXT]_ is 18 characters long like this:-
df['date'] = MID(df['Column_with_date'],18,10)
So I figure if I know the length of [SOME_TEXT] I can adapt the above line.
How can I calculate the length of [SOME_TEXT]?
CodePudding user response:
We can use str.extract
for a regex option:
df["date"] = df["Column_with_date"].str.extract(r'\[.*?\]_(\d{2}/\d{2}/\d{4})')
If additionally you want the date
column to be a bona fide datetime column rather than text, then use pd.to_datetime()
:
df["date"] = pd.to_datetime(df["date"], format='%d/%m/%Y')