Home > OS >  Count number of characters between two specific characters in dataframe
Count number of characters between two specific characters in dataframe

Time:12-16

I'm trying to extract the date into a new column from a column containing a string in a dataframe using Python.

The field always starts with [SOME_TEXT]_DD/MM/YYYY

SOME_TEXT is of variable length

The characters [ and ] may or may not appear again later in the string

I am able to extract the date where [SOME_TEXT]_ is 18 characters long like this:-

df['date'] = MID(df['Column_with_date'],18,10)

So I figure if I know the length of [SOME_TEXT] I can adapt the above line.

How can I calculate the length of [SOME_TEXT]?

CodePudding user response:

We can use str.extract for a regex option:

df["date"] = df["Column_with_date"].str.extract(r'\[.*?\]_(\d{2}/\d{2}/\d{4})')

If additionally you want the date column to be a bona fide datetime column rather than text, then use pd.to_datetime():

df["date"] = pd.to_datetime(df["date"], format='%d/%m/%Y')
  • Related