Extract Date/time from string

Time:01-05

I have a dataframe that looks like this:

ID     RESULT
1      Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST 

I would like to extract both datetime variables so the new dataframe looks like this:

ID        END_TIME            START_TIME
1         12/30/2022 23:20    12/30/2022 23:16

I have tried several methods, but each one returns 'NA' for both the 'END_TIME' and 'START_TIME' columns. For example:

TEST['END_TIME']=TEST['RESULT'].str.extract("Entered On:  (\d+) EST")
TEST['START_TIME']=TEST['RESULT'].str.extract("Performed On:  (\d+) EST")

CodePudding user response:

Assuming there are always exactly two timestamps, and using a more general regex pattern, we can try:

test[["END_TIME", "START_TIME"]] = test["RESULT"].str.extract(r'Entered On:\s*(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2} [A-Z]{3})\s+Performed On:\s*(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2} [A-Z]{3})')

Here is a regex demo showing that the regex pattern and capture groups are working correctly.
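To see why a digits-only group such as (\d+) yields NA while a group shaped like the full timestamp matches, here is a minimal sketch. It assumes the single sample row from the question; the named groups and the choice to leave the time zone outside each group (so the output matches the desired table) are illustrative, not part of the answer above.

```python
import pandas as pd

test = pd.DataFrame({
    "ID": [1],
    "RESULT": [
        "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST"
        "    Performed On:  12/30/2022 23:16 EST"
    ],
})

# A digits-only group cannot span the slashes, the space, and the colon
# inside the timestamp, so nothing matches and the result is NaN.
too_narrow = test["RESULT"].str.extract(r"Entered On:\s+(\d+) EST")

# Each group covers the whole "MM/DD/YYYY HH:MM" shape; the three-letter
# time zone is matched but kept outside the group (an assumption made here).
pattern = (
    r"Entered On:\s+(?P<END_TIME>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}) [A-Z]{3}"
    r"\s+Performed On:\s+(?P<START_TIME>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}) [A-Z]{3}"
)
extracted = test["RESULT"].str.extract(pattern)

# Named groups become the column names of the extracted frame.
out = test[["ID"]].join(extracted)
print(out)
```

Because the groups are named, str.extract labels the new columns itself, so no separate column assignment is needed.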

CodePudding user response:

We first build the following test dataframe before applying the regex (I assume the end time always appears before the start time in the string):

import pandas as pd
import re

### We build dataframe test first ###
s = "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST"

df = pd.DataFrame([('1', s)], columns=['ID', 'RESULT'])

### ----------------------------- ###
  ID                                             RESULT
0  1  Pivot (Triage) Form Entered On:  12/30/2022 23...

You can either use the labelled patterns below in your own code, or use this generic timestamp pattern, whichever suits you best: regex = r'\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}'
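As a quick check of the generic pattern on its own, the sketch below runs it against the sample string with re.findall, outside pandas. Unpacking into end_time and start_time assumes the string holds exactly two timestamps, with the "Entered On" one first.

```python
import re

s = ("Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST"
     "    Performed On:  12/30/2022 23:16 EST")

# Generic "MM/DD/YYYY HH:MM" pattern; it ignores the surrounding labels.
regex = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}"

# findall returns the matches in the order they appear in the string.
matches = re.findall(regex, s)
print(matches)  # ['12/30/2022 23:20', '12/30/2022 23:16']

# Assumes exactly two matches; unpacking raises ValueError otherwise.
end_time, start_time = matches
```

The forward slash does not need escaping in Python regexes, so \/ and / behave the same here.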

With your script:

import pandas as pd
import re

### We build dataframe test first ###
s = "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST"

df = pd.DataFrame([('1', s)], columns=['ID', 'RESULT'])

### ----------------------------- ###
# We define regex
regex = r'Form Entered On:  (\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{1,2}) EST'
df['END_TIME'] = df['RESULT'].str.extract(regex)
regex = r'Performed On:  (\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{1,2}) EST'
df['START_TIME'] = df['RESULT'].str.extract(regex)

Another way:

import pandas as pd
import re

### We build dataframe test first ###
s = "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST"

df = pd.DataFrame([('1', s)], columns=['ID', 'RESULT'])

### ----------------------------- ###
# We define regex
regex = r'\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{1,2}'

# Collect both timestamps for every row (assumes each row contains exactly two)
df[['END_TIME', 'START_TIME']] = df['RESULT'].apply(lambda s: re.findall(regex, s)).tolist()

df['END_TIME'] :

0    12/30/2022 23:20
Name: END_TIME, dtype: object

df['START_TIME'] :

0    12/30/2022 23:16
Name: START_TIME, dtype: object
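Both columns above are still plain strings (dtype object). If real datetimes are needed, one possible follow-up is sketched below using str.extractall and pd.to_datetime. The second row is made-up data added only to show that the approach works row by row, and the format string assumes every timestamp is month/day/year with a 24-hour clock.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["1", "2"],
    "RESULT": [
        "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST"
        "    Performed On:  12/30/2022 23:16 EST",
        # Hypothetical second row, not from the original question.
        "Pivot (Triage) Form Entered On:  01/02/2023 08:05 EST"
        "    Performed On:  01/02/2023 07:59 EST",
    ],
})

regex = r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}"

# extractall returns one row per match, indexed by (original row, match number);
# unstack turns the match number into columns 0 and 1.
matches = df["RESULT"].str.extractall("(" + regex + ")")[0].unstack()
matches.columns = ["END_TIME", "START_TIME"]  # assumes "Entered On" comes first

df = df.join(matches)

# Parse the strings into datetime64 values (time zone is not attached here).
for col in ["END_TIME", "START_TIME"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y %H:%M")

print(df[["ID", "END_TIME", "START_TIME"]])
```

Rows with fewer than two timestamps would show up as missing values after the unstack rather than raising an error, which makes them easy to spot.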