Home > Net >  Pandas , extract a date from a string value in a column
Pandas , extract a date from a string value in a column

Time:11-19

I have a data frame which contains a column (SecurityPassDate). example data :

"applied on 11/10/2021. need to follow up on 01/12/2021"
"completed on 19/10/2021. need to renew on 01/12/2022"
"applied on 01/12/2021."

I want to create a new column(s) in the dataframe which extracts the dates from the free text.

I'm trying to use datefinder to do the work.

example code i am trying:

df.loc[:, 'SECURITY INTERVIEW DATE'] = df['SECURITY PASS DATE'].apply(lambda x: df.find_dates(x['SECURITY PASS DATE']))

the output i want to see is :

"applied on 11/10/2021. need to follow up on 01/12/2021", "11/10/2021", "01/12/2021"
"completed on 19/10/2021. need to renew on 01/12/2022", "19/10/2021", "01/12/2022"
"applied on 01/12/2021.", "01/12/2021"

How can I get this working? Getting the error:

String indices must be integers

CodePudding user response:

IIUC, you want:

df["SECURITY INTERVIEW DATE"] = df["SECURITY PASS DATE"].apply(lambda x: list(datefinder.find_dates(x)))

>>> df
                                  SECURITY PASS DATE                     SECURITY INTERVIEW DATE
0  applied on 11/10/2021. need to follow up on 01...  [2021-11-10 00:00:00, 2021-01-12 00:00:00]
1  completed on 19/10/2021. need to renew on 01/1...  [2021-10-19 00:00:00, 2022-01-12 00:00:00]
2                             applied on 01/12/2021.                       [2021-01-12 00:00:00]
Input df
df = pd.DataFrame({"SECURITY PASS DATE": ["applied on 11/10/2021. need to follow up on 01/12/2021",
                                          "completed on 19/10/2021. need to renew on 01/12/2022",
                                          "applied on 01/12/2021."]})

CodePudding user response:

I'd exctract first and convert then in your case.

df['SECURITY PASS DATE']= df['SECURITY PASS DATE'].str[-10:]
pd.to_datetime(df['SECURITY PASS DATE'])

CodePudding user response:

I would propose

# Extract dates as list
date_list = df['SECURITY INTERVIEW DATE'].apply(lambda x: list(datefinder.find_dates(x)))

# build a complementary dataframe where each column as a date
## build the dataframe, creating columns for the row lists items
df_dates = pd.DataFrame(date_list.tolist())
## rename the columns
df_dates = df_dates.set_axis(['date_'  str(i 1) for i in df_dates.columns], axis=1)

## concat the original dataframe with the complementary one
df = pd.concat([df, df_dates], axis=1)

Output:


                            SECURITY INTERVIEW DATE      date_1      date_2
0 applied on 11/10/2021. need to follow up on 01...  2021-11-10  2021-01-12
1 completed on 19/10/2021. need to renew on 01/1...  2021-10-19  2022-01-12
2                             applied on 01/12/2021  2021-01-12         NaT
  • Related