I have a data frame which contains a column (SecurityPassDate). example data :
"applied on 11/10/2021. need to follow up on 01/12/2021"
"completed on 19/10/2021. need to renew on 01/12/2022"
"applied on 01/12/2021."
I want to create a new column(s) in the dataframe which extracts the dates from the free text.
I'm trying to use datefinder to do the work.
example code i am trying:
df.loc[:, 'SECURITY INTERVIEW DATE'] = df['SECURITY PASS DATE'].apply(lambda x: df.find_dates(x['SECURITY PASS DATE']))
the output i want to see is :
"applied on 11/10/2021. need to follow up on 01/12/2021", "11/10/2021", "01/12/2021"
"completed on 19/10/2021. need to renew on 01/12/2022", "19/10/2021", "01/12/2022"
"applied on 01/12/2021.", "01/12/2021"
How can I get this working? Getting the error:
String indices must be integers
CodePudding user response:
IIUC, you want:
df["SECURITY INTERVIEW DATE"] = df["SECURITY PASS DATE"].apply(lambda x: list(datefinder.find_dates(x)))
>>> df
SECURITY PASS DATE SECURITY INTERVIEW DATE
0 applied on 11/10/2021. need to follow up on 01... [2021-11-10 00:00:00, 2021-01-12 00:00:00]
1 completed on 19/10/2021. need to renew on 01/1... [2021-10-19 00:00:00, 2022-01-12 00:00:00]
2 applied on 01/12/2021. [2021-01-12 00:00:00]
Input df
df = pd.DataFrame({"SECURITY PASS DATE": ["applied on 11/10/2021. need to follow up on 01/12/2021",
"completed on 19/10/2021. need to renew on 01/12/2022",
"applied on 01/12/2021."]})
CodePudding user response:
I'd exctract first and convert then in your case.
df['SECURITY PASS DATE']= df['SECURITY PASS DATE'].str[-10:]
pd.to_datetime(df['SECURITY PASS DATE'])
CodePudding user response:
I would propose
# Extract dates as list
date_list = df['SECURITY INTERVIEW DATE'].apply(lambda x: list(datefinder.find_dates(x)))
# build a complementary dataframe where each column as a date
## build the dataframe, creating columns for the row lists items
df_dates = pd.DataFrame(date_list.tolist())
## rename the columns
df_dates = df_dates.set_axis(['date_' str(i 1) for i in df_dates.columns], axis=1)
## concat the original dataframe with the complementary one
df = pd.concat([df, df_dates], axis=1)
Output:
SECURITY INTERVIEW DATE date_1 date_2
0 applied on 11/10/2021. need to follow up on 01... 2021-11-10 2021-01-12
1 completed on 19/10/2021. need to renew on 01/1... 2021-10-19 2022-01-12
2 applied on 01/12/2021 2021-01-12 NaT