I have a dataframe where some rows of the ID column contain a long string with a date in MMDDYYYY
format in the middle.
ID
-
blah
unc.abc.155gdgeh0t4ngs8_XYZ_01252023_US_C_Home_en-us_RS_Nat'l-vs-UNC
blah
unc.abc.52gst4363463463_RST_01272023_US_C_Away_en-us_RS_Nat'l-vs-UNC
unc.abc.534gs23ujgf9d8f_UVX_02052023_US_C_Away_en-us_RS_TEST-vs-TEST
unc.abc.5830ugjshg5345s_AAA_11012023_CA_C_Home_en-us_RS_Reg-vs-HBS
unc.abc.fs44848fvs8gs82_MBB_12252023_US_C_Home_en-us_RS_Nat'l-vs-UNC
unc.abc.fe0wjv-578244fs_FFS_04222023_CA_C_Away_en-us_RS_Nat'l-vs-UNC
I want to use the first date that appears in that column (01252023) as part of the filename, so how would I extract it and assign it to a variable?
CodePudding user response:
IIUC, you can use pandas.Series.str.extract with pandas.DataFrame.iat to get the first matched element/date. Rows without a match come back as NaN, so they are dropped before taking the first value.
filename = df["ID"].str.extract(r".*_(\d{8})_.*").dropna().iat[0, 0]
Output:
print(filename, type(filename))
#01252023 <class 'str'>
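If the column might contain no matching row at all, .iat[0, 0] raises an IndexError. The following is a minimal sketch of one way to guard against that and build the filename; the "report_" prefix, the ".csv" extension, and the "report_undated.csv" fallback are made up for illustration and are not from the question.

```python
import pandas as pd

# Small sample mirroring the question's data; "blah" rows have no date.
df = pd.DataFrame({"ID": [
    "blah",
    "unc.abc.155gdgeh0t4ngs8_XYZ_01252023_US_C_Home_en-us_RS_Nat'l-vs-UNC",
    "blah",
    "unc.abc.52gst4363463463_RST_01272023_US_C_Away_en-us_RS_Nat'l-vs-UNC",
]})

# expand=False returns a Series (one capture group); non-matching rows are NaN.
dates = df["ID"].str.extract(r"_(\d{8})_", expand=False).dropna()

# Take the first match if there is one, otherwise fall back to None.
first_date = dates.iloc[0] if not dates.empty else None

# Hypothetical filename pattern, chosen only for this example.
if first_date is not None:
    filename = f"report_{first_date}.csv"
else:
    filename = "report_undated.csv"

print(filename)
```

The regex here is anchored only by the surrounding underscores, which should be equivalent to the ".*_(\d{8})_.*" form for this data.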
CodePudding user response:
Extract the month, day and year with str.extract, then parse them into datetime format.
df[['month', 'day', 'year']] = df['ID'].str.extract(r'_(?P<month>\d{2})(?P<day>\d{2})(?P<year>\d{4})_')
# Rows without a date stay NaN here and become NaT after parsing.
df['date'] = pd.to_datetime(df['day'] + '-' + df['month'] + '-' + df['year'], format='%d-%m-%Y')
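If only the first date is needed for the filename, parsing the whole column may be more than necessary. Below is a rough sketch using just the standard library that finds the first value that is a real calendar date and reformats it; the helper name first_date and the "undated" fallback are assumptions for this example. With a DataFrame, something like first_date(df["ID"].dropna()) should work the same way.

```python
import re
from datetime import datetime

# Sample values mirroring the question's ID column.
ids = [
    "blah",
    "unc.abc.155gdgeh0t4ngs8_XYZ_01252023_US_C_Home_en-us_RS_Nat'l-vs-UNC",
    "blah",
    "unc.abc.52gst4363463463_RST_01272023_US_C_Away_en-us_RS_Nat'l-vs-UNC",
]


def first_date(values):
    """Return the first MMDDYYYY date found between underscores, or None."""
    for value in values:
        match = re.search(r"_(\d{8})_", value)
        if match is None:
            continue
        try:
            # strptime rejects impossible dates such as month 13.
            return datetime.strptime(match.group(1), "%m%d%Y")
        except ValueError:
            continue
    return None


parsed = first_date(ids)

# ISO-style stamp so that filenames sort chronologically.
stamp = parsed.strftime("%Y-%m-%d") if parsed else "undated"
print(stamp)
```

Parsing the string rather than using it as-is also catches eight-digit runs that are not valid dates.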