Extract MMDDYYYY date from dataframe rows

Time:01-26

I have a dataframe in which some rows contain a long string with a date in MMDDYYYY format in the middle.

ID
--
blah
unc.abc.155gdgeh0t4ngs8_XYZ_01252023_US_C_Home_en-us_RS_Nat'l-vs-UNC
blah
unc.abc.52gst4363463463_RST_01272023_US_C_Away_en-us_RS_Nat'l-vs-UNC
unc.abc.534gs23ujgf9d8f_UVX_02052023_US_C_Away_en-us_RS_TEST-vs-TEST
unc.abc.5830ugjshg5345s_AAA_11012023_CA_C_Home_en-us_RS_Reg-vs-HBS
unc.abc.fs44848fvs8gs82_MBB_12252023_US_C_Home_en-us_RS_Nat'l-vs-UNC
unc.abc.fe0wjv-578244fs_FFS_04222023_CA_C_Away_en-us_RS_Nat'l-vs-UNC

I want to use the first date that appears in that column (01252023) as part of the filename, so how would I extract it and set it to a variable?

Answer:

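IIUC, you can use pandas.Series.str.extract with pandas.DataFrame.iat to get the first matched date. str.extract returns a DataFrame with one column per capture group, and dropna() removes the rows that have no match (such as "blah") before the first element is taken.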

filename = df["ID"].str.extract(r".*_(\d{8})_.*").dropna().iat[0, 0]

Output:

print(filename, type(filename))
#01252023 <class 'str'>
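A self-contained sketch of the same approach, assuming a DataFrame built from a few of the sample rows in the question. It passes expand=False, which makes str.extract return a Series when there is a single capture group, so plain .iloc[0] replaces .iat[0, 0]. The "report_<date>.csv" name is a hypothetical example of using the variable in a filename.

```python
import pandas as pd

# A few rows copied from the question; the "blah" rows contain no date.
df = pd.DataFrame({"ID": [
    "blah",
    "unc.abc.155gdgeh0t4ngs8_XYZ_01252023_US_C_Home_en-us_RS_Nat'l-vs-UNC",
    "blah",
    "unc.abc.52gst4363463463_RST_01272023_US_C_Away_en-us_RS_Nat'l-vs-UNC",
]})

# One capture group + expand=False -> a Series; non-matching rows become NaN.
# dropna() discards those rows, then iloc[0] takes the first date found.
first_date = df["ID"].str.extract(r"_(\d{8})_", expand=False).dropna().iloc[0]

# Hypothetical filename pattern, only to show the variable in use.
filename = f"report_{first_date}.csv"
print(first_date)
print(filename)
```

The pattern requires an underscore on both sides of the eight digits, so digit runs inside the leading hash (e.g. "4363463463") are not picked up.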

Answer:

Alternatively, extract the month, day and year with named groups in str.extract, then combine them into a datetime column.

import pandas as pd

df[['month', 'day', 'year']] = df['ID'].str.extract(r'_(?P<month>\d{2})(?P<day>\d{2})(?P<year>\d{4})_')
df['date'] = pd.to_datetime(df['day'] + '-' + df['month'] + '-' + df['year'], format='%d-%m-%Y')
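A self-contained sketch, assuming a DataFrame built from some of the sample rows in the question. As a variant of the answer above, it captures the eight digits in a single group and parses them directly with format="%m%d%Y" instead of splitting month, day and year. It then takes the first valid date and formats it back to MMDDYYYY, which is the string the question wants for the filename.

```python
import pandas as pd

# Sample rows from the question; "blah" has no date.
df = pd.DataFrame({"ID": [
    "blah",
    "unc.abc.155gdgeh0t4ngs8_XYZ_01252023_US_C_Home_en-us_RS_Nat'l-vs-UNC",
    "unc.abc.534gs23ujgf9d8f_UVX_02052023_US_C_Away_en-us_RS_TEST-vs-TEST",
]})

# Capture the whole MMDDYYYY block between underscores.
raw = df["ID"].str.extract(r"_(\d{8})_", expand=False)

# format="%m%d%Y" parses MMDDYYYY directly; unmatched rows (NaN) become NaT.
df["date"] = pd.to_datetime(raw, format="%m%d%Y")

# First non-missing date, as a Timestamp and back as an MMDDYYYY string.
first = df["date"].dropna().iloc[0]
print(first.date())
print(first.strftime("%m%d%Y"))
```

Keeping a real datetime column is useful if you also need to sort or filter by date; for the filename alone, the plain string from the first answer is enough.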