I have tried several approaches, but none of them works: I can't slice the text at fixed positions because the month names vary in length.
I also tried splitting and extracting, but that creates a new dataframe and makes the code longer, because I then have to split the column, extract the year, and concatenate the values back onto the original dataframe.
CodePudding user response:
Use str.split() to turn each value into a list of tokens. You can then grab the year and convert it to an int.
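import pandas as pd
df = pd.DataFrame({'date': ['October 1 2022 (United States)']})
# Take the third whitespace-separated token of every row and cast it to int
df['year'] = df['date'].str.split().str[2].astype(int)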
Output:
date year
0 October 1 2022 (United States) 2022
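To show that whitespace splitting does not depend on the length of the month name, here is a minimal sketch over several rows. Only the first row comes from the question; the other two are made-up samples that assume the same "Month day year (Country)" layout.

```python
import pandas as pd

# Hypothetical sample rows; only the first one appears in the question.
df = pd.DataFrame({'date': ['October 1 2022 (United States)',
                            'May 12 2019 (Canada)',
                            'September 30 2021 (United Kingdom)']})

# Split every row on whitespace, take the third token, and cast it to int.
# The token position is the same no matter how long the month name is.
df['year'] = df['date'].str.split().str[2].astype(int)

print(df['year'].tolist())
```

This adds the column in place, so no separate dataframe or concatenation step is needed. It does assume the year is always the third token.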
CodePudding user response:
You can also use a regex with pd.Series.str.extract:
df['year'] = df['date'].str.extract(r'(?P<Year>\d{4}(?=(?:\s+\()))', expand=False)
df
date year
0 October 1 2022 (United States) 2022
The regular expression matches a four-digit year that is followed by whitespace and an opening parenthesis, as in your sample date. If your other dates follow different patterns, a more flexible regex would be needed.
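As one possible "more flexible" variant, the sketch below captures the first standalone four-digit number anywhere in the string. The extra rows are invented for illustration, and the pattern assumes that no other four-digit number appears before the year.

```python
import pandas as pd

# Hypothetical samples with differing layouts; only the first is from the question.
df = pd.DataFrame({'date': ['October 1 2022 (United States)',
                            '1 October 2022',
                            'Oct 2021 (UK)',
                            'unknown']})

# \b(\d{4})\b captures the first run of exactly four digits bounded by
# non-word characters. expand=False returns a Series; rows without a
# match become missing values.
year = df['date'].str.extract(r'\b(\d{4})\b', expand=False)

# The nullable Int64 dtype keeps rows without a year as <NA>.
df['year'] = pd.to_numeric(year).astype('Int64')

print(df)
```

Using a nullable integer column avoids the error that a plain int cast would raise on rows where no year is found.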