How can I extract the year from a column in Python? The data is in this form: 'October 1, 2020 (Uni

Time:01-07

I have tried several approaches, but none of them work: I can't slice the text at a fixed position because the month names vary in length.

I tried slicing and extracting as well, but that creates a new dataframe and makes the code longer, because I have to split the column first, extract the year, and then concatenate the values back onto the original dataframe.

CodePudding user response:

Use str.split() to turn each value into a list of whitespace-separated tokens. The year is the third token, so you can grab it and convert it to an int from there.

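import pandas as pd

df = pd.DataFrame({'date': ['October 1 2022 (United States)']})

# .str[2] takes the third token of every row; indexing with [0][2] would read only the first row
df['year'] = df['date'].str.split().str[2].astype(int)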

Output:

                             date  year
0  October 1 2022 (United States)  2022
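
Because the split is on whitespace, the length of the month name does not matter, and the year stays the third token even when the day carries a trailing comma as in the question. A minimal sketch of that, using the .str accessor so that every row gets its own year; the sample rows and country names are hypothetical, not taken from the asker's data:

```python
import pandas as pd

# Hypothetical rows; months of different lengths, day followed by a comma.
df = pd.DataFrame({'date': [
    'October 1, 2020 (United States)',
    'May 15, 2019 (Canada)',
    'September 30, 2021 (India)',
]})

# Token 0 is the month, token 1 the day (with its comma), token 2 the year.
# astype(int) assumes every row has at least three tokens and a numeric year.
df['year'] = df['date'].str.split().str[2].astype(int)

print(df['year'].tolist())
```

This adds the column in place, so no separate dataframe or concatenation step is needed.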

CodePudding user response:

You can also use regex and pd.Series.str.extract:

# expand=False returns a Series (of strings) instead of a one-column DataFrame
df['year'] = df['date'].str.extract(r'(?P<Year>\d{4})(?=\s\()', expand=False)

df

                             date  year
0  October 1 2022 (United States)  2022

The regular expression I used matches a four-digit number that is followed by whitespace and an opening parenthesis, which fits your sample date. If other rows follow a different pattern, a more flexible regex would be needed.
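
One possible more flexible pattern, as a sketch: capture the first standalone four-digit number, whatever comes after it. This assumes each value contains exactly one such number, so a four-digit house number or ID in the same string would be picked up by mistake. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical values in slightly different layouts.
df = pd.DataFrame({'date': [
    'October 1, 2020 (United States)',
    'May 12 2019',
    '3 September 2021(Canada)',
]})

# \b(\d{4})\b captures the first four-digit number bounded by non-word characters.
# expand=False returns a Series; astype(int) assumes every row has a match
# (rows without one would be NaN and make the conversion fail).
df['year'] = df['date'].str.extract(r'\b(\d{4})\b', expand=False).astype(int)

print(df['year'].tolist())
```

If some rows might have no year at all, leaving out astype(int) keeps those rows as missing values instead of raising an error.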
