I have a data frame that is formatted like this:
details | col_1 | col2 | col3 |
---|---|---|---|
ex1 2019 test | 1 | 1 | 1 |
ex1 2020 review | 2 | 2 | 2 |
example2 2021 survey | 3 | 3 | 3 |
row3 2019 data | 4 | 4 | 4 |
I want to add a new column called "Year" at the end of this data frame, holding the year taken from the `details` text of each row. The result should look like this:
details | col_1 | col2 | col3 | Year |
---|---|---|---|---|
ex1 2019 test | 1 | 1 | 1 | 2019 |
ex1 2020 review | 2 | 2 | 2 | 2020 |
example2 2021 survey | 3 | 3 | 3 | 2021 |
row3 2019 data | 4 | 4 | 4 | 2019 |
The `details` values are deliberately non-standardized to reflect my actual data. Thanks in advance for the help!
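To try the answers below, the sample frame can be rebuilt as follows. This is a minimal sketch that assumes `details` is an ordinary column (that is how both answers access it) and that the other columns hold plain integers:

```python
import pandas as pd

# Rebuild the sample data from the question; "details" is kept as an
# ordinary column, not the index.
df = pd.DataFrame(
    {
        "details": [
            "ex1 2019 test",
            "ex1 2020 review",
            "example2 2021 survey",
            "row3 2019 data",
        ],
        "col_1": [1, 2, 3, 4],
        "col2": [1, 2, 3, 4],
        "col3": [1, 2, 3, 4],
    }
)
print(df)
```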
Answer:
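Extract the first standalone four-digit number from `details` with `str.extract` and convert it to an integer:

```python
# \b...\b restricts the match to a whole 4-digit token;
# expand=False returns a Series because there is a single capture group
df['Year'] = df['details'].str.extract(r'\b(\d{4})\b', expand=False).astype(int)
```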
Output:

```
  details col_1 col2 col3 Year
0 ex1 2019 test 1 1 1 2019
1 ex1 2020 review 2 2 2 2020
2 example2 2021 survey 3 3 3 2021
3 row3 2019 data 4 4 4 2019
```
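`str.extract` returns `NaN` for rows with no match, and `astype(int)` then raises a `ValueError`. A minimal sketch of a more tolerant variant, assuming years fall between 1900 and 2099 and that a missing year should become `<NA>` instead of an error; `extract_year` and `YEAR_PATTERN` are hypothetical names:

```python
import pandas as pd

# Assumption: a "year" is a standalone 4-digit number from 1900 to 2099.
YEAR_PATTERN = r"\b((?:19|20)\d{2})\b"


def extract_year(values: pd.Series) -> pd.Series:
    """Return the first year found in each string as a nullable Int64 Series.

    Rows without a match become <NA> instead of raising like astype(int).
    """
    # One capture group + expand=False -> a Series of matched strings or NaN.
    matched = values.str.extract(YEAR_PATTERN, expand=False)
    # to_numeric turns the strings into numbers and keeps NaN for non-matches;
    # the nullable Int64 dtype stores those as <NA> next to real integers.
    return pd.to_numeric(matched, errors="coerce").astype("Int64")


df = pd.DataFrame(
    {
        "details": [
            "ex1 2019 test",
            "ex1 2020 review",
            "example2 2021 survey",
            "row3 2019 data",
            "no year here",
        ],
        "col_1": [1, 2, 3, 4, 5],
    }
)
df["Year"] = extract_year(df["details"])
print(df)
```

If `details` is actually the index rather than a column, `extract_year(df.index.to_series())` returns a Series indexed like `df`, so it can be assigned to `df["Year"]` the same way.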
Answer:
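Alternatively, let `dateutil` find the date in each string with fuzzy parsing and keep only the year:

```python
from dateutil.parser import parse

# fuzzy=True skips tokens that are not part of a date
df['Year'] = df['details'].apply(lambda s: parse(s, fuzzy=True).year)
```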
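`parse` raises a `ValueError` when it cannot build a date from the string, and a `TypeError` for non-string values such as missing entries, so a single bad row stops the whole `apply`. A minimal sketch that wraps the call and returns `<NA>` instead; `safe_year` is a hypothetical helper, and treating every parse failure as a missing year is an assumption:

```python
import pandas as pd
from dateutil.parser import parse


def safe_year(text):
    """Return the year fuzzy-parsed from text, or pd.NA if parsing fails."""
    try:
        # fuzzy=True skips tokens that are not part of a date
        return parse(text, fuzzy=True).year
    except (ValueError, OverflowError, TypeError):
        # ValueError also covers dateutil's ParserError subclass;
        # TypeError covers non-string input such as None.
        return pd.NA


df = pd.DataFrame(
    {
        "details": [
            "ex1 2019 test",
            "ex1 2020 review",
            "example2 2021 survey",
            "row3 2019 data",
            None,
        ]
    }
)
# Int64 keeps integer years and <NA> in the same column.
df["Year"] = df["details"].apply(safe_year).astype("Int64")
print(df)
```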