I have a column in my dataset that holds dates in many different formats. Sometimes it has only the year and month, and other times only the year:
Date |
---|
1 January 1980 |
Oct-74 |
Oct-17 |
1980.0 |
-200 |
-50 |
8 |
I want to extract only the year from this column. For dates which are in the format "mmm-yy", I want to assume they are between 1921 and 2020. So my above column should look like this:
Year |
---|
1980 |
1974 |
2017 |
1980 |
-200 |
-50 |
8 |
How can I do this in Python? Any help would be appreciated.
# Here is the code for the first DataFrame
import pandas as pd

data = {'date': ['1 January 1980', 'Oct-74', 'Oct-17', '1980.0', '-200.0', '-50', '8']}
df = pd.DataFrame(data)
df
CodePudding user response:
Try this:
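import pandas as pd

data = {'date': ['1 January 1980', 'Oct-74', 'Oct-17', '1980.0', '-200.0', '-50', '8']}
df = pd.DataFrame(data)
# Flag the "mmm-yy" rows on the original strings, before anything is stripped
m1 = df['date'].str.match(r'[a-zA-Z]{3}-\d{2}$')
# Drop the month prefix, then keep the trailing (optionally signed or decimal) number
temp = df['date'].str.replace(r'[a-zA-Z]{3}-', '', regex=True).str.extract(r'(-?[\d.]+)$')
temp[0] = temp[0].astype(float)
# Two-digit years 21-99 become 1921-1999; 00-20 become 2000-2020
temp[0] = temp[0].where(~(m1 & (temp[0] >= 21)), 1900 + temp[0])
temp[0] = temp[0].where(~(m1 & (temp[0] < 21)), 2000 + temp[0])
df['year'] = temp[0].astype(int)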
Output:
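If the chained regex calls are hard to follow, the same rule can be written as a plain function. This is only a sketch, not part of the answer above: `extract_year`, `MMM_YY` and `TRAILING_NUMBER` are made-up names, and it assumes that "mmm-yy" always means three letters, a hyphen and exactly two digits, and that every other value ends with the year as a (possibly negative or decimal) number.

```python
import re

# Hypothetical helpers for illustration only.
# Assumption: "mmm-yy" is three letters, a hyphen, and exactly two digits.
MMM_YY = re.compile(r"^[A-Za-z]{3}-(\d{2})$")
# Assumption: any other value ends with the year, e.g. "1 January 1980" or "-200.0".
TRAILING_NUMBER = re.compile(r"(-?\d+(?:\.\d+)?)$")


def extract_year(value: str) -> int:
    """Return the year contained in one raw date string."""
    text = value.strip()
    short = MMM_YY.match(text)
    if short:
        yy = int(short.group(1))
        # 21-99 map to 1921-1999, 00-20 map to 2000-2020
        return 1900 + yy if yy >= 21 else 2000 + yy
    number = TRAILING_NUMBER.search(text)
    if number is None:
        raise ValueError(f"no year found in {value!r}")
    # Go through float so that "1980.0" becomes 1980
    return int(float(number.group(1)))


dates = ['1 January 1980', 'Oct-74', 'Oct-17', '1980.0', '-200.0', '-50', '8']
years = [extract_year(d) for d in dates]
print(years)  # [1980, 1974, 2017, 1980, -200, -50, 8]
```

On a DataFrame this could be used as df['year'] = df['date'].apply(extract_year). It runs row by row, so it will be slower than the vectorized version on large columns, but it raises an error on values it cannot parse instead of silently producing NaN.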