How to extract year from a column with mixed formats

Time:04-10

I have a column in my dataset that holds dates in many different formats. Sometimes it has only the year and month, and other times only the year:

Date
1 January 1980
Oct-74
Oct-17
1980.0
-200
-50
8

I want to extract only the year from this column. For dates which are in the format "mmm-yy", I want to assume they are between 1921 and 2020. So my above column should look like this:

Year
1980
1974
2017
1980
-200
-50
8

How can I do this in Python? Any help would be appreciated.

# here is the code for the first dataframe
import pandas as pd

data = {'date': ['1 January 1980', 'Oct-74', 'Oct-17', '1980.0', '-200.0', '-50', '8']}
df = pd.DataFrame(data)
df

CodePudding user response:

Try this:

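import pandas as pd

data = {'date': ['1 January 1980', 'Oct-74', 'Oct-17', '1980.0', '-200.0', '-50', '8']}
df = pd.DataFrame(data)
# Mark "mmm-yy" entries by replacing the month prefix with '+', then keep the trailing number
temp = df['date'].str.replace(r'[a-zA-Z]{3}-', '+', regex=True).str.extract(r'([-+.\d]+$)')
m1 = temp[0].str.contains(r'\+')  # True only for the two-digit "mmm-yy" years
temp[0] = temp[0].astype(float)
# Two-digit years: 21-99 -> 1921-1999, 0-20 -> 2000-2020
temp[0] = temp[0].where(~(m1 & (temp[0] >= 21)), 1900 + temp[0])
temp[0] = temp[0].where(~(m1 & (temp[0] < 21)), 2000 + temp[0])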

Output (temp[0]):

1980.0
1974.0
2017.0
1980.0
-200.0
-50.0
8.0
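
As an alternative to the vectorized approach above, the same rule can be sketched as a plain per-value function, which may be easier to read and extend. The helper name extract_year and its pivot parameter are made up for illustration and are not part of pandas. The sketch assumes every value is either "mmm-yy" or ends in a (possibly negative) number, as in the sample data.

```python
import re
import pandas as pd

def extract_year(value, pivot=21):
    """Hypothetical helper: return the year found in `value`, or None."""
    text = str(value).strip()
    # "Oct-74" style: three letters, a hyphen, two digits.
    m = re.fullmatch(r"[A-Za-z]{3}-(\d{2})", text)
    if m:
        yy = int(m.group(1))
        # With pivot=21: 21..99 -> 1921..1999, 00..20 -> 2000..2020.
        return 1900 + yy if yy >= pivot else 2000 + yy
    # Otherwise take the signed number at the end of the string.
    m = re.search(r"-?\d+(?:\.\d+)?$", text)
    if m:
        return int(float(m.group()))
    return None

df = pd.DataFrame({"date": ["1 January 1980", "Oct-74", "Oct-17",
                            "1980.0", "-200.0", "-50", "8"]})
# Nullable integer dtype keeps whole years even if some rows fail to parse.
df["year"] = df["date"].map(extract_year).astype("Int64")
print(df["year"].tolist())  # [1980, 1974, 2017, 1980, -200, -50, 8]
```

One caveat: a hyphen directly before the final number is read as a minus sign, so a value such as "5-10-1980" would come out negative. Formats like that would need their own branch.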
