I have a dataframe with two columns, one of which holds dates formatted like 2021-05-01. I would like to remove the day and month and keep only the year. I tried:
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y')
But apparently at least one of the rows has "00" for the month and/or day so this returned an error. I tried the solution here but it returned the following error:
TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I'm very much a beginner and not sure what to do here. Thank you!
CodePudding user response:
If the year always appears as four digits (YYYY), extract it with Series.str.extract:
df['year'] = df['date'].str.extract(r'(\d{4})', expand=False)
Or take the first 4 characters:
df['year'] = df['date'].str[:4]
Or remove the last 6 characters:
df['year'] = df['date'].str[:-6]
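As a rough, self-contained sketch of the string-based approach: the sample rows below are made up for illustration (including one with "00" for the month and day), and the `year_int` column name is just a placeholder. It assumes the year is always the first four digits of the value.

```python
import pandas as pd

# Hypothetical sample data; the second row has "00" for month and day.
df = pd.DataFrame({"date": ["2021-05-01", "2019-00-00", "2020-12-31"]})

# Work on the values as text, so invalid months/days never get parsed.
# The raw string avoids Python treating "\d" as an escape sequence.
df["year"] = df["date"].astype(str).str.extract(r"(\d{4})", expand=False)

# Optional: convert the extracted text to a nullable integer column.
# Anything that is not a number becomes <NA> instead of raising.
df["year_int"] = pd.to_numeric(df["year"], errors="coerce").astype("Int64")

print(df)
```

Because nothing is converted to a datetime here, the "00" rows keep their year instead of causing an error.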
CodePudding user response:
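Try using dt.year. Since some rows have "00" for the month or day, pass errors='coerce' so those rows become NaT instead of raising (their year will then be NaN):
df['date'] = pd.to_datetime(df['date'], errors='coerce').dt.year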
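If you do want to go through to_datetime, here is a minimal sketch of how the invalid rows could be handled, again on made-up sample data. The names `parsed` and `bad_rows` are placeholders, not anything from the question.

```python
import pandas as pd

# Hypothetical sample data; the second row has "00" for month and day.
df = pd.DataFrame({"date": ["2021-05-01", "2019-00-00", "2020-12-31"]})

# errors="coerce" turns values that cannot be parsed into NaT
# instead of raising an exception.
parsed = pd.to_datetime(df["date"], errors="coerce")

# The year is missing (NaN) for every row that failed to parse.
df["year"] = parsed.dt.year

# Inspect which original values were rejected.
bad_rows = df.loc[parsed.isna(), "date"]
print(bad_rows)
```

Note the trade-off: the rows with an invalid month or day lose their year entirely with this route, so slicing the string is probably the safer choice if you need the year for every row.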