How can I remove the last 6 digits from all rows in a set column of integers?

Time:09-21

I have a dataframe with two columns, one of which is dates formatted like 2021-05-01 and I would like to remove the day and month and only have the year. I tried:

df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y')

But apparently at least one of the rows has "00" for the month and/or day so this returned an error. I tried the solution here but it returned the following error:

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I'm very much a beginner and not sure what to do here. Thank you!

CodePudding user response:

If the year always appears as four digits (YYYY), you can extract it with Series.str.extract:

df['year'] = df['date'].str.extract(r'(\d{4})', expand=False)

Or keep only the first 4 characters:

df['year'] = df['date'].str[:4]

Or remove the last 6 characters (the -MM-DD part):

df['year'] = df['date'].str[:-6]
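
To see the three string-based options side by side, here is a minimal sketch. The sample values, the year_* column names, and the astype(str) cast are assumptions for illustration, not part of the original question; the "2019-00-00" row stands in for the rows with a "00" month or day.

```python
import pandas as pd

# Hypothetical sample data; "2019-00-00" mimics a row with a zero month/day.
df = pd.DataFrame({"date": ["2021-05-01", "2019-00-00", "2020-12-31"]})

# Cast to str first so the .str accessor works even if the column
# is not already stored as strings.
dates = df["date"].astype(str)

# 1) Regex: first run of four digits.
df["year_extract"] = dates.str.extract(r"(\d{4})", expand=False)

# 2) Slice: keep the first four characters.
df["year_first4"] = dates.str[:4]

# 3) Slice: drop the last six characters ("-MM-DD").
df["year_drop6"] = dates.str[:-6]

print(df)
```

All three never parse the value as a date, so a "00" month or day should not raise. The result is still a string; append .astype(int) if you need integer years.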

CodePudding user response:

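Try using dt.year. Passing errors='coerce' turns values that cannot be parsed, such as a "00" month or day, into NaT instead of raising:

df['date'] = pd.to_datetime(df['date'], errors='coerce').dt.year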
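
A small check of how the dt.year route behaves on a row with a zero month/day, assuming errors="coerce" is passed to pd.to_datetime. The sample data and the cast to the nullable "Int64" dtype are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical sample data; the middle row has an invalid month and day.
df = pd.DataFrame({"date": ["2021-05-01", "2019-00-00", "2020-12-31"]})

# errors="coerce" replaces unparseable values with NaT instead of raising.
parsed = pd.to_datetime(df["date"], errors="coerce")

# With NaT present, .dt.year comes back as floats (NaN for the bad row);
# the nullable Int64 dtype keeps whole-number years and marks the gap as <NA>.
df["year"] = parsed.dt.year.astype("Int64")

print(df)
```

Note the trade-off: the invalid row ends up with a missing year even though "2019" is right there in the text. If those years matter, the string-based answers above keep them.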