Extracting year from dataset in Python - year appears with a decimal point

Time:04-01

I am trying to extract the year from a dataset in Python:

df["YYYY"] = pd.DatetimeIndex(df["Date"]).year

The year appears with a decimal point in the new column:

YYYY
2001.0
2002.0
2015.0
2022.0

How can I make the year appear without the decimal point?

CodePudding user response:

Here is a sample program for your problem:

import pandas as pd
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000']})
df['date'] = pd.to_datetime(df['date'])
df['year'] = pd.DatetimeIndex(df['date']).year
print(df['year'])

pandas takes care of parsing the dates by itself.

If it does not, you can convert the column explicitly:

df["date_field"] = pd.to_datetime(df["date_field"])

I hope this makes things clear.

If not, can you share a sample of your df?

CodePudding user response:

You likely have null values in your input, which result in NaNs and therefore a float dtype for your column.

No missing values:

pd.DatetimeIndex(['2022-01-01']).year

Int64Index([2022], dtype='int64')

Missing values:

pd.DatetimeIndex(['2022-01-01', '']).year

Float64Index([2022.0, nan], dtype='float64')
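
To check whether missing values are really the cause, one option is to count the entries that fail to become a date. The sketch below uses made-up data in place of the asker's df["Date"] column; the values and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data standing in for the asker's df["Date"] column.
df = pd.DataFrame({"Date": ["2001-05-01", None, "2015-07-20", "2022-01-01"]})

# Missing entries become NaT (not-a-time) after parsing.
parsed = pd.to_datetime(df["Date"])

# Number of rows without a usable date.
n_missing = int(parsed.isna().sum())

# With at least one NaT, the extracted year is stored as float,
# because NaN cannot live in a plain integer column.
years = parsed.dt.year

print(n_missing)
print(years.dtype)
```

If the count is greater than zero, the float column is expected behaviour rather than a parsing error.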

I suggest using pandas.to_datetime combined with convert_dtypes:

pd.to_datetime(pd.Series(['2022-01-01', ''])).dt.year.convert_dtypes()

0    2022
1    <NA>
dtype: Int64
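
Applied to a DataFrame like the one in the question, this could look roughly as follows. This is a minimal sketch on made-up data: the column names "Date" and "YYYY" come from the question, the sample values are assumptions, and astype("Int64") is used here as an explicit alternative to convert_dtypes.

```python
import pandas as pd

# Hypothetical data; the empty string plays the role of a missing date.
df = pd.DataFrame({"Date": ["2001-05-01", "", "2015-07-20", "2022-01-01"]})

# errors="coerce" turns unparseable entries into NaT instead of raising.
dates = pd.to_datetime(df["Date"], errors="coerce")

# "Int64" (capital I) is the pandas nullable integer dtype:
# years stay whole numbers and missing entries show up as <NA>.
df["YYYY"] = dates.dt.year.astype("Int64")

print(df)
```

If rows without a date are not needed at all, dropping them before extracting the year would also leave an ordinary integer column.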

Alternatively, you could extract the year directly from the initial strings, but for that we would need a sample of the input.
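
Since no input sample was given, the following is only a sketch of what string-based extraction might look like. It assumes that each date string contains the year as its first run of four digits, as in the made-up values below; other formats would need a different pattern.

```python
import pandas as pd

# Hypothetical raw strings; the real input format is unknown.
raw = pd.Series(["3/10/2000", "3/11/2001", "", None])

# Grab the first run of four digits; rows without a match become missing.
year_text = raw.str.extract(r"(\d{4})", expand=False)

# Convert the text to numbers, then to the nullable integer dtype
# so that missing rows appear as <NA> instead of forcing floats.
year = pd.to_numeric(year_text).astype("Int64")

print(year)
```

This skips date parsing entirely, so it will not catch invalid dates; it only pulls out the digits.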
