I am trying to extract the year from a date column of my dataset in Python:
df["YYYY"] = pd.DatetimeIndex(df["Date"]).year
The year appears with a decimal point in the new column:
YYYY
2001.0
2002.0
2015.0
2022.0
How can I get just the year, with no decimal point?
CodePudding user response:
Here is a sample program for your problem:
import pandas as pd
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000']})
df['date'] = pd.to_datetime(df['date'])
df['year'] = pd.DatetimeIndex(df['date']).year
print(df['year'])
pandas usually parses the date strings by itself. If it does not, you can convert the column explicitly:
df["date_field"] = pd.to_datetime(df["date_field"])
I hope this makes things clear. If not, can you share a sample of your df?
CodePudding user response:
You likely have null values in your input, which produce NaNs and force a float dtype on the column.
No missing values:
pd.DatetimeIndex(['2022-01-01']).year
Int64Index([2022], dtype='int64')
Missing values:
pd.DatetimeIndex(['2022-01-01', '']).year
Float64Index([2022.0, nan], dtype='float64')
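To check whether this is what is happening in your own data, a quick diagnostic along these lines may help. The DataFrame here is made-up sample data, and the column name "Date" is only assumed from the question:

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing date.
df = pd.DataFrame({"Date": ["2001-05-01", None, "2022-03-04"]})

# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(df["Date"], errors="coerce")

# Count the missing dates and inspect the dtype of the extracted year.
n_missing = int(parsed.isna().sum())
year_dtype = parsed.dt.year.dtype
print(n_missing)
print(year_dtype)
```

If the count is greater than zero, the missing dates are the likely reason the year column ends up as float.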
I suggest using pandas.to_datetime combined with convert_dtypes:
pd.to_datetime(pd.Series(['2022-01-01', ''])).dt.year.convert_dtypes()
0 2022
1 <NA>
dtype: Int64
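Applied to a DataFrame column, the same idea could look like the sketch below. The sample data is invented, and the column names "Date" and "YYYY" are taken from the question; an explicit astype("Int64") is used here as an alternative to convert_dtypes:

```python
import pandas as pd

# Hypothetical sample data; the empty string stands in for a missing date.
df = pd.DataFrame({"Date": ["2001-05-01", "", "2015-07-20"]})

# Parse the strings; unparseable or empty values become NaT.
dates = pd.to_datetime(df["Date"], errors="coerce")

# The nullable Int64 dtype keeps whole-number years and shows gaps as <NA>.
df["YYYY"] = dates.dt.year.astype("Int64")
print(df)
```

Note the capital "I" in "Int64": this is the nullable integer dtype, which can hold missing values, unlike the plain NumPy "int64".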
Alternatively, you could extract the year directly from the initial strings, but for that we would need a sample of the input.
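As a rough sketch of the string-based route, assuming every non-empty value contains exactly one four-digit year (the format below is a guess, since the real input was not shown):

```python
import pandas as pd

# Hypothetical input format; the real strings may look different.
s = pd.Series(["3/10/2000", "", "3/12/2015"])

# Pull out the first run of four digits; rows without a match become NaN.
year_text = s.str.extract(r"(\d{4})", expand=False)

# Convert to numbers, then to nullable integers so that gaps show as <NA>.
years = pd.to_numeric(year_text, errors="coerce").astype("Int64")
print(years)
```

This avoids date parsing entirely, but it will pick up any four-digit number, so it is only safe if the strings have a consistent layout.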