I have data that I am grouping by and after that I want to filter it by current year.
df['PERIOD'] = pd.to_datetime(df['PERIOD'], format='%Y%m', errors='coerce').dt.strftime('%Y/%m')
groupedResult = df.groupby('PERIOD', dropna=True)['ACTUALS'].sum().reset_index()
I have tried to add this line
groupedResult = groupedResult[groupedResult['PERIOD'].dt.year == datetime.today().year]
but getting
AttributeError: Can only use .dt accessor with datetimelike values
What I am doing wrong?
Example dataset:
import pandas as pd
data = {'PERIOD':['202201','202201','202201','202201','202202','202202','202203'], 'ACTUALS':[10,20,30,40,50,60,70]}
df = pd.DataFrame(data)
print("BEFORE:")
CodePudding user response:
What I am doing wrong?
Here
df['PERIOD'] = pd.to_datetime(df['PERIOD'], format='%Y%m', errors='coerce').dt.strftime('%Y/%m')
you are converting PERIOD which is string to datetimelike, which is in turn converted (formatted) into string, thus you
AttributeError: Can only use .dt accessor with datetimelike values
as PERIOD hold string at that moment.
Consider converting PERIOD into datetime-like without further conversion into string. You might find pd.Grouper
useful for this task, consider following example which calculate sum of ACTUAL for each Month (freq='M'
)
import pandas as pd
data = {'PERIOD':['202201','202201','202201','202201','202202','202202','202203'], 'ACTUALS':[10,20,30,40,50,60,70]}
df = pd.DataFrame(data)
df['PERIOD'] = pd.to_datetime(df['PERIOD'],format='%Y%m')
df.set_index('PERIOD', inplace=True)
total = df.groupby(pd.Grouper(freq='M')).sum()
print(total)
output
ACTUALS
PERIOD
2022-01-31 100
2022-02-28 110
2022-03-31 70
CodePudding user response:
This is because you are comparing two different types of data. In your dataframe, the column 'PERIOD' is an object. But you are using a datetime to filter it. You need to convert your 'PERIOD' column to a datetime column after creating the dataframe, not before and don't turn it to a string after. This will work:
import pandas as pd
from datetime import datetime
data = {'PERIOD':['202201','202201','202201','202201','202202','202202','202203'], 'ACTUALS':[10,20,30,40,50,60,70]}
df = pd.DataFrame(data)
df['PERIOD'] = pd.to_datetime(df['PERIOD'], format='%Y%m', errors='coerce')
groupedResult = df.groupby('PERIOD', dropna=True)['ACTUALS'].sum().reset_index()