I want to compare the dates ranging from January 1st till the current day, from this year versus from last year.
example: get the rows with dates ranging from january 1st till november 29th 2020 get the rows with dates ranging from january 1st till november 29th 2021
here is what I have tryed doing, and the only way I picture it done:
#get year to date dates for last year
current_year = datetime.datetime.now().year
today = datetime.datetime.now().day
last_year = int(current_year) - 1
sales_data_df['Date'] = pd.to_datetime(sales_data_df['Date'])
sales_data_last_year = sales_data_df[sales_data_df['Date'].dt.year == last_year]
sales_data_last_ytd = sales_data_last_year[sales_data_last_year['Date'].dt.day < today]
sales_data_df2 = sales_data_last_ytd.groupby('Id').agg({'Quantity': sum, 'NetAmount': sum})
sales_data_df2['COGS_last_YTD'] = sales_data_df2['Quantity'] * sales_data_df2['NetAmount']
#get year to date dates for this year
sales_data_df['Date'] = pd.to_datetime(sales_data_df['Date'])
sales_data_this_year = sales_data_df[sales_data_df['Date'].dt.year == current_year]
sales_data_ytd = sales_data_last_year[sales_data_this_year['Date'].dt.day < today]
sales_data_df2 = sales_data_ytd.groupby('Id').agg({'Quantity': sum, 'NetAmount': sum})
sales_data_df2['COGS_YTD'] = sales_data_df2['Quantity'] * sales_data_df2['NetAmount']
and here is the output error:
TypeError: '<' not supported between instances of 'int' and 'datetime.datetime'
I am fairly new to pandas and I don't get why the error is about a date format mismatch since I made sure to use pd_todatetime, What is wrong with the processus I follow?
what would be alternative way to get that done without running in the same issue?
CodePudding user response:
It seems you can use directly the values for today
and current_year
as a pandas datetime objects instead of they numerical values
So, more specifically
current_year = datetime.datetime.now()
today = datetime.datetime.now()
It also seems you are comparing the date of today
with the same day in the year before, in that case, my best suggestion is to create a new variable
from dateutil.relativedelta import relativedelta
one_year_ago = datetime.datetime.now() - relativedelta(years=1)
So you could compare sales_data_df['Date']
with one_year_ago
CodePudding user response:
I think there's an easier way to do this, I also think you should check if there are any leap years if you plan on using the code with more data.
import pandas as pd
today = pd.to_datetime("now")
first_day_2021 = pd.to_datetime("01-01-2021")
today_last_year = today - pd.Timedelta(365, "day")
first_day_2020 = pd.to_datetime("01-01-2020")
Then you can easily select periods, given you have a datetime index.
df[first_day_2021: today]
df[first_day_2020: today_last_year]