The code below calculates the mean','median','max','min'
of the vals
values that are in correlation to the dates month_changes
. The code calculates the mean','median','max','min'
by separating the intervals in years. I want to implement a bit to the code where it adds zeroes to the mean','median','max','min'
values for the years that start from the starting_year
variable, also to the years that are in between one another like 2020 in this example it will also add zeroes. How would I be able to do that?
import numpy as np
import pandas as pd
month_changes = np.array(["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00", "2019-04-01 00:00:00","2019-08-01 00:00:00", "2019-11-01 00:00:00", "2019-12-01 00:00:00","2021-01-01 00:00:00"])
vals = np.array([10, 23, 45, 4,5,12,4,-6])
starting_year = 2016
def YearlyIntervals(vals):
data = pd.DataFrame({"Date": month_changes, "Averages": vals})
data["Date"] = pd.to_datetime(data["Date"])
out=(data.groupby(data["Date"].dt.year)
.agg(['mean','median','max','min'])
.droplevel(0,1)
.rename(columns=lambda x:'Average' if x=='mean' else x.title())
)
return out
PnL_YearlyFilter= YearlyIntervals(vals)
Output
Average Median Max Min
Date
2018 16.5 16.5 23 10
2019 14.0 5.0 45 4
2021 -6.0 -6.0 -6 -6
Expected Output
Average Median Max Min
Date
2016 0 0 0 0
2017 0 0 0 0
2018 16.5 16.5 23 10
2019 14.0 5.0 45 4
2020 0 0 0 0
2021 -6.0 -6.0 -6 -6
CodePudding user response:
Use reindex
with a fill_value=0
.
One way it can be added to the function is like:
def yearly_intervals(mc, vs, start_year=None, end_year=None):
data = pd.DataFrame({
"Date": pd.to_datetime(mc), # Convert to_datetime immediately
"Averages": vs
})
out = (
data.groupby(data["Date"].dt.year)["Averages"] # Access Series
.agg(['mean', 'median', 'max', 'min'])
.rename(columns=lambda x: 'Average' if x == 'mean' else x.title())
)
# If start_year
if start_year is not None:
# Reindex to ensure index contains all years in range
out = out.reindex(range(
start_year,
# Use last year (maximum value) from index or user defined arg
(end_year if end_year is not None else out.index.max()) 1
), fill_value=0)
return out
Some changes made:
- PEP8 guidelines state that function names should "should be lowercase, with words separated by underscores as necessary". Function and Variable Names
- The numpy array can be converted directly
to_datetime
when building the DataFrame without needing to build the DataFrame then convert the Series afterwards. - Slight modifications to the groupby aggregation using
SeriesGroupBy.aggregate
instead of DataFrame groupby to ensure more consistent behaviour across versions. start_year
andend_year
kwargs with default parameters allow for more flexibility of the function so that any range of years can be used. (More argument processing could be added to ensure thatend_year
is always larger thanstart_year
)end_year
is optional. If noend_year
is provided it will use theIndex.max
(the largest value in the index)
Sample function call:
month_changes = np.array(
["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00",
"2019-04-01 00:00:00", "2019-08-01 00:00:00", "2019-11-01 00:00:00",
"2019-12-01 00:00:00", "2021-01-01 00:00:00"])
vals = np.array([10, 23, 45, 4, 5, 12, 4, -6])
starting_year = 2016
PnL_YearlyFilter = yearly_intervals(month_changes, vals, starting_year)
PnL_YearlyFilter
:
Average Median Max Min
Date
2016 0.0 0.0 0 0
2017 0.0 0.0 0 0
2018 16.5 16.5 23 10
2019 14.0 5.0 45 4
2020 0.0 0.0 0 0
2021 -6.0 -6.0 -6 -6