I have the following "source" dataframe
Value
Date
2022-01-01 26
2022-01-02 87
2022-01-03 4
2022-01-04 65
2022-01-05 89
2022-01-06 88
2022-01-07 82
2022-01-08 60
2022-01-09 8
2022-01-10 22
In another "output" dataframe, I have 4 columns defining 2 ranges of dates. As you can see, range1 is always defined, whilst range2 is defined only in certain rows:
range1_lower range1_upper range2_lower range2_upper
0 2022-01-02 2022-01-03 2022-01-06 2022-01-07
1 2022-01-03 2022-01-06 NaN NaN
In the output df I need to add an "average" column where, for each row, I get the average of the "Value" column of the source df, if the Date falls in either of the two ranges. This is something similar to the AVERAGEIFS function in Excel.
So, for example, for row 0 I need to get the average of all Values falling within range1 (2022-01-02 to 2022-01-03) or within range2 (2022-01-06 to 2022-01-07).
For row 1, since range2 is not defined, I only need to get the average of all Values falling within range1 (2022-01-03 to 2022-01-06).
This is the expected output:
range1_lower range1_upper range2_lower range2_upper average
0 2022-01-02 2022-01-03 2022-01-06 2022-01-07 65.25
1 2022-01-03 2022-01-06 NaN NaN 61.50
Is there a way to do this? Thanks
CodePudding user response:
You can cross
merge
the two dataframes then filter out the dates that do not fall between the range limits then group by the range columns and aggregate Value
with mean
:
df = output.merge(source.reset_index(), how='cross')
df = df.query("range1_lower <= Date <= range1_upper or range2_lower <= Date <= range2_upper")
df = df.groupby([*output.columns], dropna=False)['Value'].mean().reset_index(name='avg')
print(df)
range1_lower range1_upper range2_lower range2_upper avg
0 2022-01-02 2022-01-03 2022-01-06 2022-01-07 65.25
1 2022-01-03 2022-01-06 NaT NaT 61.50