I have two data frames - one data frame consists of all metric values on a monthly basis, and another one consists of lower and upper bounds for a given period.
import pandas as pd
# sample data
data1 = {'Metric_Month': [pd.Timestamp('2022-05-01 00:00:00'), pd.Timestamp('2022-04-01 00:00:00'),
pd.Timestamp('2022-03-01 00:00:00'), pd.Timestamp('2022-02-01 00:00:00'),
pd.Timestamp('2022-01-01 00:00:00'), pd.Timestamp('2021-12-01 00:00:00'),
pd.Timestamp('2021-11-01 00:00:00'), pd.Timestamp('2021-10-01 00:00:00'),
pd.Timestamp('2021-09-01 00:00:00')],
'Metric_Value': [0.2876, 0.3002, 0.3169, 0.3137, 0.3022, 0.3071, 0.3122, 0.307, 0.3091]}
df1 = pd.DataFrame(data1)
data2 = {'start_date': [pd.Timestamp('2021-11-01 00:00:00'), pd.Timestamp('2021-10-01 00:00:00')],
'end_date': [pd.Timestamp('2022-05-01 00:00:00'), pd.Timestamp('2022-04-01 00:00:00')],
'lower_bound': [0.2954, 0.2991], 'upper_bound': [0.3221, 0.3206]}
df2 = pd.DataFrame(data2)
Metric data frame:
Metric Month Metric Value
2022-05-01 0.2876
2022-04-01 0.3002
2022-03-01 0.3169
2022-02-01 0.3137
2022-01-01 0.3022
2021-12-01 0.3071
2021-11-01 0.3122
2021-10-01 0.3070
2021-09-01 0.3091
Bounds data frame:
start_date end_date lower_bound upper_bound
2021-11-01 2022-05-01 0.2954 0.3221
2021-10-01 2022-04-01 0.2991 0.3206
My goal is to plot these (different but overlapping) lower and upper bounds on the same scatter plot of metric values.
My idea is to expand the data frame so that new dates are added in rows between the start and the end (included) dates as I want to do fill_between
plots on this data set.
In our example a new data frame would look like this (end_date
column can be removed):
start_date end_date lower_bound upper_bound
2021-11-01 2022-05-01 0.2954 0.3221
2021-12-01 2022-05-01 0.2954 0.3221
2022-01-01 2022-05-01 0.2954 0.3221
...
2022-05-01 2022-05-01 0.2954 0.3221
2021-10-01 2022-04-01 0.2991 0.3206
2021-11-01 2022-04-01 0.2991 0.3206
2021-12-01 2022-04-01 0.2991 0.3206
...
2022-04-01 2022-04-01 0.2991 0.3206
The final result should be a plot like this:
How to do this, and do I need to extend the bounds df in such a way?
CodePudding user response: