I have a bunch of bounding values in a dataframe,
forecast = pd.read_table(
StringIO(
"""
field1_lower field1_upper field2_lower field2_upper
"2022-09-25 10:00:00" 71.925000 75.275000 73.525000 75.275000
"2022-09-25 11:00:00" 71.625000 75.359000 73.225000 75.352100
"2022-09-25 12:00:00" 71.333333 75.339000 71.399333 75.339000
"2022-09-25 13:00:00" 64.571429 75.139000 90.511429 92.139000
"2022-09-25 14:00:00" 72.285714 75.339000 50.585714 75.339000
"""
),
delim_whitespace=True,
)
and a bunch of observed measurements,
actual = pd.read_table(
StringIO(
"""
field1 field2
"2022-09-25 10:00:00" 72.925000 74.275000
"2022-09-25 11:00:00" 73.525000 76.359000
"2022-09-25 12:00:00" 72.664333 72.339000
"2022-09-25 13:00:00" 70.231429 91.235000
"2022-09-25 14:00:00" 56.280000 66.239000
"""
),
delim_whitespace=True,
)
I'd like to check if the observed measurements are within the bounds as defined in the forecast
-dataframe.
I successfully converted the forecast dataframe into one with tuples,
def convert_column_pairs_to_tuples(df: pd.DataFrame) -> pd.DataFrame:
column_names = []
df_new = pd.DataFrame([], index=df.index)
for fieldname_lower, fieldname_upper in pairwise(list(df.columns)):
column_name = fieldname_lower.replace('_lower', '')
column_names.append(column_name)
assert column_name == fieldname_upper.replace('_upper', '')
df_new[column_name] = list(zip(df[fieldname_lower], df[fieldname_upper]))
return df_new
and I suppose I could use .apply
to solve the rest, but something doesn't smell right about having to have an intermediary DataFrame
with tuples.
Is there an alternative solution wherein I don't have to create the intermediary dataframe?
CodePudding user response:
You can compare columns directly
actual['field1_in_bound'] = (actual['field1']>forecast['field1_lower'])&(actual['field1']<forecast['field1_upper'])
2022-09-25 10:00:00 True
2022-09-25 11:00:00 True
2022-09-25 12:00:00 True
2022-09-25 13:00:00 True
2022-09-25 14:00:00 False
Name: field1_in_bound, dtype: bool
if you have many columns and you wish to do them all at once:
# I assume all your columns are sorted correctly
forecast_lower = forecast.iloc[:, ::2]
forecast_upper = forecast.iloc[:, 1::2]
actual[[c '_in_bound' for c in actual.columns]] = (actual>forecast_lower.values)&(actual<forecast_upper.values)
CodePudding user response:
Another possible solution:
df3 = actual.join(forecast)
fields = actual.columns
pd.concat(map(lambda x: ((df3[x].ge(df3[x '_lower']))
& (df3[x].le(df3[x '_upper']))).rename(x '_res'), fields), axis=1)
Output:
field1_res field2_res
2022-09-25 10:00:00 True True
2022-09-25 11:00:00 True False
2022-09-25 12:00:00 True True
2022-09-25 13:00:00 True True
2022-09-25 14:00:00 False True