import numpy as np
import pandas as pd

RESULT_1 = {
    'market': 'Boston',
    'summary_yearly': pd.DataFrame({
        "year": [2022, 2023],
        "conversions": [58, 220],
        "weekly_active_customers": [400, 230],
        "box_count": [180, 1150]}),
    'model_params': {
        'rete_increase': 0.00,
        'order_increase': 0.00
    }
}
RESULT_2 = {
    'market': 'New York',
    'summary_yearly': pd.DataFrame({
        "year": [2022, 2023],
        "conversions": [58, 220],
        "weekly_active_customers": [410, 220],
        "box_count": np.array([180, 115]) * (1 + 0.02)}),
    'model_params': {
        'rete_increase': 0.00,
        'order_increase': 0.02
    }
}
Then I used a method to append these results. add_result is in the sensitivity class:
from typing import Optional

class Sensi:
    def __init__(
        self,
    ):
        self.results = []

    def add_result(
        self,
        result: dict) -> None:
        """Store the previous results of a model run in a list."""
        self.results.append(result)
        print(f"A new result for {result['market']} has been added.")

    def concat_summaries(
        self,
        market: str
    ) -> Optional[pd.DataFrame]:
        """Concatenate the summary_yearly of every result for the specified market
        and include 2 additional columns:
        - 1 for rete_increase
        - 1 for order_increase
        and include 3 additional columns for:
        - conversions 2023 vs 2022 yoy growth
        - weekly_active_customers 2023 vs 2022 yoy growth
        - box_count 2023 vs 2022 yoy growth
        """
        results_market = self.get_all_results(market)
        # I need help writing this function (concat_summaries)
Then I call the code below to append all of them; there are many result files:
sensi = Sensi()
sensi.add_result(RESULT_1)
sensi.add_result(RESULT_2)
Once they are all appended, each result needs to be stored in separate rows of a single DataFrame.
I want something like this:
     market  rete_increase  order_increase  year  conversions  weekly_active_customers  box_count
0  New York           0.00             0.0  2022           58                      400        180
1  New York           0.00             0.0  2023          220                      230       1150
2    Boston           0.02             0.0  2022          180                      410        180
3    Boston           0.02             0.0  2023          115                      220        115
I think this is sufficient detail. Please help me write this function; I am new to handling this.
CodePudding user response:
With the additional information, try:
results = [RESULT_1, RESULT_2]
df = pd.concat(
    (
        pd.DataFrame(
            {
                'market': result['market'],
                **result['model_params'],
                **result['summary_yearly'].to_dict(orient='list')
            }
        )
        for result in results
    ),
    ignore_index=True
)
or, probably better,
results = [RESULT_1, RESULT_2]
df = pd.concat(
    (
        result['summary_yearly'].assign(
            **{'market': result['market'], **result['model_params']}
        )
        for result in results
    ),
    ignore_index=True
)
# move the last three columns (market and the model params) to the front
df = df[df.columns.to_list()[-3:] + df.columns.to_list()[:-3]]
Results:
market rete_increase ... weekly_active_customers box_count
0 Boston 0.0 ... 400 180.0
1 Boston 0.0 ... 230 1150.0
2 New York 0.0 ... 410 183.6
3 New York 0.0 ... 220 117.3
[4 rows x 7 columns]
I guess you'd replace results with sensi.results after adding the results to sensi.
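Putting that together with the class from the question, a minimal sketch of concat_summaries could look like the following. Note that get_all_results is not shown in the question, so the version here is an assumption (a plain filter on the 'market' key), and returning None when no result matches is a guess based on the Optional return type. The sample data is trimmed to two columns to keep it short, and the yoy growth columns are left out.

```python
from typing import Optional

import pandas as pd


class Sensi:
    def __init__(self) -> None:
        self.results = []

    def add_result(self, result: dict) -> None:
        """Store the result of a model run in a list."""
        self.results.append(result)

    def get_all_results(self, market: str) -> list:
        # Assumption: this helper is not shown in the question;
        # here it simply filters the stored results by market.
        return [r for r in self.results if r["market"] == market]

    def concat_summaries(self, market: str) -> Optional[pd.DataFrame]:
        results_market = self.get_all_results(market)
        if not results_market:
            # Assumption: nothing stored for this market -> None
            return None
        frames = [
            r["summary_yearly"].assign(market=r["market"], **r["model_params"])
            for r in results_market
        ]
        df = pd.concat(frames, ignore_index=True)
        # Put market and the model parameters in front of the summary columns.
        front = ["market", *results_market[0]["model_params"]]
        return df[front + [c for c in df.columns if c not in front]]


sensi = Sensi()
sensi.add_result({
    "market": "Boston",
    "summary_yearly": pd.DataFrame({"year": [2022, 2023], "box_count": [180, 1150]}),
    "model_params": {"rete_increase": 0.00, "order_increase": 0.00},
})
sensi.add_result({
    "market": "New York",
    "summary_yearly": pd.DataFrame({"year": [2022, 2023], "box_count": [183.6, 117.3]}),
    "model_params": {"rete_increase": 0.00, "order_increase": 0.02},
})

boston = sensi.concat_summaries("Boston")
missing = sensi.concat_summaries("Chicago")
print(boston)
```

Building the column order from the model_params keys avoids hard-coding the [-3:] slice, so it keeps working if more parameters are added later.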
Additional question from the comments:
cols = ["conversions", "weekly_active_customers", "box_count"]
df[[f"{c} 2023 vs 2022" for c in cols]] = df.groupby("market")[cols].pct_change()
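To see what that line does, here is a small self-contained sketch using the combined figures from the results above. Within each market, pct_change compares each row with the previous one, so the 2022 rows come out as NaN and the 2023 rows hold the growth. The last step, which keeps only the 2023 rows, and the names growth_cols and yoy are my own additions for illustration.

```python
import pandas as pd

# Combined frame as produced above (year column included, rows sorted by year).
df = pd.DataFrame({
    "market": ["Boston", "Boston", "New York", "New York"],
    "year": [2022, 2023, 2022, 2023],
    "conversions": [58, 220, 58, 220],
    "weekly_active_customers": [400, 230, 410, 220],
    "box_count": [180.0, 1150.0, 183.6, 117.3],
})

cols = ["conversions", "weekly_active_customers", "box_count"]
growth_cols = [f"{c} 2023 vs 2022" for c in cols]

# Row-over-row relative change within each market; the first row per market is NaN.
df[growth_cols] = df.groupby("market")[cols].pct_change()

# Illustration only: keep just the 2023 rows, which carry the yoy growth.
yoy = df.loc[df["year"] == 2023, ["market", *growth_cols]]
print(yoy)
```

This relies on the rows of each market being in year order. If a market holds several scenarios, grouping by market alone would compare across scenarios, so you would probably want to add the parameter columns to the groupby keys.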