Python dicts containing DataFrames: how to explode them into separate rows

Time:11-19

import numpy as np
import pandas as pd

RESULT_1 = {
    'market': 'Boston',
    'summary_yearly': pd.DataFrame({
        "year": [2022, 2023],
        "conversions": [58, 220],
        "weekly_active_customers": [400, 230],
        "box_count": [180, 1150]}),
    'model_params': {
        'rete_increase': 0.00,
        'order_increase': 0.00
    }
}

RESULT_2 = {
    'market': 'New York',
    'summary_yearly': pd.DataFrame({
        "year": [2022, 2023],
        "conversions": [58, 220],
        "weekly_active_customers": [410, 220],
        "box_count": np.array([180, 115]) * (1 + 0.02)}),
    'model_params': {
        'rete_increase': 0.00,
        'order_increase': 0.02
    }
}

I then use a method to collect these results; add_result belongs to a sensitivity class:

from typing import Optional


class Sensi:

    def __init__(
            self,
    ):
        self.results = []

    def add_result(
            self,
            result: dict) -> None:
        """Store the result of a model run in a list."""

        self.results.append(result)
        print(f"A new result for {result['market']} has been added.")

    def concat_summaries(
            self,
            market: str
    ) -> Optional[pd.DataFrame]:
        """Concatenate the summary_yearly of every result for the market specified

        and include 2 additional columns:
         - 1 for rete_increase
         - 1 for order_increase

        and include 3 additional columns for:
         - conversions 2023 vs 2022 yoy growth
         - weekly_active_customers 2023 vs 2022 yoy growth
         - box_count 2023 vs 2022 yoy growth
        """

        results_market = self.get_all_results(market)

        # This is the function (concat_summaries) I need help writing.

Then I call the following to append all the results (there are many result files):

sensi = Sensi()
sensi.add_result(RESULT_1)
sensi.add_result(RESULT_2)

Once they are all appended, the yearly summary of each result needs to be stored as separate rows of a single DataFrame.

I want something like this:

   market    rate_increase  order_increase  year  conversions  weekly_active_customers  Box_count
0  New York  0.00           0.0             2022  58           400                      180
1  New York  0.00           0.0             2023  220          230                      1150
2  Boston    0.02           0.0             2022  180          410                      180
3  Boston    0.02           0.0             2023  115          220                      115

I think this is sufficient detail. Please help me write this function; I am new to handling this kind of data.

Answer:

With the additional information, try:

results = [RESULT_1, RESULT_2]

df = pd.concat(
    (
        pd.DataFrame(
            {
                'market': result['market'],
                **result['model_params'],
                **result['summary_yearly'].to_dict(orient='list')
            }
        )
        for result in results
    ),
    ignore_index=True
)

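Or, probably better, assign the market and the model parameters as constant columns on each summary_yearly frame, then move those three columns to the front: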

results = [RESULT_1, RESULT_2]

df = pd.concat(
    (
        result['summary_yearly'].assign(
            **{'market': result['market'], **result['model_params']}
        )
        for result in results
    ),
    ignore_index=True
)
# Move the three assigned columns (market and the two parameters) to the front
df = df[df.columns.to_list()[-3:] + df.columns.to_list()[:-3]]

Results:

     market  rete_increase  ...  weekly_active_customers  box_count
0    Boston            0.0  ...                      400      180.0
1    Boston            0.0  ...                      230     1150.0
2  New York            0.0  ...                      410      183.6
3  New York            0.0  ...                      220      117.3

[4 rows x 7 columns]

In your setup you would presumably replace results with sensi.results once all the results have been added to sensi.
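
For completeness, here is a minimal sketch of how the second approach might be folded into the class as concat_summaries. The question does not show get_all_results, so the version here (a simple filter of self.results by market) is an assumption, as are the "_yoy" column suffix and the choice to compute growth per result before concatenating, so that runs with different parameters are never mixed.

```python
from typing import List, Optional

import pandas as pd


class Sensi:
    def __init__(self) -> None:
        self.results: List[dict] = []

    def add_result(self, result: dict) -> None:
        """Store the result of a model run."""
        self.results.append(result)

    def get_all_results(self, market: str) -> List[dict]:
        # Assumed helper (not shown in the question): filter by market name.
        return [r for r in self.results if r["market"] == market]

    def concat_summaries(self, market: str) -> Optional[pd.DataFrame]:
        """Stack summary_yearly of every result for `market`, one row per year."""
        results_market = self.get_all_results(market)
        if not results_market:
            return None

        frames = []
        for r in results_market:
            summary = r["summary_yearly"].sort_values("year")
            metrics = [c for c in summary.columns if c != "year"]
            # Year-over-year growth within this single result; first year is NaN.
            growth = summary[metrics].pct_change().add_suffix("_yoy")
            frames.append(
                summary.join(growth).assign(market=r["market"], **r["model_params"])
            )

        df = pd.concat(frames, ignore_index=True)
        # Put market and the model parameters first, keep the rest in order.
        front = ["market", *results_market[0]["model_params"]]
        return df[front + [c for c in df.columns if c not in front]]


# Usage with the Boston figures from the question
sensi = Sensi()
sensi.add_result({
    "market": "Boston",
    "summary_yearly": pd.DataFrame({
        "year": [2022, 2023],
        "conversions": [58, 220],
        "weekly_active_customers": [400, 230],
        "box_count": [180, 1150],
    }),
    "model_params": {"rete_increase": 0.00, "order_increase": 0.00},
})
out = sensi.concat_summaries("Boston")
print(out)
```

Returning None for a market with no stored results matches the Optional[pd.DataFrame] annotation in the question; raising an error instead would be an equally reasonable choice.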


Additional question from the comments (adding the year-over-year growth columns):

cols = ["conversions", "weekly_active_customers", "box_count"]
df[[f"{c} 2023 vs 2022" for c in cols]] = df.groupby("market")[cols].pct_change()
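
To see what the grouped pct_change produces, the following small check rebuilds the combined frame by hand from the figures in the question and applies the same two lines. It assumes the rows of each market are already ordered by year; the 2022 rows have no predecessor within their market, so their growth values are NaN.

```python
import pandas as pd

# Combined frame as produced by the concat step, typed out by hand
df = pd.DataFrame({
    "market": ["Boston", "Boston", "New York", "New York"],
    "year": [2022, 2023, 2022, 2023],
    "conversions": [58, 220, 58, 220],
    "weekly_active_customers": [400, 230, 410, 220],
    "box_count": [180.0, 1150.0, 183.6, 117.3],
})

cols = ["conversions", "weekly_active_customers", "box_count"]
growth_cols = [f"{c} 2023 vs 2022" for c in cols]

# Within each market: (current row - previous row) / previous row
df[growth_cols] = df.groupby("market")[cols].pct_change()

# Keep only the rows where growth is defined (the 2023 rows)
yoy = df.dropna(subset=growth_cols)
print(yoy[["market", *growth_cols]])
```

For Boston, for example, weekly_active_customers goes from 400 to 230, which gives a growth of 230 / 400 - 1 = -0.425.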