Home > front end >  Transforming a multi array json data into a flatterned dataframe with python and pandas
Transforming a multi array json data into a flatterned dataframe with python and pandas

Time:11-12

I get an arrayed multi level JSON response with one level which is a date and another one which duplicats the data in a lower lever array from an API which looks like this:

{
   "2021-11-04": {
      "40-41 (25-27)": {
         "sales": 26,
         "balance": 480,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 63,
         "balance": 817,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   },
   "2021-11-05": {
      "40-41 (25-27)": {
         "sales": 35,
         "balance": 445,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 95,
         "balance": 725,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   }
}

But what i need is to make it not an array but a flatten objects to easily form a dataframe with pandas. How it can be done?

The needed result:

{
  { 
    "day": "2021-11-04",
    "sales": 26,
    "balance": 480,
    "size_name": "40-41",
    "size_origin": "25-27"
  },
  {
    "day": "2021-11-04",
    "sales": 63,
    "balance": 817,
    "size_name": "42-43",
    "size_origin": "27-29"
   },
   { 
    "day": "2021-11-05",
    "sales": 35,
    "balance": 445,
    "size_name": "40-41",
    "size_origin": "25-27"
  },
  {
    "day": "2021-11-05",
    "sales": 95,
    "balance": 725,
    "size_name": "42-43",
    "size_origin": "27-29"
   }
}

I'm ok with transformation of it inside pandas and not while it's in JSON format but I still don't get it how can such structure can be transformed.

CodePudding user response:

You can do it with a simple double loop

data = {
   "2021-11-04": {
      "40-41 (25-27)": {
         "sales": 26,
         "balance": 480,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 63,
         "balance": 817,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   },
   "2021-11-05": {
      "40-41 (25-27)": {
         "sales": 35,
         "balance": 445,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 95,
         "balance": 725,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   }
}

records = []
for date, date_dict in data.items():
    for rec_id, rec in date_dict.items():
        rec['day'] = date 
        records.append(rec)

Output

>>> records

[{'sales': 26,
  'balance': 480,
  'size_name': '40-41',
  'size_origin': '25-27',
  'day': '2021-11-04'},
 {'sales': 63,
  'balance': 817,
  'size_name': '42-43',
  'size_origin': '27-29',
  'day': '2021-11-04'},
 {'sales': 35,
  'balance': 445,
  'size_name': '40-41',
  'size_origin': '25-27',
  'day': '2021-11-05'},
 {'sales': 95,
  'balance': 725,
  'size_name': '42-43',
  'size_origin': '27-29',
  'day': '2021-11-05'}]

>>> pd.DataFrame(records)

   sales  balance size_name size_origin         day
0     26      480     40-41       25-27  2021-11-04
1     63      817     42-43       27-29  2021-11-04
2     35      445     40-41       25-27  2021-11-05
3     95      725     42-43       27-29  2021-11-05
  • Related