Home > Software design >  Unnest json dict to rows in pandas
Unnest json dict to rows in pandas

Time:05-15

I have the following dataset from a json file:

mydf = pd.DataFrame({
 'load': {
  0: {'id': '100','name': 'Joe'}, 1: {'id': '101','name': 'Ann'},
  2: {'id': '102','name': 'Sue'}, 3: {'id': '103','name': 'Leo'}},
 'order_date': {0: '2019-04-01', 1: '2019-04-01', 2: '2019-04-02', 3: '2019-04-03'},
 'detail': {
     0: [{'product_gross_total': 980,'unitary_gross_price': 490,
    'hierarchy_name': 'FOOD','payment': [{'amount': 980.0, 'id': 124}],
    'product_id': '230','product_name': 'APPLE','quantity': 2}],
     1: [{'product_gross_total': 1900,'unitary_gross_price': 1900,
    'hierarchy_name': 'MISC','payment': [{'amount': 1900.0, 'id': 125}],
    'product_id': '96','product_name': 'CIGAR','quantity': 1}],
    2: [{'product_gross_total': 600,'unitary_gross_price': 200,
    'hierarchy_name': 'FOOD','payment': [{'amount': 600.0, 'id': 126}],
    'product_id': '240','product_name': 'GRAPE','quantity': 3}],
    3: [{'product_gross_total': 1400,'unitary_gross_price': 700,
    'hierarchy_name': 'MISC','payment': [{'amount': 1400.0, 'id': 132}],
    'product_id': '78','product_name': 'QUMASK','quantity': 2},
       {'product_gross_total': 1800,'unitary_gross_price': 900,
    'hierarchy_name': 'MISC','payment': [{'amount': 1800.0, 'id': 132}],
    'product_id': '71','product_name': 'CANDLE','quantity': 2}]
  }})

And I want to convert the dictionaries into columns, but for each element in a list inside the column 'detail' I want to have a row per product. This is the expected result:

    order_date  id  name    product_gross_total unitary_gross_price hierarchy_name  product_id  product_name    quantity
0   2019-04-01  100 Joe     980                 490                 FOOD            230         APPLE           2
1   2019-04-01  101 Ann     1900                1900                MISC            96          CIGAR           1
2   2019-04-02  102 Sue     600                 200                 FOOD            240         GRAPE           3
3   2019-04-03  103 Leo     1400                700                 MISC            78          QUMASK          2
4   2019-04-03  103 Leo     1800                900                 MISC            71          CANDLE          2

This is what I have tried. Firstly I used 'json_normalize' for the 'load' column and it worked fine:

mydf = mydf.join(pd.json_normalize(mydf['load']))
mydf = mydf.drop(['load'], axis=1)
mydf

    order_date  detail                                              id  name
0   2019-04-01  [{'product_gross_total': 980, 'unitary_gross_p...   100 Joe
1   2019-04-01  [{'product_gross_total': 1900, 'unitary_gross_...   101 Ann
2   2019-04-02  [{'product_gross_total': 600, 'unitary_gross_p...   102 Sue
3   2019-04-03  [{'product_gross_total': 1400, 'unitary_gross_...   103 Leo

But when I tried to do the same with the detail column I get this

mydf = mydf.join(pd.json_normalize(mydf['detail']))
mydf = mydf.drop(['detail'], axis=1)
mydf

    order_date  id  name    0                                                   1
0   2019-04-01  100 Joe     {'product_gross_total': 980, 'unitary_gross_pr...   None
1   2019-04-01  101 Ann     {'product_gross_total': 1900, 'unitary_gross_p...   None
2   2019-04-02  102 Sue     {'product_gross_total': 600, 'unitary_gross_pr...   None
3   2019-04-03  103 Leo     {'product_gross_total': 1400, 'unitary_gross_p...   {'product_gross_total': 1800, 'unitary_gross_p...

I guess that per each element inside the column 'detail' it's adding a column... so if I have a transaction with 15 products, I'll have 15 columns. I'm stuck at converting them to rows. Any help or guidance will be greatly appreciated.

CodePudding user response:

Try:

mydf = mydf.explode("detail")

mydf = pd.concat(
    [
        mydf,
        mydf.pop("load").apply(pd.Series),
        mydf.pop("detail").apply(pd.Series),
    ],
    axis=1,
)
mydf = mydf.drop(columns="payment")

print(mydf.to_markdown())

Prints:

order_date id name product_gross_total unitary_gross_price hierarchy_name product_id product_name quantity
0 2019-04-01 100 Joe 980 490 FOOD 230 APPLE 2
1 2019-04-01 101 Ann 1900 1900 MISC 96 CIGAR 1
2 2019-04-02 102 Sue 600 200 FOOD 240 GRAPE 3
3 2019-04-03 103 Leo 1400 700 MISC 78 QUMASK 2
3 2019-04-03 103 Leo 1800 900 MISC 71 CANDLE 2
  • Related