I have the following dataset from a json file:
mydf = pd.DataFrame({
'load': {
0: {'id': '100','name': 'Joe'}, 1: {'id': '101','name': 'Ann'},
2: {'id': '102','name': 'Sue'}, 3: {'id': '103','name': 'Leo'}},
'order_date': {0: '2019-04-01', 1: '2019-04-01', 2: '2019-04-02', 3: '2019-04-03'},
'detail': {
0: [{'product_gross_total': 980,'unitary_gross_price': 490,
'hierarchy_name': 'FOOD','payment': [{'amount': 980.0, 'id': 124}],
'product_id': '230','product_name': 'APPLE','quantity': 2}],
1: [{'product_gross_total': 1900,'unitary_gross_price': 1900,
'hierarchy_name': 'MISC','payment': [{'amount': 1900.0, 'id': 125}],
'product_id': '96','product_name': 'CIGAR','quantity': 1}],
2: [{'product_gross_total': 600,'unitary_gross_price': 200,
'hierarchy_name': 'FOOD','payment': [{'amount': 600.0, 'id': 126}],
'product_id': '240','product_name': 'GRAPE','quantity': 3}],
3: [{'product_gross_total': 1400,'unitary_gross_price': 700,
'hierarchy_name': 'MISC','payment': [{'amount': 1400.0, 'id': 132}],
'product_id': '78','product_name': 'QUMASK','quantity': 2},
{'product_gross_total': 1800,'unitary_gross_price': 900,
'hierarchy_name': 'MISC','payment': [{'amount': 1800.0, 'id': 132}],
'product_id': '71','product_name': 'CANDLE','quantity': 2}]
}})
And I want to convert the dictionaries into columns, but for each element in a list inside the column 'detail' I want to have a row per product. This is the expected result:
order_date id name product_gross_total unitary_gross_price hierarchy_name product_id product_name quantity
0 2019-04-01 100 Joe 980 490 FOOD 230 APPLE 2
1 2019-04-01 101 Ann 1900 1900 MISC 96 CIGAR 1
2 2019-04-02 102 Sue 600 200 FOOD 240 GRAPE 3
3 2019-04-03 103 Leo 1400 700 MISC 78 QUMASK 2
4 2019-04-03 103 Leo 1800 900 MISC 71 CANDLE 2
This is what I have tried. Firstly I used 'json_normalize' for the 'load' column and it worked fine:
mydf = mydf.join(pd.json_normalize(mydf['load']))
mydf = mydf.drop(['load'], axis=1)
mydf
order_date detail id name
0 2019-04-01 [{'product_gross_total': 980, 'unitary_gross_p... 100 Joe
1 2019-04-01 [{'product_gross_total': 1900, 'unitary_gross_... 101 Ann
2 2019-04-02 [{'product_gross_total': 600, 'unitary_gross_p... 102 Sue
3 2019-04-03 [{'product_gross_total': 1400, 'unitary_gross_... 103 Leo
But when I tried to do the same with the detail column I get this
mydf = mydf.join(pd.json_normalize(mydf['detail']))
mydf = mydf.drop(['detail'], axis=1)
mydf
order_date id name 0 1
0 2019-04-01 100 Joe {'product_gross_total': 980, 'unitary_gross_pr... None
1 2019-04-01 101 Ann {'product_gross_total': 1900, 'unitary_gross_p... None
2 2019-04-02 102 Sue {'product_gross_total': 600, 'unitary_gross_pr... None
3 2019-04-03 103 Leo {'product_gross_total': 1400, 'unitary_gross_p... {'product_gross_total': 1800, 'unitary_gross_p...
I guess that per each element inside the column 'detail' it's adding a column... so if I have a transaction with 15 products, I'll have 15 columns. I'm stuck at converting them to rows. Any help or guidance will be greatly appreciated.
CodePudding user response:
Try:
mydf = mydf.explode("detail")
mydf = pd.concat(
[
mydf,
mydf.pop("load").apply(pd.Series),
mydf.pop("detail").apply(pd.Series),
],
axis=1,
)
mydf = mydf.drop(columns="payment")
print(mydf.to_markdown())
Prints:
order_date | id | name | product_gross_total | unitary_gross_price | hierarchy_name | product_id | product_name | quantity | |
---|---|---|---|---|---|---|---|---|---|
0 | 2019-04-01 | 100 | Joe | 980 | 490 | FOOD | 230 | APPLE | 2 |
1 | 2019-04-01 | 101 | Ann | 1900 | 1900 | MISC | 96 | CIGAR | 1 |
2 | 2019-04-02 | 102 | Sue | 600 | 200 | FOOD | 240 | GRAPE | 3 |
3 | 2019-04-03 | 103 | Leo | 1400 | 700 | MISC | 78 | QUMASK | 2 |
3 | 2019-04-03 | 103 | Leo | 1800 | 900 | MISC | 71 | CANDLE | 2 |