Home > Enterprise >  Pandas unpack the json column
Pandas unpack the json column

Time:04-06

I pulled a json file from web, a list of browser details. I am trying to create separate columns of version_list but for some reason it is not putting it correctly.

Currently this is how the dataframe looks like.

Browser Version_list
IE [{'version': '5.5', 'global_usage': 0.009298, 'release_date': 962323200, 'era': -6, 'prefix': ''}, {'version': '6', 'global_usage': 0.0131217, 'release_date': 998870400, 'era': -5, 'prefix': ''}, {'version': '7', 'global_usage': 0.00621152, 'release_date': 1161129600, 'era': -4, 'prefix': ''}, {'version': '8', 'global_usage': 0.0395479, 'release_date': 1237420800, 'era': -3, 'prefix': ''}, {'version': '9', 'global_usage': 0.0593219, 'release_date': 1300060800, 'era': -2, 'prefix': ''}, {'version': '10', 'global_usage': 0.0263653, 'release_date': 1346716800, 'era': -1, 'prefix': ''}, {'version': '11', 'global_usage': 0.619584, 'release_date': 1381968000, 'era': 0, 'prefix': ''}]
Edge [{'version': '12', 'global_usage': 0.008322, 'release_date': 1438128000, 'era': -26, 'prefix': 'ms'}, {'version': '13', 'global_usage': 0.004267, 'release_date': 1447286400, 'era': -25, 'prefix': 'ms'}, {'version': '14', 'global_usage': 0.004268, 'release_date': 1470096000, 'era': -24, 'prefix': 'ms'}, {'version': '15', 'global_usage': 0.012483, 'release_date': 1491868800, 'era': -23, 'prefix': 'ms'}, {'version': '16', 'global_usage': 0.004161, 'release_date': 1508198400, 'era': -22, 'prefix': 'ms'}, {'version': '17', 'global_usage': 0.012483, 'release_date': 1525046400, 'era': -21, 'prefix': 'ms'}, {'version': '18', 'global_usage': 0.074898, 'release_date': 1542067200, 'era': -20, 'prefix': 'ms'}, {'version': '79', 'global_usage': 0, 'release_date': 1579046400, 'era': -19, 'prefix': ''}, {'version': '80', 'global_usage': 0.004298, 'release_date': 1581033600, 'era': -18, 'prefix': ''}, {'version': '81', 'global_usage': 0.00944, 'release_date': 1586736000, 'era': -17, 'prefix': ''}, {'version': '83', 'global_usage': 0.004043, 'release_date': 1590019200, 'era': -16, 'prefix': ''}, {'version': '84', 'global_usage': 0.004161, 'release_date': 1594857600, 'era': -15, 'prefix': ''}, {'version': '85', 'global_usage': 0.008322, 'release_date': 1598486400, 'era': -14, 'prefix': ''}, {'version': '86', 'global_usage': 0.004161, 'release_date': 1602201600, 'era': -13, 'prefix': ''}, {'version': '87', 'global_usage': 0.008322, 'release_date': 1605830400, 'era': -12, 'prefix': ''}, {'version': '88', 'global_usage': 0.004318, 'release_date': 1611360000, 'era': -11, 'prefix': ''}, {'version': '89', 'global_usage': 0.008322, 'release_date': 1614816000, 'era': -10, 'prefix': ''}, {'version': '90', 'global_usage': 0.004161, 'release_date': 1618358400, 'era': -9, 'prefix': ''}, {'version': '91', 'global_usage': 0.004161, 'release_date': 1622073600, 'era': -8, 'prefix': ''}, {'version': '92', 'global_usage': 0.012483, 'release_date': 1626912000, 'era': -7, 'prefix': ''}, {'version': '93', 'global_usage': 0.004161, 'release_date': 1630627200, 'era': -6, 'prefix': ''}, {'version': '94', 'global_usage': 0.008322, 'release_date': 1632441600, 'era': -5, 'prefix': ''}, {'version': '95', 'global_usage': 0.029127, 'release_date': 1634774400, 'era': -4, 'prefix': ''}, {'version': '96', 'global_usage': 0.049932, 'release_date': 1637539200, 'era': -3, 'prefix': ''}, {'version': '97', 'global_usage': 0.853005, 'release_date': 1641427200, 'era': -2, 'prefix': ''}, {'version': '98', 'global_usage': 2.98344, 'release_date': 1643932800, 'era': -1, 'prefix': ''}, {'version': '99', 'global_usage': 0, 'release_date': 1646265600, 'era': 0, 'prefix': ''}]

The version_list is too nested, I have tried below code to separate it different column but it is not working

df = df["version_list"].apply(pd.Series)

CodePudding user response:

You can explode on Version_list column to convert list of dict to rows. Then apply pd.Series to turn dict row to columns. At last, concat them with original Browser column.

df = df.explode('Version_list')
df_ = pd.concat([df.drop(['Version_list'], axis=1), df['Version_list'].apply(pd.Series)], axis=1)
print(df_)

  Browser version  global_usage  release_date  era prefix
0      IE     5.5      0.009298     962323200   -6
0      IE       6      0.013122     998870400   -5
0      IE       7      0.006212    1161129600   -4
0      IE       8      0.039548    1237420800   -3
0      IE       9      0.059322    1300060800   -2
0      IE      10      0.026365    1346716800   -1
0      IE      11      0.619584    1381968000    0
1    Edge      12      0.008322    1438128000  -26     ms
1    Edge      13      0.004267    1447286400  -25     ms
1    Edge      14      0.004268    1470096000  -24     ms
1    Edge      15      0.012483    1491868800  -23     ms
1    Edge      16      0.004161    1508198400  -22     ms
1    Edge      17      0.012483    1525046400  -21     ms
1    Edge      18      0.074898    1542067200  -20     ms
1    Edge      79      0.000000    1579046400  -19
1    Edge      80      0.004298    1581033600  -18
1    Edge      81      0.009440    1586736000  -17
1    Edge      83      0.004043    1590019200  -16
1    Edge      84      0.004161    1594857600  -15
1    Edge      85      0.008322    1598486400  -14
1    Edge      86      0.004161    1602201600  -13
1    Edge      87      0.008322    1605830400  -12
1    Edge      88      0.004318    1611360000  -11
1    Edge      89      0.008322    1614816000  -10
1    Edge      90      0.004161    1618358400   -9
1    Edge      91      0.004161    1622073600   -8
1    Edge      92      0.012483    1626912000   -7
1    Edge      93      0.004161    1630627200   -6
1    Edge      94      0.008322    1632441600   -5
1    Edge      95      0.029127    1634774400   -4
1    Edge      96      0.049932    1637539200   -3
1    Edge      97      0.853005    1641427200   -2
1    Edge      98      2.983440    1643932800   -1
1    Edge      99      0.000000    1646265600    0
  • Related