Home > OS >  Json column in a dataframe
Json column in a dataframe

Time:10-29

I'm working on a panda dataframe where I find this Json in a column:

quote
{'BTC': {'price': 1, 'volume_24h': 1e-08, 'percent_change_1h': 0, 'percent_change_24h': 0, 'percent_change_7d': 0, 'market_cap': 11071985.881444559, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 134.210021972656, 'volume_24h': 0, 'percent_change_1h': 0.639231, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1488566971.9558687, 'last_updated': '2013-04-28T23:55:01.000Z'}}
{'BTC': {'price': 0.032343507812039, 'volume_24h': 1e-08, 'percent_change_1h': 0.799273, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 555151.4070926482, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 4.34840488433838, 'volume_24h': 0, 'percent_change_1h': 0.799273, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 74637021.56790735, 'last_updated': '2013-04-28T23:55:01.000Z'}}
{'BTC': {'price': 0.002874978304697, 'volume_24h': 1e-08, 'percent_change_1h': -0.934763, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 53927.00880335152, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.38652485609054604, 'volume_24h': 0, 'percent_change_1h': -0.934763, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 7250186.647688276, 'last_updated': '2013-04-28T23:55:03.000Z'}}
{'BTC': {'price': 0.008235615152382001, 'volume_24h': 1e-08, 'percent_change_1h': -0.0505028, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 44598.326734696995, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 1.10723268985748, 'volume_24h': 0, 'percent_change_1h': -0.0505028, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 5995997.185385211, 'last_updated': '2013-04-28T23:55:02.000Z'}}
{'BTC': {'price': 0.004811595748858, 'volume_24h': 1e-08, 'percent_change_1h': 0.609159, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 11180.078331276067, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.646892309188843, 'volume_24h': 0, 'percent_change_1h': 0.609159, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1503099.4011388426, 'last_updated': '2013-04-28T23:55:02.000Z'}}
{'BTC': {'price': 2.425762856e-06, 'volume_24h': 1e-08, 'percent_change_1h': 0.461694, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 10592.384991552675, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.00032613033545200005, 'volume_24h': 0, 'percent_change_1h': 0.461694, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1424087.2975724188, 'last_updated': '2013-04-28T23:55:14.000Z'}}
{'BTC': {'price': 0.03158483190407, 'volume_24h': 1e-08, 'percent_change_1h': 2.13819, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 8644.956027083475, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 4.24640512466431, 'volume_24h': 0, 'percent_change_1h': 2.13819, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1162266.2956510494, 'last_updated': '2013-04-28T23:55:03.000Z'}}
Name: quote, dtype: object

How can i do through pandas to transform this dataframe column containing Json into an ordered dataframe? I am interested in the part of the data in the USD section

I tried using "pd.json_normalize" but I can't get any changes.

CodePudding user response:

As your JSON is nested, you will get only 2 columns of the highest level sections BTC and USD even when we expand the JSON into dataframe. As such, we have to specifically get access to the part of the data you are interested first, i.e. the USD section.

Optional Step: in case your column quote is a string instead of real JSON, you can first convert the string to real JSON first, as below. Otherwise, just skip this step:

import ast
df['quote'] = df['quote'].map(ast.literal_eval)

After ensuring your column quote is of real JSON, rather than string, to get access of the USD section, we can use the string accessor str[], as follows:

df['quote'].str['USD']    

(We can skip this step and run the last step. Here, just run to ensure you get the contents of the required section)

This will give:

0    {'price': 134.210021972656, 'volume_24h': 0, '...
1    {'price': 4.34840488433838, 'volume_24h': 0, '...
2    {'price': 0.38652485609054604, 'volume_24h': 0...
3    {'price': 1.10723268985748, 'volume_24h': 0, '...
4    {'price': 0.646892309188843, 'volume_24h': 0, ...
5    {'price': 0.00032613033545200005, 'volume_24h'...
6    {'price': 4.24640512466431, 'volume_24h': 0, '...
Name: quote, dtype: object

Then, to access its contents (i.e USD section) and expand into separate columns, we can use:

pd.DataFrame(df['quote'].str['USD'].tolist())

Result:

        price  volume_24h  percent_change_1h percent_change_24h percent_change_7d    market_cap              last_updated
0  134.210022           0           0.639231               None              None  1.488567e 09  2013-04-28T23:55:01.000Z
1    4.348405           0           0.799273               None              None  7.463702e 07  2013-04-28T23:55:01.000Z
2    0.386525           0          -0.934763               None              None  7.250187e 06  2013-04-28T23:55:03.000Z
3    1.107233           0          -0.050503               None              None  5.995997e 06  2013-04-28T23:55:02.000Z
4    0.646892           0           0.609159               None              None  1.503099e 06  2013-04-28T23:55:02.000Z
5    0.000326           0           0.461694               None              None  1.424087e 06  2013-04-28T23:55:14.000Z
6    4.246405           0           2.138190               None              None  1.162266e 06  2013-04-28T23:55:03.000Z
  • Related