I'm working on a panda dataframe where I find this Json in a column:
quote
{'BTC': {'price': 1, 'volume_24h': 1e-08, 'percent_change_1h': 0, 'percent_change_24h': 0, 'percent_change_7d': 0, 'market_cap': 11071985.881444559, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 134.210021972656, 'volume_24h': 0, 'percent_change_1h': 0.639231, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1488566971.9558687, 'last_updated': '2013-04-28T23:55:01.000Z'}}
{'BTC': {'price': 0.032343507812039, 'volume_24h': 1e-08, 'percent_change_1h': 0.799273, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 555151.4070926482, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 4.34840488433838, 'volume_24h': 0, 'percent_change_1h': 0.799273, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 74637021.56790735, 'last_updated': '2013-04-28T23:55:01.000Z'}}
{'BTC': {'price': 0.002874978304697, 'volume_24h': 1e-08, 'percent_change_1h': -0.934763, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 53927.00880335152, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.38652485609054604, 'volume_24h': 0, 'percent_change_1h': -0.934763, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 7250186.647688276, 'last_updated': '2013-04-28T23:55:03.000Z'}}
{'BTC': {'price': 0.008235615152382001, 'volume_24h': 1e-08, 'percent_change_1h': -0.0505028, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 44598.326734696995, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 1.10723268985748, 'volume_24h': 0, 'percent_change_1h': -0.0505028, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 5995997.185385211, 'last_updated': '2013-04-28T23:55:02.000Z'}}
{'BTC': {'price': 0.004811595748858, 'volume_24h': 1e-08, 'percent_change_1h': 0.609159, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 11180.078331276067, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.646892309188843, 'volume_24h': 0, 'percent_change_1h': 0.609159, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1503099.4011388426, 'last_updated': '2013-04-28T23:55:02.000Z'}}
{'BTC': {'price': 2.425762856e-06, 'volume_24h': 1e-08, 'percent_change_1h': 0.461694, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 10592.384991552675, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 0.00032613033545200005, 'volume_24h': 0, 'percent_change_1h': 0.461694, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1424087.2975724188, 'last_updated': '2013-04-28T23:55:14.000Z'}}
{'BTC': {'price': 0.03158483190407, 'volume_24h': 1e-08, 'percent_change_1h': 2.13819, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 8644.956027083475, 'fully_diluted_market_cap': None, 'last_updated': '2013-04-29T00:00:01.000Z'}, 'USD': {'price': 4.24640512466431, 'volume_24h': 0, 'percent_change_1h': 2.13819, 'percent_change_24h': None, 'percent_change_7d': None, 'market_cap': 1162266.2956510494, 'last_updated': '2013-04-28T23:55:03.000Z'}}
Name: quote, dtype: object
How can i do through pandas to transform this dataframe column containing Json into an ordered dataframe? I am interested in the part of the data in the USD section
I tried using "pd.json_normalize" but I can't get any changes.
CodePudding user response:
As your JSON is nested, you will get only 2 columns of the highest level sections BTC
and USD
even when we expand the JSON into dataframe. As such, we have to specifically get access to the part of the data you are interested first, i.e. the USD
section.
Optional Step: in case your column quote
is a string instead of real JSON, you can first convert the string to real JSON first, as below. Otherwise, just skip this step:
import ast
df['quote'] = df['quote'].map(ast.literal_eval)
After ensuring your column quote
is of real JSON, rather than string, to get access of the USD
section, we can use the string accessor str[]
, as follows:
df['quote'].str['USD']
(We can skip this step and run the last step. Here, just run to ensure you get the contents of the required section)
This will give:
0 {'price': 134.210021972656, 'volume_24h': 0, '...
1 {'price': 4.34840488433838, 'volume_24h': 0, '...
2 {'price': 0.38652485609054604, 'volume_24h': 0...
3 {'price': 1.10723268985748, 'volume_24h': 0, '...
4 {'price': 0.646892309188843, 'volume_24h': 0, ...
5 {'price': 0.00032613033545200005, 'volume_24h'...
6 {'price': 4.24640512466431, 'volume_24h': 0, '...
Name: quote, dtype: object
Then, to access its contents (i.e USD section) and expand into separate columns, we can use:
pd.DataFrame(df['quote'].str['USD'].tolist())
Result:
price volume_24h percent_change_1h percent_change_24h percent_change_7d market_cap last_updated
0 134.210022 0 0.639231 None None 1.488567e 09 2013-04-28T23:55:01.000Z
1 4.348405 0 0.799273 None None 7.463702e 07 2013-04-28T23:55:01.000Z
2 0.386525 0 -0.934763 None None 7.250187e 06 2013-04-28T23:55:03.000Z
3 1.107233 0 -0.050503 None None 5.995997e 06 2013-04-28T23:55:02.000Z
4 0.646892 0 0.609159 None None 1.503099e 06 2013-04-28T23:55:02.000Z
5 0.000326 0 0.461694 None None 1.424087e 06 2013-04-28T23:55:14.000Z
6 4.246405 0 2.138190 None None 1.162266e 06 2013-04-28T23:55:03.000Z