Home > database >  Issue normalising json and using Pandas
Issue normalising json and using Pandas

Time:03-09

I have an issue where I am trying to normalise my json response

Json example:

{
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD"
                  },
                  {...}
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD"
                  },
                  {...}
            ]
   }
}

My Code:

df_values = pd.json_normalize(content['data']['flavors']['aws'],meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
        for index, row in df_values.iterrows():
                df1 = df1.append(row)

Error:

TypeError: list indices must be integers or slices, not str

When using the following code it works but i don't get the data from the AWS array


    df_values = pd.json_normalize(content['data']['flavors'],meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
            for index, row in df_values.iterrows():
                    df1 = df1.append(row)

Output I want:

name cpu_number ram last_price currency
Basic_A4 8 14336 0.200 USD
example2 4 14336 0.100 USD
md5.xlarge 2 14336 0.100 USD
example4 2 7324 0.055 USD

update: I have now used the suggested method however i'm having issues with my new json


{
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD"
                  "provider": {
                    "name": "Azure"
                  }
                 }
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD"
                  "provider": {
                    "name": "AWS"
                  }
                 }
            ]
   }
}

The table now looks like this:

| name| cpu_number| ram| last_price| currency| provider|
|---- |------| -----|-----|-----|-----|
| Basic_A4| 8| 14336|0.200|USD| {'name': 'Azure'}|
| example2| 4| 14336|0.100|USD| {'name': 'Azure'}|
| md5.xlarge| 2| 14336|0.100|USD| {'name': 'AWS'}|
| example4| 2| 7324|0.055|USD| {'name': 'AWS'}|

I want it to look like this:

name cpu_number ram last_price currency provider
Basic_A4 8 14336 0.200 USD Azure

thanks

CodePudding user response:

Your json file, called data.json (In my example):

{
    "data":{
       "flavors":[
                  {
                   "name":"Basic_A4",
                   "cpu_number":8,
                   "ram":14336,
                   "last_price":"0.200",
                   "currency":"USD"
                   }
             ],
       "aws":[
                   {
                   "name":"md5.xlarge",
                   "cpu_number":2,
                   "ram":14336,
                   "last_price":"0.100",
                   "currency":"USD"
                   }
             ]
    }
 }

Read .json file and parse it to a dataframe with chain and itertools.

import json
from itertools import chain

# read your json file
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# parse it into a dataframe
df = pd.DataFrame(list(chain.from_iterable(dict_train["data"].values())))

df


         name  cpu_number    ram last_price currency
0    Basic_A4           8  14336      0.200      USD
1  md5.xlarge           2  14336      0.100      USD

CodePudding user response:

Addition to above answer, you can do it like this:

from pandas.io.json import json_normalize
data = {
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD"
                  },
                  {...}
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD"
                  },
                  {...}
            ]
   }
}
result = json_normalize(data['data']["flavors"])
print(result)
  • Related