Home > Blockchain >  Converting API Response into Dataframe for SQL Import
Converting API Response into Dataframe for SQL Import

Time:08-21

I have a REST API code as follows:

api_response = req.post('APIUrl', params=api_param,headers=api_header, json=api_body)

The json looks like

[
  {
    "productStatusMessage": "ACOPS ARE AVAILABLE FOR THIS CUSTOMER AND SKU",
    "ingramPartNumber": "123512",
    "vendorPartNumber": "LS1016A-CISCO",
    "customerPartNumber": "A5-8963TEST",
    "upc": "0718908728116",
    "partNumberType": "T",
    "vendorNumber": "1234",
    "vendorName": "INTERNAL",
    "description": "6FT PARALLEL PRINTER DB25M TO  SVCS CENT36M PRO SERIES 28AWG ROHS",
    "productClass": "P",
    "uom": "EA",
    "acceptBackOrder": true,
    "productAuthorized": true,
    "returnableProduct": true,
    "endUserInfoRequired": true,
    "availability": {
      "available": true,
      "totalAvailability": 240479,
      "availabilityByWarehouse": [
        {
          "location": "Fort Worth, TX",
          "warehouseId": "20",
          "quantityAvailable": 105415
        },
        {
          "location": "Carol Stream, IL",
          "warehouseId": "40",
          "quantityAvailable": 1049
        }
      ],
      "pricing": {
        "currencyCode": "USD",
        "retailPrice": 10,
        "mapPrice": 540.25,
        "customerPrice": 5.43
      }
    }
  }
]

I import it into a dataframe and write to a text file using:

json_df = pd.read_json(api_response.text, orient='records')
with open (txt_file,'w') as me:
        me.write(json_df.to_string(header=True, index = True))

The file looks like the following:

  PartNumber vendorPartNumber           upc partNumberType vendorNumber vendorName                                                description productClass uom  acceptBackOrder  productAuthorized  returnableProduct  endUserInfoRequired                                                                                                                                                                          availability                                                                                    pricing
0           123512          LS1016A-CISCO  0718908728116              T         1234     6FT PARALLEL PRINTER DB25M TO  SVCS CENT36M PRO SERIES 28AWG ROHS            P  EA             True               True               True                False   {'available': False, 'totalAvailability': 28374, 'availabilityByWarehouse': [{'location': 'Anywhere, IL', 'warehouseId': '15', 'quantityAvailable': 890, 'quantityBackordered': 54},{'location': 'BFE, TX', 'warehouseId': '67', 'quantityAvailable': 3456, 'quantityBackordered': 122}]}  {'currencyCode': 'USD', 'retailPrice': 69.0, 'mapPrice': 69.0, 'customerPrice': 39.59}

This issue I'm having is getting the nested list 'availabilityByWarehouse' to be parsed out so that I have the following dataframe for importing into a SQL table.

PartNumber vendorPartNumber           upc partNumberType vendorNumber vendorName                                                description productClass uom  acceptBackOrder  productAuthorized  returnableProduct  endUserInfoRequired                                                                                                                                                                          availabilityByWarehouse                                                                                    pricing
0           123512          LS1016A-CISCO  0718908728116              T         1234     6FT PARALLEL PRINTER DB25M TO  SVCS CENT36M PRO SERIES 28AWG ROHS            P  EA             True               True               True                False   {'location': 'Anywhere, IL', 'warehouseId': '15', 'quantityAvailable': 890, 'quantityBackordered': 54}}  {'currencyCode': 'USD', 'retailPrice': 69.0, 'mapPrice': 69.0, 'customerPrice': 39.59}               NaN                      NaN           NaN
1           123512          LS1016A-CISCO  0718908728116              T         1234     6FT PARALLEL PRINTER DB25M TO  SVCS CENT36M PRO SERIES 28AWG ROHS            P  EA             True               True               True                False   {'location': 'BFE, TX', 'warehouseId': '67', 'quantityAvailable': 3456, 'quantityBackordered': 122}  {'currencyCode': 'USD', 'retailPrice': 69.0, 'mapPrice': 69.0, 'customerPrice': 39.59}

Do I need to create another dataframe for the availability and then merge or is there a way to perform what I need inplace with the current dataframe? I'm fairly new with Python programming so if I'm missing something fundamental to Python, please call me out.

Edit: Added the Python code that creates the dataframe and the output of the dataframe. I had pipe-delimited my first post for readability.

CodePudding user response:

I deleted some elements for demonstration purpose, didn't change how data us structured. What you actually want is a flattened dataframe of a nested dictionary with 2 levels (if I understand correctly).

So here is the data I used:

data2 = """[
    {
        "PartNumber": "123512",
        "vendorNumber": "1234",
        "returnableProduct": true,
        "availability": {
            "available": true,
            "totalAvailability": 240479,
            "availabilityByWarehouse": [
                {
                    "location": "Anywhere, IL",
                    "warehouseId": "15"
                },
                {
                    "location": "BFE, TX",
                    "warehouseId": "67"
                }
            ],
            "pricing": {
            "currencyCode": "USD",
            "retailPrice": 69
            }
        }
    }
]
"""

data2 is a string in json format, that should be the equivalent to response.text in your code. I commented it out like you would run the code.

# tmp = pd.json_normalize(json.loads(response.text))
tmp = pd.json_normalize(json.loads(data2))

out = (
    pd.concat(
        {
            i: pd.DataFrame(x) for i, x in tmp.pop("availability.availabilityByWarehouse").items()
        }
    )
    .reset_index(level=1, drop=True)
    .join(tmp)
    .reset_index(drop=True)
)
print(out)

Output:

       location warehouseId PartNumber vendorNumber  returnableProduct  availability.available  availability.totalAvailability availability.pricing.currencyCode  availability.pricing.retailPrice
0  Anywhere, IL          15     123512         1234               True                    True                          240479                               USD                                69
1       BFE, TX          67     123512         1234               True                    True                          240479                               USD                                69
  • Related