I have a REST API code as follows:
api_response = req.post('APIUrl', params=api_param,headers=api_header, json=api_body)
The json looks like
[
{
"productStatusMessage": "ACOPS ARE AVAILABLE FOR THIS CUSTOMER AND SKU",
"ingramPartNumber": "123512",
"vendorPartNumber": "LS1016A-CISCO",
"customerPartNumber": "A5-8963TEST",
"upc": "0718908728116",
"partNumberType": "T",
"vendorNumber": "1234",
"vendorName": "INTERNAL",
"description": "6FT PARALLEL PRINTER DB25M TO SVCS CENT36M PRO SERIES 28AWG ROHS",
"productClass": "P",
"uom": "EA",
"acceptBackOrder": true,
"productAuthorized": true,
"returnableProduct": true,
"endUserInfoRequired": true,
"availability": {
"available": true,
"totalAvailability": 240479,
"availabilityByWarehouse": [
{
"location": "Fort Worth, TX",
"warehouseId": "20",
"quantityAvailable": 105415
},
{
"location": "Carol Stream, IL",
"warehouseId": "40",
"quantityAvailable": 1049
}
],
"pricing": {
"currencyCode": "USD",
"retailPrice": 10,
"mapPrice": 540.25,
"customerPrice": 5.43
}
}
}
]
I import it into a dataframe and write to a text file using:
json_df = pd.read_json(api_response.text, orient='records')
with open (txt_file,'w') as me:
me.write(json_df.to_string(header=True, index = True))
The file looks like the following:
PartNumber vendorPartNumber upc partNumberType vendorNumber vendorName description productClass uom acceptBackOrder productAuthorized returnableProduct endUserInfoRequired availability pricing
0 123512 LS1016A-CISCO 0718908728116 T 1234 6FT PARALLEL PRINTER DB25M TO SVCS CENT36M PRO SERIES 28AWG ROHS P EA True True True False {'available': False, 'totalAvailability': 28374, 'availabilityByWarehouse': [{'location': 'Anywhere, IL', 'warehouseId': '15', 'quantityAvailable': 890, 'quantityBackordered': 54},{'location': 'BFE, TX', 'warehouseId': '67', 'quantityAvailable': 3456, 'quantityBackordered': 122}]} {'currencyCode': 'USD', 'retailPrice': 69.0, 'mapPrice': 69.0, 'customerPrice': 39.59}
This issue I'm having is getting the nested list 'availabilityByWarehouse' to be parsed out so that I have the following dataframe for importing into a SQL table.
PartNumber vendorPartNumber upc partNumberType vendorNumber vendorName description productClass uom acceptBackOrder productAuthorized returnableProduct endUserInfoRequired availabilityByWarehouse pricing
0 123512 LS1016A-CISCO 0718908728116 T 1234 6FT PARALLEL PRINTER DB25M TO SVCS CENT36M PRO SERIES 28AWG ROHS P EA True True True False {'location': 'Anywhere, IL', 'warehouseId': '15', 'quantityAvailable': 890, 'quantityBackordered': 54}} {'currencyCode': 'USD', 'retailPrice': 69.0, 'mapPrice': 69.0, 'customerPrice': 39.59} NaN NaN NaN
1 123512 LS1016A-CISCO 0718908728116 T 1234 6FT PARALLEL PRINTER DB25M TO SVCS CENT36M PRO SERIES 28AWG ROHS P EA True True True False {'location': 'BFE, TX', 'warehouseId': '67', 'quantityAvailable': 3456, 'quantityBackordered': 122} {'currencyCode': 'USD', 'retailPrice': 69.0, 'mapPrice': 69.0, 'customerPrice': 39.59}
Do I need to create another dataframe for the availability and then merge or is there a way to perform what I need inplace with the current dataframe? I'm fairly new with Python programming so if I'm missing something fundamental to Python, please call me out.
Edit: Added the Python code that creates the dataframe and the output of the dataframe. I had pipe-delimited my first post for readability.
CodePudding user response:
I deleted some elements for demonstration purpose, didn't change how data us structured. What you actually want is a flattened dataframe of a nested dictionary with 2 levels (if I understand correctly).
So here is the data I used:
data2 = """[
{
"PartNumber": "123512",
"vendorNumber": "1234",
"returnableProduct": true,
"availability": {
"available": true,
"totalAvailability": 240479,
"availabilityByWarehouse": [
{
"location": "Anywhere, IL",
"warehouseId": "15"
},
{
"location": "BFE, TX",
"warehouseId": "67"
}
],
"pricing": {
"currencyCode": "USD",
"retailPrice": 69
}
}
}
]
"""
data2 is a string in json format, that should be the equivalent to response.text
in your code. I commented it out like you would run the code.
# tmp = pd.json_normalize(json.loads(response.text))
tmp = pd.json_normalize(json.loads(data2))
out = (
pd.concat(
{
i: pd.DataFrame(x) for i, x in tmp.pop("availability.availabilityByWarehouse").items()
}
)
.reset_index(level=1, drop=True)
.join(tmp)
.reset_index(drop=True)
)
print(out)
Output:
location warehouseId PartNumber vendorNumber returnableProduct availability.available availability.totalAvailability availability.pricing.currencyCode availability.pricing.retailPrice
0 Anywhere, IL 15 123512 1234 True True 240479 USD 69
1 BFE, TX 67 123512 1234 True True 240479 USD 69