Combining multiple sets of data from API calls into one JSON file

I need two sets of data from this website:

https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings

These are the "Active Positions" and "New and Sold Out Positions" tables. The code I have can only write one of them to a JSON file:

import requests
import pandas as pd

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'

headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['newSoldOutPositions']['rows'])

df.to_json('AAPL_institutional_positions.json')

This gives the following output (JSON):

{
    "positions":{
        "0":"New Positions",
        "1":"Sold Out Positions"
    },
    "holders":{
        "0":"99",
        "1":"90"
    },
    "shares":{
        "0":"37,374,118",
        "1":"4,637,465"
    }
}
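The nested "positions"/"holders"/"shares" layout comes from `DataFrame.to_json`, whose default orientation for a DataFrame is `'columns'`: each column becomes a key that maps row-index strings ("0", "1", ...) to values. The standard-library sketch below rebuilds that layout by hand from a list of row dicts, only to illustrate the shape; `rows_to_columns` is a hypothetical helper, and the sample rows are the ones shown above.

```python
import json


def rows_to_columns(rows):
    """Hypothetical helper: mimic the column-oriented layout that
    DataFrame.to_json() produces by default (orient='columns')."""
    columns = {}
    for index, row in enumerate(rows):
        for key, value in row.items():
            # JSON object keys are strings, so the row index becomes "0", "1", ...
            columns.setdefault(key, {})[str(index)] = value
    return columns


rows = [
    {"positions": "New Positions", "holders": "99", "shares": "37,374,118"},
    {"positions": "Sold Out Positions", "holders": "90", "shares": "4,637,465"},
]

print(json.dumps(rows_to_columns(rows), indent=4))
```

Passing `orient='records'` to `to_json` instead produces a list of row objects, which is usually easier to merge with other tables.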

For the other table, I use this code (all I have changed is "newSoldOutPositions" to "activePositions"):

import requests
import pandas as pd

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'

headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['activePositions']['rows'])

df.to_json('AAPL_institutional_positions.json')

This gives the following output (JSON):

{
    "positions":{
        "0":"Increased Positions",
        "1":"Decreased Positions",
        "2":"Held Positions",
        "3":"Total Institutional Shares"
    },
    "holders":{
        "0":"1,780",
        "1":"2,339",
        "2":"283",
        "3":"4,402"
    },
    "shares":{
        "0":"239,170,203",
        "1":"209,017,331",
        "2":"8,965,339,255",
        "3":"9,413,526,789"
    }
}
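Both scripts send the identical request and differ only in the key they read, and both write to `AAPL_institutional_positions.json`, so running the second one overwrites the first file. Since the same response already contains both tables, one option is to parse it once and pull out every table you need. The sketch below runs offline on a hypothetical `payload` dict standing in for `r.json()`, assuming the `data` → table key → `rows` nesting used in the code above; `extract_tables` is a hypothetical name.

```python
def extract_tables(payload, table_keys):
    """Hypothetical helper: return {table_key: rows} for each requested table,
    reading from one already-parsed API response."""
    data = payload["data"]
    return {key: data[key]["rows"] for key in table_keys}


# Hypothetical stand-in for r.json(), trimmed to the keys the scripts use.
payload = {
    "data": {
        "activePositions": {
            "rows": [
                {"positions": "Increased Positions", "holders": "1,780", "shares": "239,170,203"},
            ]
        },
        "newSoldOutPositions": {
            "rows": [
                {"positions": "New Positions", "holders": "99", "shares": "37,374,118"},
                {"positions": "Sold Out Positions", "holders": "90", "shares": "4,637,465"},
            ]
        },
    }
}

tables = extract_tables(payload, ["activePositions", "newSoldOutPositions"])
print(sorted(tables))  # ['activePositions', 'newSoldOutPositions']
```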

So my question is: how can I combine the scraping so that it grabs both sets of data and writes them all to one JSON file?

Thanks

Answer:

If you only want JSON data, there is no need to use pandas:

import requests

nasdaq_dict = {}

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'

headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

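r = requests.get(url, headers=headers)
r.raise_for_status()  # stop early on an HTTP error instead of failing later in .json()

# Parse the response once, then copy both tables' rows into one dict
data = r.json()['data']
nasdaq_dict['activePositions'] = data['activePositions']['rows']
nasdaq_dict['newSoldOutPositions'] = data['newSoldOutPositions']['rows']
print(nasdaq_dict)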
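The snippet above only prints the combined dictionary. To get the single JSON file the question asks for, the dictionary can be written with the standard library's `json.dump`. A minimal sketch, using a hypothetical `nasdaq_dict` filled with sample rows in place of the live API response, and writing into a temporary directory so it runs anywhere; in the real script you would simply open `AAPL_institutional_positions.json` directly.

```python
import json
import os
import tempfile

# Hypothetical sample data with the same shape as nasdaq_dict above.
nasdaq_dict = {
    "activePositions": [
        {"positions": "Held Positions", "holders": "283", "shares": "8,965,339,255"},
    ],
    "newSoldOutPositions": [
        {"positions": "New Positions", "holders": "99", "shares": "37,374,118"},
    ],
}

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "AAPL_institutional_positions.json")

# Write both tables to one file; indent=4 keeps it human-readable.
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(nasdaq_dict, f, indent=4)

# Read it back to confirm both tables ended up in the same file.
with open(out_path, encoding="utf-8") as f:
    loaded = json.load(f)

print(list(loaded))  # ['activePositions', 'newSoldOutPositions']
```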

Result in the terminal:

{'activePositions': [{'positions': 'Increased Positions', 'holders': '1,795', 'shares': '200,069,709'}, {'positions': 'Decreased Positions', 'holders': '2,314', 'shares': '228,105,026'}, {'positions': 'Held Positions', 'holders': '308', 'shares': '8,976,744,094'}, {'positions': 'Total Institutional Shares', 'holders': '4,417', 'shares': '9,404,918,829'}], 'newSoldOutPositions': [{'positions': 'New Positions', 'holders': '121', 'shares': '55,857,143'}, {'positions': 'Sold Out Positions', 'holders': '73', 'shares': '8,851,038'}]}
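The counts come back as strings with thousands separators ('1,795', '200,069,709'). If one flat list is more convenient than two nested tables, each row can be tagged with the table it came from and its counts converted to integers. A sketch under that assumption; the `table` field name and the `flatten_tables` helper are hypothetical, and the sample rows are taken from the output above.

```python
import json


def to_int(text):
    """Convert a count such as '1,795' to the integer 1795."""
    return int(text.replace(",", ""))


def flatten_tables(tables):
    """Hypothetical helper: merge {table_name: rows} into one list of records,
    tagging each row with its source table and parsing the numeric fields."""
    records = []
    for table_name, rows in tables.items():
        for row in rows:
            records.append({
                "table": table_name,
                "positions": row["positions"],
                "holders": to_int(row["holders"]),
                "shares": to_int(row["shares"]),
            })
    return records


tables = {
    "activePositions": [
        {"positions": "Increased Positions", "holders": "1,795", "shares": "200,069,709"},
    ],
    "newSoldOutPositions": [
        {"positions": "New Positions", "holders": "121", "shares": "55,857,143"},
        {"positions": "Sold Out Positions", "holders": "73", "shares": "8,851,038"},
    ],
}

records = flatten_tables(tables)
print(json.dumps(records, indent=2))
```

If you later want pandas after all, `pd.DataFrame(records)` turns this list into a single DataFrame with a `table` column.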