Combining multiple sets of data to one JSON file from api calls

Time:11-16

I need two sets of data from this website:

https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings

Specifically, I need the "Active Positions" and "New and Sold Out Positions" tables. The code I have can only write one of these tables to a JSON file:

import requests
import pandas as pd

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'

headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['newSoldOutPositions']['rows'])

df.to_json('AAPL_institutional_positions.json')

This gives the following JSON output:

{
    "positions":{
        "0":"New Positions",
        "1":"Sold Out Positions"
    },
    "holders":{
        "0":"99",
        "1":"90"
    },
    "shares":{
        "0":"37,374,118",
        "1":"4,637,465"
    }
}

For the other table, I use this code (all I have changed is "newSoldOutPositions" to "activePositions"):

import requests
import pandas as pd

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'

headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['activePositions']['rows'])

df.to_json('AAPL_institutional_positions.json')

This gives the following JSON output:

{
    "positions":{
        "0":"Increased Positions",
        "1":"Decreased Positions",
        "2":"Held Positions",
        "3":"Total Institutional Shares"
    },
    "holders":{
        "0":"1,780",
        "1":"2,339",
        "2":"283",
        "3":"4,402"
    },
    "shares":{
        "0":"239,170,203",
        "1":"209,017,331",
        "2":"8,965,339,255",
        "3":"9,413,526,789"
    }
}
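The outputs above are keyed by "0", "1", ... because `DataFrame.to_json()` defaults to `orient="columns"`, which maps each column to an `{index: value}` object. Passing `orient="records"` instead produces one object per table row. A minimal offline sketch (no request is made) using two rows copied from the first output:

```python
import json

import pandas as pd

# Two rows copied from the "newSoldOutPositions" output above
rows = [
    {"positions": "New Positions", "holders": "99", "shares": "37,374,118"},
    {"positions": "Sold Out Positions", "holders": "90", "shares": "4,637,465"},
]
df = pd.json_normalize(rows)

# Default orient ("columns"): {column: {row index: value}}, the shape shown above
columns_shape = json.loads(df.to_json())

# orient="records": a list with one {column: value} object per table row
records_shape = json.loads(df.to_json(orient="records"))

print(columns_shape)
print(records_shape)
```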

So my question is: how can I combine the scraping to grab both sets of data and output them all in one JSON file?

Thanks
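Both tables arrive in the same API response, so one request is enough. Note also that both scripts above write to `AAPL_institutional_positions.json`, so running the second one overwrites the output of the first. A minimal sketch that keeps the pandas workflow: build one DataFrame per table from the same parsed response, then nest each table under its own key in a single JSON object. The `payload` dict is a hand-written stand-in for `r.json()`, assuming the `payload["data"][name]["rows"]` layout the question's code already indexes into; in real use you would replace it with `requests.get(url, headers=headers).json()`.

```python
import json

import pandas as pd

# Stand-in for r.json(): only the keys used below, with rows from the outputs above
payload = {
    "data": {
        "activePositions": {"rows": [
            {"positions": "Increased Positions", "holders": "1,780", "shares": "239,170,203"},
            {"positions": "Decreased Positions", "holders": "2,339", "shares": "209,017,331"},
            {"positions": "Held Positions", "holders": "283", "shares": "8,965,339,255"},
            {"positions": "Total Institutional Shares", "holders": "4,402", "shares": "9,413,526,789"},
        ]},
        "newSoldOutPositions": {"rows": [
            {"positions": "New Positions", "holders": "99", "shares": "37,374,118"},
            {"positions": "Sold Out Positions", "holders": "90", "shares": "4,637,465"},
        ]},
    }
}

tables = ["activePositions", "newSoldOutPositions"]

# One DataFrame per table, all built from the same parsed response
frames = {name: pd.json_normalize(payload["data"][name]["rows"]) for name in tables}

# Nest each table (as a list of row objects) under its own key in one dict
combined = {name: df.to_dict(orient="records") for name, df in frames.items()}

# A single write, so neither table overwrites the other
with open("AAPL_institutional_positions.json", "w", encoding="utf-8") as f:
    json.dump(combined, f, indent=2)
```

Adding another table from the same response should only require appending its key to `tables`.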

Answer:

If you only want JSON data, there is no need to use pandas. Both tables come back in the same response, so you can pull each one out by key:

import requests

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'

headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

r = requests.get(url, headers=headers)
data = r.json()['data']  # parse the response body once

# Both tables are in the same response; store each one under its own key
nasdaq_dict = {
    'activePositions': data['activePositions']['rows'],
    'newSoldOutPositions': data['newSoldOutPositions']['rows'],
}
print(nasdaq_dict)

Result in terminal:

{'activePositions': [{'positions': 'Increased Positions', 'holders': '1,795', 'shares': '200,069,709'}, {'positions': 'Decreased Positions', 'holders': '2,314', 'shares': '228,105,026'}, {'positions': 'Held Positions', 'holders': '308', 'shares': '8,976,744,094'}, {'positions': 'Total Institutional Shares', 'holders': '4,417', 'shares': '9,404,918,829'}], 'newSoldOutPositions': [{'positions': 'New Positions', 'holders': '121', 'shares': '55,857,143'}, {'positions': 'Sold Out Positions', 'holders': '73', 'shares': '8,851,038'}]}
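The code above prints the combined dict but does not save it, while the question asks for a single JSON file. A minimal sketch of that last step using only the standard library's `json` module; `combine_tables` is a hypothetical helper name, and it assumes the same `payload["data"][name]["rows"]` layout used above.

```python
import json


def combine_tables(payload, table_names, path):
    """Write several tables from one parsed API response into one JSON file.

    payload is the dict returned by r.json(); each table is assumed to live
    at payload["data"][name]["rows"], as in the code above.
    """
    combined = {name: payload["data"][name]["rows"] for name in table_names}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(combined, f, indent=4)
    return combined
```

With the request from the answer, a call would look like `combine_tables(r.json(), ['activePositions', 'newSoldOutPositions'], 'AAPL_institutional_positions.json')`.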