I need two sets of data from this page:
https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings
Specifically, I need both the "Active Positions" and the "New and Sold Out Positions" tables. The code I have can only write one of them to a JSON file:
import requests
import pandas as pd
url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
headers = {
'accept': 'application/json, text/plain, */*',
'origin': 'https://www.nasdaq.com',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['newSoldOutPositions']['rows'])
df.to_json('AAPL_institutional_positions.json')
This gives the following output (JSON):
{
"positions":{
"0":"New Positions",
"1":"Sold Out Positions"
},
"holders":{
"0":"99",
"1":"90"
},
"shares":{
"0":"37,374,118",
"1":"4,637,465"
}
}
For the other table, I use this code (all I have done is change "newSoldOutPositions" to "activePositions"):
import requests
import pandas as pd
url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
headers = {
'accept': 'application/json, text/plain, */*',
'origin': 'https://www.nasdaq.com',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['activePositions']['rows'])
df.to_json('AAPL_institutional_positions.json')
Which gives this output (JSON):
{
"positions":{
"0":"Increased Positions",
"1":"Decreased Positions",
"2":"Held Positions",
"3":"Total Institutional Shares"
},
"holders":{
"0":"1,780",
"1":"2,339",
"2":"283",
"3":"4,402"
},
"shares":{
"0":"239,170,203",
"1":"209,017,331",
"2":"8,965,339,255",
"3":"9,413,526,789"
}
}
So my question is: how can I combine the two so that a single script grabs both sets of data and writes them to one JSON file?
Thanks
CodePudding user response:
If you only want JSON data, there is no need to use pandas. Both tables come back in the same response, so you can collect them in one dictionary:
import requests
nasdaq_dict = {}
url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
headers = {
'accept': 'application/json, text/plain, */*',
'origin': 'https://www.nasdaq.com',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(url, headers=headers)
data = r.json()['data']  # parse the response body once and reuse it
nasdaq_dict['activePositions'] = data['activePositions']['rows']
nasdaq_dict['newSoldOutPositions'] = data['newSoldOutPositions']['rows']
print(nasdaq_dict)
Result in terminal:
{'activePositions': [{'positions': 'Increased Positions', 'holders': '1,795', 'shares': '200,069,709'}, {'positions': 'Decreased Positions', 'holders': '2,314', 'shares': '228,105,026'}, {'positions': 'Held Positions', 'holders': '308', 'shares': '8,976,744,094'}, {'positions': 'Total Institutional Shares', 'holders': '4,417', 'shares': '9,404,918,829'}], 'newSoldOutPositions': [{'positions': 'New Positions', 'holders': '121', 'shares': '55,857,143'}, {'positions': 'Sold Out Positions', 'holders': '73', 'shares': '8,851,038'}]}
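The snippet above only prints the dictionary, while the question asks for a single JSON file. A minimal sketch of that last step, using only the standard library's json module, could look like the following. The helper names combine_tables and save_json are made up for illustration, and the sketch assumes the response keeps the data -> table name -> rows layout shown above:

```python
import json

# Table names as they appear under 'data' in the API response shown above.
TABLES = ('activePositions', 'newSoldOutPositions')


def combine_tables(payload, tables=TABLES):
    """Collect the 'rows' list of each requested table into one dict.

    `payload` is the parsed response, i.e. what r.json() returns.
    (Hypothetical helper, not part of requests or the Nasdaq API.)
    """
    return {name: payload['data'][name]['rows'] for name in tables}


def save_json(obj, path):
    """Write `obj` to `path` as indented JSON. (Hypothetical helper.)"""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(obj, f, indent=2)
```

With r from the request above, save_json(combine_tables(r.json()), 'AAPL_institutional_positions.json') should then produce one file with both tables, keyed by table name. The numbers stay as comma-formatted strings, exactly as the API returns them.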