I am looking to web scrape a table consisting of 4000 rows from the following website:
https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings
Preferably, I'd like someone to show how to use the Nasdaq API, if possible. I believe the way I'd normally scrape the page (using BeautifulSoup) would be very inefficient for this task.
Thanks!
Answer:
The table is paginated: each page triggers a new XHR call that returns 15 records, offset by the number of entries already shown. We can manipulate the URL to our advantage and request, say, 7,000 records at once with an offset of 0 (there are roughly 4k entries in total):
import requests
import pandas as pd

# Browser-like request headers.
headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

# Show every column at full width when printing.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# limit=7000 asks for more rows than exist (~4k); offset=0 starts at the first record.
url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=7000&offset=0&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'
r = requests.get(url, headers=headers, timeout=30)
r.raise_for_status()  # fail early on an HTTP error

# Flatten the list of row dicts into a DataFrame.
df = pd.json_normalize(r.json()['data']['holdingsTransactions']['table']['rows'])
print(df)
Result:
ownerName date sharesHeld sharesChange sharesChangePCT marketValue url
0 VANGUARD GROUP INC 06/30/2022 1,277,319,054 7,323,304 0.577% $191,214,662 /market-activity/institutional-portfolio/vanguard-group-inc-61322
1 BLACKROCK INC. 06/30/2022 1,028,688,317 1,055,430 0.103% $153,994,641 /market-activity/institutional-portfolio/blackrock-inc-711679
2 BERKSHIRE HATHAWAY INC 06/30/2022 894,802,319 3,878,909 0.435% $133,951,907 /market-activity/institutional-portfolio/berkshire-hathaway-inc-54239
3 STATE STREET CORP 06/30/2022 598,178,524 -15,673,750 -2.553% $89,547,325 /market-activity/institutional-portfolio/state-street-corp-6697
4 FMR LLC 09/30/2022 350,900,116 6,582,142 1.912% $52,529,747 /market-activity/institutional-portfolio/fmr-llc-12407
... ... ... ... ... ... ... ...
4397 VERSOR INVESTMENTS LP 09/30/2022 0 -5,171 Sold Out /market-activity/institutional-portfolio/versor-investments-lp-1015149
4398 WALLEYE CAPITAL LLC 06/30/2022 0 -44,561 Sold Out /market-activity/institutional-portfolio/walleye-capital-llc-1069483
4399 WALLEYE TRADING LLC 06/30/2022 0 -65,383 Sold Out /market-activity/institutional-portfolio/walleye-trading-llc-733607
4400 WARATAH CAPITAL ADVISORS LTD. 09/30/2022 0 -31,149 Sold Out /market-activity/institutional-portfolio/waratah-capital-advisors-ltd-901912
4401 WINSLOW CAPITAL MANAGEMENT, LLC 06/30/2022 0 -2,386 Sold Out /market-activity/institutional-portfolio/winslow-capital-management-llc-64122
4402 rows × 7 columns
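Note that every value in the output above is a formatted string ("1,277,319,054", "0.577%", "$191,214,662"), so sorting or arithmetic on these columns won't behave numerically. Below is a minimal sketch of one way to convert them, assuming the column names shown in the result. `clean_holdings` is a hypothetical helper name, not part of pandas or the Nasdaq API, and the empty `marketValue` in the sample's second row is an assumption about how "Sold Out" rows come back.

```python
import pandas as pd


def clean_holdings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the numeric-looking string columns converted to numbers."""
    out = df.copy()
    for col in ["sharesHeld", "sharesChange", "sharesChangePCT", "marketValue"]:
        # Drop thousands separators, dollar signs and percent signs.
        stripped = out[col].astype(str).str.replace(r"[,$%]", "", regex=True)
        # Anything that is still not a number (e.g. "Sold Out", "") becomes NaN.
        out[col] = pd.to_numeric(stripped, errors="coerce")
    # Dates in the output above use the MM/DD/YYYY format.
    out["date"] = pd.to_datetime(out["date"], format="%m/%d/%Y")
    return out


# Two rows copied from the result above (marketValue of the second row is assumed empty).
sample = pd.DataFrame({
    "ownerName": ["VANGUARD GROUP INC", "VERSOR INVESTMENTS LP"],
    "date": ["06/30/2022", "09/30/2022"],
    "sharesHeld": ["1,277,319,054", "0"],
    "sharesChange": ["7,323,304", "-5,171"],
    "sharesChangePCT": ["0.577%", "Sold Out"],
    "marketValue": ["$191,214,662", ""],
})
cleaned = clean_holdings(sample)
print(cleaned.dtypes)
```

With the real data you would call `clean_holdings(df)` on the DataFrame built from the API response. Coercing "Sold Out" to NaN loses that label, so keep the original column around if you need to tell sold-out positions apart from missing values.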