I'm trying to export the data from the table on highshortinterest.com, but my for loop never runs and nothing is printed. Is something wrong with the loop? In the browser's inspector the table is nested as table, tbody, tr, td, in that order, so I wrote my loops in the same order. Where did I go wrong?
from bs4 import BeautifulSoup
import requests
urlSI = "https://highshortinterest.com/"
pageSI = requests.get(urlSI)
# print(pageSI.status_code) -- 200 = successful connection
soupSI = BeautifulSoup(pageSI.text, 'html.parser')
stocks = soupSI.find('table', class_='stocks')
# print(stocks) -- this is the last line in the code that actually prints data
for stock in stocks.find_all('tbody'):
    rows = stock.find_all('tr')
    # print(rows) -- nothing happens
    for row in rows:
        # print(row) -- nothing happens
        info = row.find_all('td')
        company = row.find_all('td')[0].text
        exchange = row.find_all('td')[1].text
        si = row.find_all('td')[2].text
        s_float = row.find_all('td')[3].text
        outstd = row.find_all('td')[4].text
        industry = row.find_all('td')[5].text
        # print(info, company) nothing happens
CodePudding user response:
You can pull the desired table data using pandas alone.
import pandas as pd
# read_html returns a list of DataFrames, one per table on the page; index 2 is the stock table
df = pd.read_html('https://www.highshortinterest.com/')[2]
print(df)
Output:
Ticker ... Industry
1 FUV ... Auto & Truck Manufacturers
2 BYND ... Food Processing
3 REV ... Personal Products
4 BGFV ... Retailers - Miscellaneous Specialty
5 ICPT ... Biotechnology & Medical Research
6 NKLA ... Auto & Truck Manufacturers
7 UPST ... Consumer Lending
8 MSTR ... Software & Programming
9 BIG ... Retailers - Discount Stores
10 VTNR ... Oil & Gas - Refining and Marketing
11 BBBY ... Retail (Specialty Non-Apparel)
12 VUZI ... Electronic Equipment & Parts
13 SFT ... Retailers - Auto Vehicles, Parts & Service
14 EVGO ... Utilities - Electric
15 W ... Retailers - Department Stores
16 CRTX ... Biotechnology & Medical Research
17 BLNK ... Utilities - Electric
18 CTRN ... Retailers - Apparel & Accessories
19 OCGN ... Biotechnology & Medical Research
20 RIDE ... Auto & Truck Manufacturers
21 CVNA ... Retail (Specialty Non-Apparel)
22 EOSE ... Electrical Components & Equipment
23 CLVS ... Biotechnology & Drugs
24 APPH ... Fishing & Farming
25 WKHS ... Auto & Truck Manufacturers
26 <!-- google_ad_client = "pub-1641527371507802"... ... <!-- google_ad_client = "pub-1641527371507802"...
27 FUBO ... Online Services
28 MVIS ... Electronic Equipment & Parts
29 OTRK ... Healthcare Facilities
30 RAD ... Retailers - Drug
31 SPCE ... Aerospace & Defense
32 SKLZ ... IT Services & Consulting
33 DNMR ... Chemicals - Commodity
34 PETS ... Retail (Drugs)
35 SRG ... Real Estate Operations
36 SWTX ... Biotechnology & Medical Research
37 REI ... Oil & Gas - Exploration and Production
38 RMO ... Electrical Components & Equipment
39 ZYXI ... Advanced Medical Equipment & Technology
40 CVM ... Biotechnology & Medical Research
41 OMER ... Biotechnology & Drugs
42 GDRX ... Online Services
43 SDC ... Medical Equipment, Supplies & Distribution
44 HYZN ... Heavy Machinery & Vehicles
45 SENS ... Medical Equipment & Supplies
46 KPTI ... Biotechnology & Medical Research
47 CNK ... Leisure & Recreation
48 IGMS ... Biotechnology & Medical Research
49 BKKT ... Fintech - Blockchain & Cryptocurrency
[50 rows x 7 columns]
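The output above still carries the header as an ordinary row and includes the Google ad row (index 26). Here is a minimal sketch of one way to tidy that up. It runs on a small hand-built DataFrame standing in for the real result, so the three-column layout, the AD placeholder string, and the clean helper are all assumptions made for illustration, not something read_html gives you.

```python
import pandas as pd

# Placeholder for the ad cell text; the real text is truncated in the output above.
AD = "<!-- ad script -->"

# Hand-built stand-in for pd.read_html(...)[2]: integer column labels,
# the real header sitting in row 0, and one ad row repeated across the columns.
raw = pd.DataFrame([
    ["Ticker", "Company", "Exchange"],
    ["FUV", "Arcimoto Inc", "Nasdaq"],
    [AD, AD, AD],
    ["BYND", "Beyond Meat Inc", "Nasdaq"],
])

def clean(raw):
    """Promote row 0 to column names and drop rows that are HTML comments."""
    df = raw.iloc[1:].copy()
    df.columns = raw.iloc[0].tolist()
    is_ad = df["Ticker"].astype(str).str.startswith("<!--")
    return df[~is_ad].reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

If the real frame has the same shape, calling clean(df) on it should work the same way, but check the first few rows yourself before relying on it.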
Using bs4 with CSS selectors:
from bs4 import BeautifulSoup
import requests
url = "https://highshortinterest.com/"
pageSI = requests.get(url)
print(pageSI)
# print(pageSI.status_code) -- 200 = successful connection
soup = BeautifulSoup(pageSI.content, 'lxml')
for stock in soup.select('table.stocks:nth-of-type(2) tr')[1:]:
    info = stock.select_one('td:nth-child(1)').get_text()
    print(info)
    company = stock.select_one('td:nth-child(2)').text
    print(company)
    exchange = stock.select_one('td:nth-child(3)').text
    si = stock.select_one('td:nth-child(4)').text
    s_float = stock.select_one('td:nth-child(5)').text
    industry = stock.select_one('td:nth-child(7)').text
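One thing to watch with select_one: it returns None when a row has no matching cell, and calling .text on None raises an AttributeError. A rough sketch of a guard is below. It runs on made-up markup, not the live page, so the three-column layout and the colspan ad row are assumptions, and it uses html.parser so it does not need lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup: a header row, two stock rows, and one single-cell ad row.
html = """
<table class="stocks">
  <tr><td>Ticker</td><td>Company</td><td>Exchange</td></tr>
  <tr><td>FUV</td><td>Arcimoto Inc</td><td>Nasdaq</td></tr>
  <tr><td colspan="3">advertisement</td></tr>
  <tr><td>BYND</td><td>Beyond Meat Inc</td><td>Nasdaq</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
records = []
for row in soup.select("table.stocks tr")[1:]:  # [1:] skips the header row
    cells = [td.get_text(strip=True) for td in row.select("td")]
    if len(cells) < 3:  # ad or spacer row, not a stock
        continue
    ticker, company, exchange = cells[:3]
    records.append((ticker, company, exchange))

print(records)
```

Checking the cell count up front keeps the loop from stopping halfway through the table when it reaches a row that is not a stock.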
CodePudding user response:
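Q: Why did your code not work?
A: There is no tbody tag in the HTML that requests downloads, so stocks.find_all('tbody') returns an empty list and the body of your outer for loop never runs. That is why none of the prints inside it show anything. The browser's inspector shows a tbody because browsers add one when they build the DOM from the page; html.parser does not.
The table has no thead, tbody, or tfoot, so you can go straight to parsing its tr elements.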
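You can check this for yourself without hitting the site. The snippet below is only a sketch on made-up markup shaped like the page source (rows directly under the table), so the HTML string is an assumption, not the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup: the rows sit directly under <table>, with no <tbody>.
html = """
<table class="stocks">
  <tr><td>Ticker</td><td>Company</td></tr>
  <tr><td>FUV</td><td>Arcimoto Inc</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="stocks")

tbodies = table.find_all("tbody")  # html.parser does not invent a <tbody>
rows = table.find_all("tr")

print(len(tbodies))  # 0, so `for stock in tbodies:` never runs
print(len(rows))     # 2
```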
CODE
This is how you can do the same thing:
from bs4 import BeautifulSoup
import requests
urlSI = "https://highshortinterest.com/"
pageSI = requests.get(urlSI)
soupSI = BeautifulSoup(pageSI.text, 'html.parser')
stocks = soupSI.find('table', class_='stocks')
# the raw HTML has no tbody, so take the tr elements straight from the table
rows = stocks.find_all('tr')
for row in rows:
    try:
        info = row.find_all('td')
        # the first six columns are Ticker, Company, Exchange, ShortInt, Float, Outstd
        ticker = info[0].text
        company = info[1].text
        exchange = info[2].text
        si = info[3].text
        s_float = info[4].text
        outstd = info[5].text
        print(ticker, company, exchange, si, s_float, outstd, sep=', ')
    except IndexError:
        # in the html we have a styling row in between, to bypass that we have used try:except
        pass
Result :
Ticker, Company, Exchange, ShortInt, Float, Outstd
FUV, Arcimoto Inc, Nasdaq, 42.04%, 29.84M, 38.78M
BYND, Beyond Meat Inc, Nasdaq, 40.18%, 56.81M, 63.54M
REV, Revlon Inc, NYSE, 38.48%, 7.57M, 54.54M
BGFV, Big 5 Sporting Goods Corp, Nasdaq, 37.74%, 20.88M, 22.33M
ICPT, Intercept Pharmaceuticals Inc, Nasdaq, 37.74%, 23.61M, 29.71M
NKLA, Nikola Corporation, Nasdaq, 36.97%, 247.02M, 421.14M
UPST, Upstart Holdings Inc, Nasdaq, 34.99%, 66.61M, 84.77M
MSTR, MicroStrategy Inc, Nasdaq, 34.67%, 9.32M, 9.33M
BIG, Big Lots, Inc., NYSE, 34.06%, 26.13M, 28.56M
VTNR, Vertex Energy Inc, Nasdaq, 33.67%, 44.84M, 64.58M
BBBY, Bed Bath & Beyond Inc., Nasdaq, 32.88%, 67.00M, 79.85M
VUZI, Vuzix Corp, Nasdaq, 31.02%, 59.24M, 63.67M
SFT, Shift Technologies Inc, Nasdaq, 30.79%, 53.06M, 82.80M
EVGO, Evgo Inc, Nasdaq, 30.10%, 67.76M, 69.00M
W, Wayfair Inc, NYSE, 30.05%, 72.80M, 79.56M
CRTX, Cortexyme Inc, Nasdaq, 30.00%, 15.80M, 30.15M
BLNK, Blink Charging Co, Nasdaq, 28.87%, 36.83M, 42.74M
CTRN, Citi Trends, Inc., Nasdaq, 28.30%, 8.35M, 8.67M
OCGN, Ocugen Inc, Nasdaq, 28.30%, 211.01M, 215.66M
RIDE, Lordstown Motors Corp, Nasdaq, 28.27%, 163.84M, 203.47M
CVNA, Carvana Co, NYSE, 28.11%, 99.35M, 105.74M
EOSE, Eos Energy Enterprises Inc, Nasdaq, 28.07%, 46.70M, 54.45M
CLVS, Clovis Oncology Inc, Nasdaq, 27.96%, 141.09M, 143.88M
APPH, Appharvest Inc, Nasdaq, 26.96%, 65.49M, 101.74M
WKHS, Workhorse Group Inc, Nasdaq, 26.37%, 156.45M, 163.51M
FUBO, Fubotv Inc, NYSE, 26.31%, 165.24M, 185.08M
MVIS, Microvision, Inc., Nasdaq, 26.14%, 163.70M, 165.21M
OTRK, Ontrak, Inc., Nasdaq, 25.78%, 11.65M, 20.86M
RAD, Rite Aid Corporation, NYSE, 25.69%, 53.95M, 55.65M
SPCE, Virgin Galactic Holdings Inc, NYSE, 25.57%, 207.56M, 258.59M
SKLZ, Skillz Inc, NYSE, 25.30%, 240.45M, 340.81M
DNMR, Danimer Scientific Inc, NYSE, 24.85%, 87.56M, 101.11M
PETS, Petmed Express Inc, Nasdaq, 24.37%, 19.70M, 20.98M
SRG, Seritage Growth Properties, NYSE, 24.03%, 34.67M, 43.68M
SWTX, SpringWorks Therapeutics Inc, Nasdaq, 23.50%, 31.66M, 49.41M
REI, Ring Energy Inc, AMEX, 22.77%, 73.81M, 106.70M
RMO, Romeo Power Inc, NYSE, 22.24%, 138.94M, 151.23M
ZYXI, Zynex Inc., Nasdaq, 21.90%, 22.55M, 39.05M
CVM, CEL-SCI Corporation, AMEX, 21.80%, 41.64M, 43.33M
OMER, Omeros Corporation, Nasdaq, 21.76%, 60.14M, 62.73M
GDRX, Goodrx Holdings Inc, Nasdaq, 21.71%, 65.61M, 82.73M
SDC, SmileDirectClub Inc, Nasdaq, 21.63%, 111.55M, 120.70M
HYZN, Hyzon Motors Inc, Nasdaq, 21.08%, 76.33M, 247.90M
SENS, Senseonics Holdings Inc, AMEX, 20.96%, 373.28M, 463.26M
KPTI, Karyopharm Therapeutics Inc, Nasdaq, 20.94%, 69.08M, 79.42M
CNK, Cinemark Holdings, Inc., NYSE, 20.94%, 107.11M, 120.45M
IGMS, IGM Biosciences Inc, Nasdaq, 20.90%, 15.73M, 26.08M
BKKT, Bakkt Holdings Inc, NYSE, 20.77%, 50.67M, 75.27M
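Since the goal was to export the data, note that printing with sep=', ' gets ambiguous for names that contain a comma themselves, such as "Big Lots, Inc.". A minimal sketch of writing the rows with the standard csv module is below. The two sample rows are copied from the result above, and the rows_to_csv helper is a name made up for this example.

```python
import csv
import io

header = ["Ticker", "Company", "Exchange", "ShortInt", "Float", "Outstd"]
rows = [
    ["FUV", "Arcimoto Inc", "Nasdaq", "42.04%", "29.84M", "38.78M"],
    ["BIG", "Big Lots, Inc.", "NYSE", "34.06%", "26.13M", "28.56M"],
]

def rows_to_csv(header, rows, out):
    """Write a header line and the data rows to any writable text stream."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
rows_to_csv(header, rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass a file opened with newline="" in place of the buffer, for example open("short_interest.csv", "w", newline=""). The file name is just an example.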