Unable to parse table data using bs4

Time:06-21

I am trying to export the data from the table on highshortinterest.com, but my for loop never runs and nothing is printed. Is there something wrong with the loop? In the browser's inspector the table is nested as table, tbody, tr, td, in that order, so I wrote the loop in that order. Where did I go wrong?

from bs4 import BeautifulSoup
import requests

urlSI = "https://highshortinterest.com/"
pageSI = requests.get(urlSI)
# print(pageSI.status_code)  -- 200 = successful connection
soupSI = BeautifulSoup(pageSI.text, 'html.parser')

stocks = soupSI.find('table', class_='stocks')
# print(stocks) -- this is the last line in the code that actually prints data
for stock in stocks.find_all('tbody'):
    rows = stock.find_all('tr')
    # print(rows) -- nothing happens
    for row in rows:
        # print(row) -- nothing happens
        info = row.find_all('td')
        company = row.find_all('td')[0].text
        exchange = row.find_all('td')[1].text
        si = row.find_all('td')[2].text
        s_float = row.find_all('td')[3].text
        outstd = row.find_all('td')[4].text
        industry = row.find_all('td')[5].text
        # print(info, company) nothing happens

CodePudding user response:

You can pull the desired table data using pandas only.

import pandas as pd
df = pd.read_html('https://www.highshortinterest.com/')[2]
print(df)

Output:

                                               Ticker  ...                                           Industry
1                                                 FUV  ...                         Auto & Truck Manufacturers
2                                                BYND  ...                                    Food Processing
3                                                 REV  ...                                  Personal Products
4                                                BGFV  ...                Retailers - Miscellaneous Specialty
5                                                ICPT  ...                   Biotechnology & Medical Research
6                                                NKLA  ...                         Auto & Truck Manufacturers
7                                                UPST  ...                                   Consumer Lending
8                                                MSTR  ...                             Software & Programming
9                                                 BIG  ...                        Retailers - Discount Stores
10                                               VTNR  ...                 Oil & Gas - Refining and Marketing
11                                               BBBY  ...                     Retail (Specialty Non-Apparel)
12                                               VUZI  ...                       Electronic Equipment & Parts
13                                                SFT  ...         Retailers - Auto Vehicles, Parts & Service
14                                               EVGO  ...                               Utilities - Electric
15                                                  W  ...                      Retailers - Department Stores
16                                               CRTX  ...                   Biotechnology & Medical Research
17                                               BLNK  ...                               Utilities - Electric
18                                               CTRN  ...                  Retailers - Apparel & Accessories
19                                               OCGN  ...                   Biotechnology & Medical Research
20                                               RIDE  ...                         Auto & Truck Manufacturers
21                                               CVNA  ...                     Retail (Specialty Non-Apparel)
22                                               EOSE  ...                  Electrical Components & Equipment
23                                               CLVS  ...                              Biotechnology & Drugs
24                                               APPH  ...                                  Fishing & Farming
25                                               WKHS  ...                         Auto & Truck Manufacturers
26  <!-- google_ad_client = "pub-1641527371507802"...  ...  <!-- google_ad_client = "pub-1641527371507802"...
27                                               FUBO  ...                                    Online Services
28                                               MVIS  ...                       Electronic Equipment & Parts
29                                               OTRK  ...                              Healthcare Facilities
30                                                RAD  ...                                   Retailers - Drug
31                                               SPCE  ...                                Aerospace & Defense
32                                               SKLZ  ...                           IT Services & Consulting
33                                               DNMR  ...                              Chemicals - Commodity
34                                               PETS  ...                                     Retail (Drugs)
35                                                SRG  ...                             Real Estate Operations
36                                               SWTX  ...                   Biotechnology & Medical Research
37                                                REI  ...             Oil & Gas - Exploration and Production
38                                                RMO  ...                  Electrical Components & Equipment
39                                               ZYXI  ...            Advanced Medical Equipment & Technology
40                                                CVM  ...                   Biotechnology & Medical Research
41                                               OMER  ...                              Biotechnology & Drugs
42                                               GDRX  ...                                    Online Services
43                                                SDC  ...         Medical Equipment, Supplies & Distribution
44                                               HYZN  ...                         Heavy Machinery & Vehicles
45                                               SENS  ...                       Medical Equipment & Supplies
46                                               KPTI  ...                   Biotechnology & Medical Research
47                                                CNK  ...                               Leisure & Recreation
48                                               IGMS  ...                   Biotechnology & Medical Research
49                                               BKKT  ...              Fintech - Blockchain & Cryptocurrency

[50 rows x 7 columns]
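
The output above still contains an ad row (index 26), and the column names appear to sit in the data rather than in the header. A minimal cleanup sketch follows. It assumes the first row of the frame carries the header text, which the output suggests but does not show outright. The small `raw` frame is a hypothetical stand-in for the result of `read_html`, so the snippet runs without network access, and the ad cell text is a placeholder.

```python
import pandas as pd

# Hypothetical miniature of the scraped frame: integer column labels,
# header text in row 0, and one ad row mixed into the data.
raw = pd.DataFrame([
    ["Ticker", "Company", "Exchange"],
    ["FUV", "Arcimoto Inc", "Nasdaq"],
    ["<!-- ad script -->", "<!-- ad script -->", "<!-- ad script -->"],
    ["FUBO", "Fubotv Inc", "NYSE"],
])

# Promote the first row to column names and drop it from the data.
clean = raw.iloc[1:].copy()
clean.columns = raw.iloc[0].tolist()

# Drop rows whose Ticker cell is an HTML comment rather than a symbol.
is_ad = clean["Ticker"].str.startswith("<!--")
clean = clean[~is_ad].reset_index(drop=True)

print(clean)
```

The same two steps should carry over to the real frame if its first row does hold the column names; if it does not, only the ad-row filter applies.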

Using bs4 with CSS selectors:

from bs4 import BeautifulSoup
import requests

url = "https://highshortinterest.com/"
pageSI = requests.get(url)
print(pageSI)
# print(pageSI.status_code)  -- 200 = successful connection
soup = BeautifulSoup(pageSI.content, 'lxml')

for stock in soup.select('table.stocks:nth-of-type(2) tr')[1:]:
    # cells in page order: ticker, company, exchange, short interest, float, outstanding, industry
    ticker = stock.select_one('td:nth-child(1)').get_text()
    print(ticker)
    company = stock.select_one('td:nth-child(2)').text
    print(company)
    exchange = stock.select_one('td:nth-child(3)').text
    si = stock.select_one('td:nth-child(4)').text
    s_float = stock.select_one('td:nth-child(5)').text
    outstd = stock.select_one('td:nth-child(6)').text
    industry = stock.select_one('td:nth-child(7)').text

CodePudding user response:

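Q: Why did your code not work?

A: You are not getting any data because the HTML the server sends contains no tbody tag. The tbody you see in the browser's inspector is inserted by the browser when it builds the DOM; html.parser works on the raw markup and does not add it. stocks.find_all('tbody') therefore returns an empty list, and the body of the outer for loop never runs.

The table has no thead, tbody, or tfoot, so you can start directly by parsing tr.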
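
To see the effect in isolation, here is a small sketch that parses an inline table with html.parser. The `raw_html` string is a hypothetical stand-in for the markup the site serves (rows directly under table, no tbody), so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML: rows sit directly under <table>,
# with no <tbody> wrapper.
raw_html = """
<table class="stocks">
  <tr><td>Ticker</td><td>Company</td></tr>
  <tr><td>FUV</td><td>Arcimoto Inc</td></tr>
  <tr><td>BYND</td><td>Beyond Meat Inc</td></tr>
</table>
"""

soup = BeautifulSoup(raw_html, "html.parser")
table = soup.find("table", class_="stocks")

tbody_hits = table.find_all("tbody")  # html.parser does not invent a <tbody>
rows = table.find_all("tr")           # the rows are still reachable directly

print(len(tbody_hits))  # 0
print(len(rows))        # 3
for row in rows[1:]:
    print([td.get_text(strip=True) for td in row.find_all("td")])
```

Looping over `tbody_hits` would execute zero times, which matches the "nothing happens" comments in the question.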

Code

This is how you can do the same thing without tbody:

from bs4 import BeautifulSoup
import requests

urlSI = "https://highshortinterest.com/"
pageSI = requests.get(urlSI)
soupSI = BeautifulSoup(pageSI.text, 'html.parser')

stocks = soupSI.find('table', class_='stocks')
rows = stocks.find_all('tr')
for row in rows:
    try:
        info = row.find_all('td')
        # first six cells in page order: ticker, company, exchange, short interest, float, outstanding
        ticker = info[0].text
        company = info[1].text
        exchange = info[2].text
        si = info[3].text
        s_float = info[4].text
        outstd = info[5].text
        print(ticker, company, exchange, si, s_float, outstd, sep=', ')
    except IndexError:
        # the HTML has a styling row in between with fewer cells; skip it
        pass

Result:

Ticker, Company, Exchange, ShortInt, Float, Outstd
FUV, Arcimoto Inc, Nasdaq, 42.04%, 29.84M, 38.78M
BYND, Beyond Meat Inc, Nasdaq, 40.18%, 56.81M, 63.54M
REV, Revlon Inc, NYSE, 38.48%, 7.57M, 54.54M
BGFV, Big 5 Sporting Goods Corp, Nasdaq, 37.74%, 20.88M, 22.33M
ICPT, Intercept Pharmaceuticals Inc, Nasdaq, 37.74%, 23.61M, 29.71M
NKLA, Nikola Corporation, Nasdaq, 36.97%, 247.02M, 421.14M
UPST, Upstart Holdings Inc, Nasdaq, 34.99%, 66.61M, 84.77M
MSTR, MicroStrategy Inc, Nasdaq, 34.67%, 9.32M, 9.33M
BIG, Big Lots, Inc., NYSE, 34.06%, 26.13M, 28.56M
VTNR, Vertex Energy Inc, Nasdaq, 33.67%, 44.84M, 64.58M
BBBY, Bed Bath & Beyond Inc., Nasdaq, 32.88%, 67.00M, 79.85M
VUZI, Vuzix Corp, Nasdaq, 31.02%, 59.24M, 63.67M
SFT, Shift Technologies Inc, Nasdaq, 30.79%, 53.06M, 82.80M
EVGO, Evgo Inc, Nasdaq, 30.10%, 67.76M, 69.00M
W, Wayfair Inc, NYSE, 30.05%, 72.80M, 79.56M
CRTX, Cortexyme Inc, Nasdaq, 30.00%, 15.80M, 30.15M
BLNK, Blink Charging Co, Nasdaq, 28.87%, 36.83M, 42.74M
CTRN, Citi Trends, Inc., Nasdaq, 28.30%, 8.35M, 8.67M
OCGN, Ocugen Inc, Nasdaq, 28.30%, 211.01M, 215.66M
RIDE, Lordstown Motors Corp, Nasdaq, 28.27%, 163.84M, 203.47M
CVNA, Carvana Co, NYSE, 28.11%, 99.35M, 105.74M
EOSE, Eos Energy Enterprises Inc, Nasdaq, 28.07%, 46.70M, 54.45M
CLVS, Clovis Oncology Inc, Nasdaq, 27.96%, 141.09M, 143.88M
APPH, Appharvest Inc, Nasdaq, 26.96%, 65.49M, 101.74M
WKHS, Workhorse Group Inc, Nasdaq, 26.37%, 156.45M, 163.51M
FUBO, Fubotv Inc, NYSE, 26.31%, 165.24M, 185.08M
MVIS, Microvision, Inc., Nasdaq, 26.14%, 163.70M, 165.21M
OTRK, Ontrak, Inc., Nasdaq, 25.78%, 11.65M, 20.86M
RAD, Rite Aid Corporation, NYSE, 25.69%, 53.95M, 55.65M
SPCE, Virgin Galactic Holdings Inc, NYSE, 25.57%, 207.56M, 258.59M
SKLZ, Skillz Inc, NYSE, 25.30%, 240.45M, 340.81M
DNMR, Danimer Scientific Inc, NYSE, 24.85%, 87.56M, 101.11M
PETS, Petmed Express Inc, Nasdaq, 24.37%, 19.70M, 20.98M
SRG, Seritage Growth Properties, NYSE, 24.03%, 34.67M, 43.68M
SWTX, SpringWorks Therapeutics Inc, Nasdaq, 23.50%, 31.66M, 49.41M
REI, Ring Energy Inc, AMEX, 22.77%, 73.81M, 106.70M
RMO, Romeo Power Inc, NYSE, 22.24%, 138.94M, 151.23M
ZYXI, Zynex Inc., Nasdaq, 21.90%, 22.55M, 39.05M
CVM, CEL-SCI Corporation, AMEX, 21.80%, 41.64M, 43.33M
OMER, Omeros Corporation, Nasdaq, 21.76%, 60.14M, 62.73M
GDRX, Goodrx Holdings Inc, Nasdaq, 21.71%, 65.61M, 82.73M
SDC, SmileDirectClub Inc, Nasdaq, 21.63%, 111.55M, 120.70M
HYZN, Hyzon Motors Inc, Nasdaq, 21.08%, 76.33M, 247.90M
SENS, Senseonics Holdings Inc, AMEX, 20.96%, 373.28M, 463.26M
KPTI, Karyopharm Therapeutics Inc, Nasdaq, 20.94%, 69.08M, 79.42M
CNK, Cinemark Holdings, Inc., NYSE, 20.94%, 107.11M, 120.45M
IGMS, IGM Biosciences Inc, Nasdaq, 20.90%, 15.73M, 26.08M
BKKT, Bakkt Holdings Inc, NYSE, 20.77%, 50.67M, 75.27M
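
The try/except in the answer above skips the styling row by catching the failed index lookup. An alternative sketch checks the cell count explicitly and collects each row into a dict. The helper name `cells_to_record` and the column names are assumptions for illustration; the names follow the header row in the result above plus the seventh Industry column from the pandas output. The sample rows stand in for `[td.text for td in row.find_all('td')]`.

```python
from typing import List, Optional

# Assumed column names, in page order.
COLUMNS = ["ticker", "company", "exchange", "short_int", "float", "outstd", "industry"]


def cells_to_record(cells: List[str]) -> Optional[dict]:
    """Return a dict for a full data row, or None for styling/ad rows."""
    if len(cells) < len(COLUMNS):
        return None
    return dict(zip(COLUMNS, (c.strip() for c in cells)))


# Stand-ins for the cell text of three consecutive <tr> elements.
sample_rows = [
    ["FUV", "Arcimoto Inc", "Nasdaq", "42.04%", "29.84M", "38.78M", "Auto & Truck Manufacturers"],
    ["(ad placeholder)"],  # hypothetical styling/ad row with a single cell
    ["BYND", "Beyond Meat Inc", "Nasdaq", "40.18%", "56.81M", "63.54M", "Food Processing"],
]

records = [rec for rec in map(cells_to_record, sample_rows) if rec is not None]
for rec in records:
    print(rec["ticker"], rec["short_int"])
```

An explicit length check makes the skip condition visible and avoids hiding unrelated errors. The header row still passes the check, so slice it off if it is not wanted.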