Trying to scrape a specific table but getting no results

Time:01-08

I tried three different techniques to scrape the table with the class 'table-light', but none of them returns the data. The code below shows my attempts.

import pandas as pd
tables = pd.read_html('https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap')
tables


############################################################################


import requests
import pandas as pd
url = 'https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[10]
print(df)


############################################################################


import requests
from bs4 import BeautifulSoup
url = "https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap"
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "html.parser")
table = soup.find_all('table-light')
print(table)

The table I am trying to extract has the class 'table-light'. I want to get all of its columns and all 144 rows. How can I do that?
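The third attempt returns an empty list because `find_all('table-light')` looks for elements whose *tag name* is `table-light`, while `table-light` is a CSS class on a `<table>` element. A minimal offline sketch of the difference, using a small made-up HTML snippet instead of the real finviz page. Note that fixing the selector alone may still find nothing if the server sends back a captcha page instead of the table.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup: a <table> carrying the class "table-light".
html = """
<table class="table-light">
  <tr><td>No.</td><td>Name</td></tr>
  <tr><td>1</td><td>Example Industry</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Looks for a tag literally named <table-light>, so nothing matches.
by_tag_name = soup.find_all("table-light")

# Looks for <table> tags whose class list contains "table-light".
by_class = soup.find_all("table", class_="table-light")

# CSS selector equivalent: "." means "has this class".
by_selector = soup.select_one(".table-light")

print(len(by_tag_name), len(by_class), by_selector.name)  # prints: 0 1 table
```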

Answer:

You can try setting the User-Agent header so that the server returns the real page HTML instead of a captcha page:

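from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap"

# Browser-like User-Agent so the site serves the real page instead of a captcha
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"
}
response = requests.get(url, headers=headers)
response.raise_for_status()
soup = BeautifulSoup(response.content, "lxml")  # <-- don't use html.parser here

table = soup.select_one(".table-light")  # the table has the CSS class "table-light"
if table is None:
    raise RuntimeError("no .table-light element found; the server may have returned a captcha page")

# Turn the first row's <td> cells into <th> so read_html uses them as column headers
for td in table.tr.select('td'):
    td.name = 'th'

# Wrap the HTML in StringIO: newer pandas versions deprecate passing literal HTML strings
df = pd.read_html(StringIO(str(table)))[0]
print(df.head())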
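To try the parsing half of the answer without contacting the site, the same steps can be wrapped in a function that takes HTML you already have. This is a sketch: `parse_table_light` and the sample markup are hypothetical, and raising `ValueError` when the table is missing is just one assumed way to detect that a captcha (or other) page came back. It needs `lxml` installed, as the answer does.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup


def parse_table_light(html: str) -> pd.DataFrame:
    """Hypothetical helper: parse the first element with class "table-light"."""
    soup = BeautifulSoup(html, "lxml")
    table = soup.select_one(".table-light")
    if table is None:
        # A missing table suggests the server sent a different page (e.g. a captcha).
        raise ValueError("no element with class 'table-light' found")

    # Promote the first row's <td> cells to <th>; read_html treats a leading
    # all-<th> row as the header when the table has no <thead>.
    for td in table.tr.select("td"):
        td.name = "th"

    return pd.read_html(StringIO(str(table)))[0]


# Made-up sample markup shaped like the finviz table.
sample = """
<table class="table-light">
  <tr><td>No.</td><td>Name</td><td>Market Cap</td></tr>
  <tr><td>1</td><td>Industry A</td><td>3.14B</td></tr>
  <tr><td>2</td><td>Industry B</td><td>3.42B</td></tr>
</table>
"""
df = parse_table_light(sample)
print(df.columns.tolist())  # prints: ['No.', 'Name', 'Market Cap']
print(len(df))              # prints: 2
```

Without the `td` to `th` step, `read_html` would see no header row and the column labels would end up as the first data row.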

Prints:

   No.                            Name Market Cap    P/E  Fwd P/E   PEG   P/S   P/B    P/C  P/FCF EPS past 5Y EPS next 5Y Sales past 5Y Change   Volume
0    1       Real Estate - Development      3.14B   3.21    21.12  0.24  0.60  0.52   2.28  17.11      43.30%      13.42%        13.69%  1.43%  715.95K
1    2           Textile Manufacturing      3.42B  32.58    25.04     -  1.43  2.58   9.88  90.16      15.31%      -0.49%         3.54%  1.83%  212.71K
2    3                     Coking Coal      5.31B   2.50     4.93  0.37  0.64  1.53   4.20   2.54      38.39%       6.67%        22.92%  5.43%    1.92M
3    4       Real Estate - Diversified      6.71B  17.38   278.89  0.87  2.78  1.51  15.09  91.64       0.48%      19.91%        11.97%  3.31%  461.33K
4    5  Other Precious Metals & Mining      8.10B  24.91    29.07  2.71  6.52  1.06  14.47  97.98      16.30%       9.19%        20.71%  0.23%    4.77M
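`df.head()` prints only the first five rows, while the question asks for all of them. The sketch below uses a made-up 144-row frame in place of the scraped one (the real row count depends on what the page returns at the time) to show a few ways of seeing or exporting every row.

```python
import pandas as pd

# Made-up stand-in for the scraped DataFrame: 144 rows, matching the count the question expects.
df = pd.DataFrame({
    "No.": range(1, 145),
    "Name": [f"Industry {i}" for i in range(1, 145)],
})

print(len(df))  # confirm how many rows were parsed

# to_string() renders every row, unlike print(df), which truncates long frames.
full_text = df.to_string()
print(full_text)

# Alternatively, lift the display row limit only inside this block.
with pd.option_context("display.max_rows", None):
    print(df)

# Or export all rows as CSV text (pass a file path instead to write a file).
csv_text = df.to_csv(index=False)
```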