I tried three different techniques to scrape a table named 'table-light', but none of them works. The code below shows my attempts to extract the data.
import pandas as pd
tables = pd.read_html('https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap')
tables
############################################################################
import requests
import pandas as pd
url = 'https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[10]
print(df)
############################################################################
import requests
from bs4 import BeautifulSoup
url = "https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap"
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "html.parser")
table = soup.find_all('table-light')
print(table)
The table that I am trying to extract data from is named 'table-light'. I want to get all the columns and all 144 rows. How can I do that?
CodePudding user response:
You can try setting the User-Agent header to get the correct HTML (and not a captcha page):
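from io import StringIO
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://finviz.com/groups.ashx?g=industry&v=120&o=marketcap"
# Browser-like User-Agent so the site returns the data page and not a captcha page
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "lxml")  # <-- don't use html.parser here
# 'table-light' is a CSS class, so match it with a class selector
table = soup.select_one(".table-light")
# Rename the first row's <td> cells to <th> so pandas uses them as column headers
for td in table.tr.select("td"):
    td.name = "th"
# Wrap the markup in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(table)))[0]
print(df.head())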
Prints:
   No.                            Name  Market Cap    P/E  Fwd P/E   PEG   P/S   P/B    P/C  P/FCF  EPS past 5Y  EPS next 5Y  Sales past 5Y  Change   Volume
0    1       Real Estate - Development       3.14B   3.21    21.12  0.24  0.60  0.52   2.28  17.11       43.30%       13.42%         13.69%   1.43%  715.95K
1    2           Textile Manufacturing       3.42B  32.58    25.04     -  1.43  2.58   9.88  90.16       15.31%       -0.49%          3.54%   1.83%  212.71K
2    3                     Coking Coal       5.31B   2.50     4.93  0.37  0.64  1.53   4.20   2.54       38.39%        6.67%         22.92%   5.43%    1.92M
3    4       Real Estate - Diversified       6.71B  17.38   278.89  0.87  2.78  1.51  15.09  91.64        0.48%       19.91%         11.97%   3.31%  461.33K
4    5  Other Precious Metals & Mining       8.10B  24.91    29.07  2.71  6.52  1.06  14.47  97.98       16.30%        9.19%         20.71%   0.23%    4.77M
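As for the third attempt in the question: soup.find_all('table-light') comes back empty because find_all treats its first argument as a tag name, while 'table-light' is a CSS class on a regular table tag. Here is a minimal sketch of the difference. The HTML string is a made-up two-row stand-in for the real page, the variable names are only for illustration, and html.parser is used here only because this toy markup is tiny and well formed:

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the real page markup
html = """
<table class="table-light">
  <tr><td>No.</td><td>Name</td></tr>
  <tr><td>1</td><td>Coking Coal</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Looks for <table-light> tags, which do not exist in the document
by_tag_name = soup.find_all("table-light")

# Both of these match on the class attribute instead
by_class = soup.find_all("table", class_="table-light")
by_selector = soup.select_one(".table-light")

print(len(by_tag_name))                  # 0
print(len(by_class))                     # 1
print(len(by_selector.find_all("tr")))   # 2
```

Either find_all("table", class_="table-light") or select_one(".table-light") should work on the real page as well, provided the request returns the actual data page and not the captcha.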