Beautiful Soup scraping returns empty brackets

import requests
from bs4 import BeautifulSoup as bs

html = 'https://en.wikipedia.org/wiki/List_of_largest_banks'

html_data = requests.get(html)

html_data_text = html_data.text

soup = bs(html_data_text, 'html.parser')

table = soup.find_all('table', {id : "By_market_capitalization"})

print(table)

This returns an empty list ([]). I can only use Beautiful Soup for this assignment. I've seen that other libraries can help, but I can't use them. Any idea what's going wrong when I try to get this table?

CodePudding user response:

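You're searching for a table with that id. However, the page source shows that id on a span, not on the table.

Change the first selector to find the span, then use find_next() (also available under its older name findNext()) to get the table that follows it.

From there you can loop over the tr rows and td cells and print each cell's .text: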

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_largest_banks'

html_data = requests.get(url)
html_data_text = html_data.text

soup = BeautifulSoup(html_data_text, 'html.parser')

# The id is on a span in the section heading, not on the table itself.
span = soup.find('span', id="By_market_capitalization")
# The table is the first <table> that follows that span in the document.
table = span.find_next('table')

for row in table.find_all('tr'):
    tds = row.find_all('td')
    if len(tds) > 1:
        # The second cell of each data row holds the bank name.
        print(tds[1].text.strip())

JPMorgan Chase
Industrial and Commercial Bank of China
Bank of America
Wells Fargo
China Construction Bank
Agricultural Bank of China
HSBC Holdings PLC
Citigroup Inc.
... more
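
The span-then-table lookup described in this answer can be tried offline against a small hand-written HTML snippet. This is only a minimal sketch: the markup below is made up to mimic the structure the answer describes (an id on a span in the heading, followed by a table with no id), and the helper name second_column is hypothetical. It looks the id up without naming a tag, so it does not depend on the id being on a span specifically.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page: the id sits on a <span> inside the
# heading, and the table itself carries no id.
SAMPLE_HTML = """
<h2><span id="By_market_capitalization">By market capitalization</span></h2>
<table>
  <tr><th>Rank</th><th>Bank name</th></tr>
  <tr><td>1</td><td>JPMorgan Chase</td></tr>
  <tr><td>2</td><td>Bank of America</td></tr>
</table>
"""


def second_column(html, anchor_id):
    """Return the second-cell text of each data row in the first table
    that follows the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    anchor = soup.find(id=anchor_id)
    if anchor is None:
        return []  # id not present in this document
    table = anchor.find_next("table")
    if table is None:
        return []  # no table after the anchor
    names = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) > 1:  # the header row has only <th> cells, so it is skipped
            names.append(cells[1].get_text(strip=True))
    return names


print(second_column(SAMPLE_HTML, "By_market_capitalization"))
```

Returning an empty list when the id or the table is missing avoids an AttributeError on None if the page layout differs from what the selector expects.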

CodePudding user response:

@David,

I tried running your code and it needed some changes.

Note: you are filtering on the id attribute, but the tables on that page do not have an id attribute.

Here is the code after fixing:

import requests
from bs4 import BeautifulSoup

req = requests.get('https://en.wikipedia.org/wiki/List_of_largest_banks')
html = req.text
soup = BeautifulSoup(html, 'html.parser')
# The tables carry no id attribute, so the id filter is dropped here.
# This returns every table on the page, not only the market-cap one.
table = soup.find_all('table')
print(table)
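
One way to confirm the note about the id attribute before writing a selector is to search for the id without naming a tag and inspect which element comes back. The sketch below runs on made-up markup (an assumption standing in for the real page), so it only illustrates the check itself.

```python
from bs4 import BeautifulSoup

# Made-up markup: the heading's span has the id, the table does not.
SAMPLE_HTML = """
<h2><span id="By_market_capitalization">By market capitalization</span></h2>
<table><tr><td>1</td><td>JPMorgan Chase</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Filtering tables by id finds nothing, because no <table> has that id.
tables_by_id = soup.find_all("table", id="By_market_capitalization")

# Searching on the id alone shows which tag actually carries it.
owner = soup.find(id="By_market_capitalization")
owner_tag = owner.name if owner is not None else None

print(tables_by_id)
print(owner_tag)
```

If the second print shows a tag other than table, the id belongs to some other element, and the table has to be reached from that element or selected another way.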