html = 'https://en.wikipedia.org/wiki/List_of_largest_banks'
html_data = requests.get('https://en.wikipedia.org/wiki/List_of_largest_banks')
html_data_text = html_data.text
soup = bs(html_data_text, 'html.parser')
table = soup.find_all('table', {id : "By_market_capitalization"})
print(table)
This returns an empty list ([]). I can only use BeautifulSoup for this assignment; I've seen other libraries that would help, but I can't use them. Any idea what's going wrong when I try to get this table?
CodePudding user response:
You're looking for a table with that id. However, the link shows that the id is on a span, not on the table. Change the first selector to find the span, then use findNext() to get the table that follows it. From there you can find each tr and td and print its .text:
from bs4 import BeautifulSoup
import requests
html = 'https://en.wikipedia.org/wiki/List_of_largest_banks'
html_data = requests.get(html)
html_data_text = html_data.text
soup = BeautifulSoup(html_data_text, 'html.parser')
span = soup.find('span', id="By_market_capitalization")
table = span.findNext('table')
for row in table.findAll('tr'):
    tds = row.findAll('td')
    if len(tds) > 1:
        print(tds[1].text.strip())
JPMorgan Chase
Industrial and Commercial Bank of China
Bank of America
Wells Fargo
China Construction Bank
Agricultural Bank of China
HSBC Holdings PLC
Citigroup Inc.
... more
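The same steps (find the span carrying the id, jump to the next table, read one cell per row) can be wrapped in a small helper that returns an empty list instead of raising when the anchor is missing. This is only a sketch: the function name column_after_anchor and the inline sample HTML are made up for illustration, and it assumes the id sits on a span ahead of the table, as described above.

```python
from bs4 import BeautifulSoup

def column_after_anchor(html, anchor_id, col=1):
    """Return the text of column `col` from the first table after the span with `anchor_id`."""
    soup = BeautifulSoup(html, 'html.parser')
    anchor = soup.find('span', id=anchor_id)
    if anchor is None:
        return []  # id not found on any span
    table = anchor.find_next('table')
    if table is None:
        return []  # no table follows the span
    values = []
    for row in table.find_all('tr'):
        tds = row.find_all('td')
        if len(tds) > col:  # header rows contain only <th>, so they are skipped
            values.append(tds[col].get_text(strip=True))
    return values

# Hypothetical sample markup standing in for the Wikipedia page
SAMPLE_HTML = """
<h2><span id="By_market_capitalization">By market capitalization</span></h2>
<table>
<tr><th>Rank</th><th>Bank name</th></tr>
<tr><td>1</td><td>JPMorgan Chase</td></tr>
<tr><td>2</td><td>Bank of America</td></tr>
</table>
"""

names = column_after_anchor(SAMPLE_HTML, "By_market_capitalization")
print(names)
```

To run it against the live page, pass requests.get(url).text as the first argument; the guard clauses matter there because the page markup can change.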
CodePudding user response:
@David,
I tried your code and it needed some changes.
Note: you searched for the id attribute on the tables, but none of the tables has that attribute.
Here is the code after fixing:
import requests
from bs4 import BeautifulSoup
req = requests.get('https://en.wikipedia.org/wiki/List_of_largest_banks')
html = req.text
soup = BeautifulSoup(html, 'html.parser')
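# The id is not on the table itself, so find the element that carries it
# and then take the first table that follows it.
anchor = soup.find(id="By_market_capitalization")
table = anchor.find_next('table')
print(table)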
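A quick way to see why the original find_all call comes back empty is to run it against a small snippet shaped like the page described in this thread. The markup below is a made-up stand-in, not the real Wikipedia source. It also shows a second pitfall in the question's code: writing id without quotes in the dict uses Python's built-in id function as the key, not the string 'id'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in this thread
SAMPLE_HTML = """
<h2><span id="By_market_capitalization">By market capitalization</span></h2>
<table class="wikitable"><tr><td>1</td><td>JPMorgan Chase</td></tr></table>
"""
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# No <table> carries the id, so this search finds nothing
tables_with_id = soup.find_all('table', {'id': 'By_market_capitalization'})

# The id lives on the <span> that precedes the table
spans_with_id = soup.find_all('span', {'id': 'By_market_capitalization'})

# Unquoted id: the dict key is the built-in function, not the string 'id'
bad_attrs = {id: "By_market_capitalization"}
has_string_key = 'id' in bad_attrs

print(len(tables_with_id), len(spans_with_id), has_string_key)
```

Quoting the key alone would not be enough here, since the tables still lack the id; the search has to start from the element that actually carries it.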