I was web scraping a Wikipedia table using Beautiful Soup. This is my code:
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos"
page = requests.get(URL)
soup1 = BeautifulSoup(page.text, 'lxml')
table = soup1.find('table', {'class': 'wikitable sortable jquery-tablesorter'})
headers = []
for i in table.find_all('tr'):
    title = i.text.strip()
    headers.append(title)
I got this error:
AttributeError: 'NoneType' object has no attribute 'find_all'
I also tried the html.parser parser and the get_text function, but I still get the same error. I get the same error when I look for th elements as well.
CodePudding user response:
You can do that using only pandas:
import pandas as pd
table = pd.read_html("https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos",attrs={'class':'wikitable sortable'})[0]
print(table)
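As a rough offline sketch of what that call does: read_html returns a list of DataFrames, one per matching table, and [0] picks the first. The HTML string and its values below are made up for illustration and are not taken from the Wikipedia page. The example assumes an HTML parser backend for pandas (such as lxml) is installed.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for a downloaded page; not real Wikipedia data.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Video name</th><th>Views (billions)</th></tr>
  <tr><td>Example video A</td><td>1.5</td></tr>
  <tr><td>Example video B</td><td>2.5</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per table that matches attrs.
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"class": "wikitable sortable"})
df = tables[0]

print(df.columns.tolist())  # column names taken from the <th> row
print(len(df))              # number of data rows
```

If no table matches attrs, read_html raises a ValueError instead of returning an empty list, so a wrong class name fails loudly here.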
CodePudding user response:
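How to find the request headers:
Open Chrome DevTools, go to the Network tab, reload the page, select the request for the main URL, and open its Headers tab, where the request headers (including user-agent) are listed.
Using bs4
Add a user-agent header to the request, then look the table up by the classes that are actually in the downloaded HTML. The jquery-tablesorter class is added by JavaScript in the browser, so it is missing from the HTML that requests receives, and find returns None. Once find returns the table tag, you can read all its rows: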
import requests
from bs4 import BeautifulSoup

# Request headers copied from the browser; kept separate from the result list below.
request_headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36"}
response = requests.get('https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos', headers=request_headers)
soup = BeautifulSoup(response.text, 'html.parser')
# Match only the classes present in the served HTML.
table = soup.find('table', attrs={"class": "wikitable sortable"})
headers = []
for i in table.find_all('tr'):
    title = i.text.strip()
    headers.append(title)
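A minimal offline sketch of why the original selector fails and one way to guard against it. The HTML string is a made-up stand-in for what requests downloads, and select_one with a CSS selector is swapped in here as an alternative to find, not something the answer above uses.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the static HTML (no JavaScript has run on it).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Video name</th><th>Views (billions)</th></tr>
  <tr><td>Example video A</td><td>1.5</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The browser-only class is absent from the static HTML, so find() returns None.
missing = soup.find("table", {"class": "wikitable sortable jquery-tablesorter"})
print(missing)

# A CSS selector matches on individual classes, whatever their order.
table = soup.select_one("table.wikitable.sortable")
if table is None:
    raise SystemExit("table not found; check the class names in the downloaded HTML")

# The column names are the <th> cells of the first row.
column_names = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]
print(column_names)
```

Checking for None before calling find_all turns the AttributeError into a message that points at the selector.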
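Here is the implementation using pandas, reusing the response from above:
from io import StringIO
import pandas as pd
# Newer pandas versions deprecate passing a raw HTML string, so wrap it in StringIO.
data = pd.read_html(StringIO(response.text), attrs={"class": "wikitable sortable"})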