'NoneType' object has no attribute 'find_all' error

I was scraping a Wikipedia table using Beautiful Soup. This is my code:

Code

import requests
from bs4 import BeautifulSoup

URL="https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos"
page=requests.get(URL)
soup1=BeautifulSoup(page.text,'lxml')
table = soup1.find('table',{'class':'wikitable sortable jquery-tablesorter'})

headers=[]
for i in table.find_all('tr'):    
    title=i.text.strip()    
    headers.append(title)

I got this error:

AttributeError: 'NoneType' object has no attribute 'find_all'

I also tried html.parser and the get_text function, but I still get the same error. I get the same error when searching for th as well.

CodePudding user response：

You can do that using only pandas:

import pandas as pd
 
table = pd.read_html("https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos",attrs={'class':'wikitable sortable'})[0] 
print(table)

CodePudding user response：

How to find the request headers:

Open Chrome DevTools, reload the page, and select the request for the main URL in the Network tab. Its Headers tab lists the details of that request, including the user-agent.

With bs4, you have to add a user-agent header to the request to get the HTML. Searching for the table tag then returns all the values:

import requests
from bs4 import BeautifulSoup

# Browser-like user-agent sent with the request
request_headers={"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36"}
response = requests.get('https://en.wikipedia.org/wiki/List_of_most-viewed_YouTube_videos',headers=request_headers)

soup = BeautifulSoup(response.text,'html.parser')
# Search without the jquery-tablesorter class used in the question
table = soup.find('table',attrs={"class":"wikitable sortable"})
# Collect the stripped text of every table row
headers=[]
for i in table.find_all('tr'):
    title=i.text.strip()
    headers.append(title)
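
A likely cause of the original error is that the jquery-tablesorter class is added by JavaScript in the browser, so it is absent from the HTML that requests downloads and find() returns None. The sketch below illustrates this on a small made-up HTML snippet (SAMPLE_HTML is a hypothetical stand-in, not the real page). It also guards against None and reads only the th cells of the first row, since looping over every tr collects whole rows rather than column headers. The names missing and column_names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives (not the real page).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Video name</th></tr>
  <tr><td>1</td><td>Example video</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The class string from the question does not occur in this HTML,
# so find() returns None instead of a Tag.
missing = soup.find("table", {"class": "wikitable sortable jquery-tablesorter"})

# A CSS selector matches on individual classes, whatever else the attribute holds.
table = soup.select_one("table.wikitable.sortable")
if table is None:
    raise SystemExit("No matching table found; check the class names in the raw HTML.")

# Column headers are the <th> cells of the first row.
column_names = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]

print(missing)
print(column_names)
```

Checking the result of find() before calling find_all() turns the AttributeError into a clear message about the selector.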

Here is the implementation using pandas:

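import pandas as pd
from io import StringIO

# read_html returns a list of DataFrames, one per matching table;
# recent pandas versions expect literal HTML wrapped in StringIO
data=pd.read_html(StringIO(response.text),attrs={"class":"wikitable sortable"})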
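
Since read_html returns a list of DataFrames, the table still has to be picked out of the result. A minimal sketch of that step, run on a made-up inline snippet (SAMPLE_HTML is hypothetical, not the real page) and assuming pandas has an HTML parser such as lxml available:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for response.text (not the real Wikipedia page).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Video name</th></tr>
  <tr><td>1</td><td>Example video</td></tr>
</table>
"""

# One DataFrame is returned per table whose class attribute matches.
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"class": "wikitable sortable"})

# Pick the first matching table; the leading all-<th> row becomes the column labels.
df = tables[0]

print(len(tables))
print(df.columns.tolist())
print(df)
```

With the real page, the column labels of the selected DataFrame play the role of the headers list built in the bs4 version.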