I would like to scrape a website in Python. The class_ value is too long for PyCharm's line limit (>120 characters), so I defined a variable to split it up. However, it still doesn't work: it only returns None. What am I doing wrong?
import requests
from bs4 import BeautifulSoup
html = requests.get("https://www.foxestalk.co.uk/topic/127651-youri-tielemans/page/122/#comments").text
soup = BeautifulSoup(html, "lxml")
test = "cPost ipsBox ipsResponsive_pull ipsComment ipsComment_parent ipsClearfix "
test2 = test + "ipsClear ipsColumns ipsColumns_noSpacing ipsColumns_collapsePhone "
comment = soup.find("article", class_=test2)
Answer:
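You need to fix the spacing in your class string: remove the trailing space after ipsColumns_collapsePhone. When class_ is given a string that contains spaces, BeautifulSoup compares it against the exact value of the class attribute (the class names joined by single spaces), so an extra space at the end makes the match fail: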
import requests
from bs4 import BeautifulSoup
html = requests.get("https://www.foxestalk.co.uk/topic/127651-youri-tielemans/page/122/#comments").text
soup = BeautifulSoup(html, "lxml")
test = "cPost ipsBox ipsResponsive_pull ipsComment ipsComment_parent ipsClearfix "
# no trailing space after the last class name
test2 = test + "ipsClear ipsColumns ipsColumns_noSpacing ipsColumns_collapsePhone"
comment = soup.find_all("article", {'class': test2})
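To see why the trailing space matters without hitting the live site, here is a minimal offline sketch. The HTML snippet and its shortened class list are made up for illustration (the real forum markup carries more classes), and it uses the built-in "html.parser" instead of lxml so nothing extra is needed beyond bs4. It also shows a tidier way to keep a long string under PyCharm's line limit: adjacent string literals inside parentheses are joined by Python at compile time, so no + is required.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened version of one forum post's markup.
HTML = """
<article id="elComment_1" class="cPost ipsBox ipsComment ipsClearfix">
  <h3>someuser</h3>
</article>
"""

# Adjacent string literals in parentheses are concatenated at compile time,
# which keeps each source line short without needing "+".
CLASSES = (
    "cPost ipsBox "
    "ipsComment ipsClearfix"
)

soup = BeautifulSoup(HTML, "html.parser")

# A string containing spaces is compared against the whole class attribute
# (class names joined by single spaces), so it has to match exactly.
exact = soup.find("article", class_=CLASSES)
with_trailing_space = soup.find("article", class_=CLASSES + " ")
# A single class name matches if it is any one of the element's classes.
single = soup.find("article", class_="ipsComment")

print(exact is not None)            # True
print(with_trailing_space is None)  # True
print(single is not None)           # True
```

Matching on one distinctive class name, as in the last lookup, is usually less brittle than reproducing the full class string.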
Answer:
As mentioned by @1extraline, you should fix your spaces/typos to reach your goal.
I would also recommend not selecting your elements by classes: they are often generated dynamically, and you do not need to match all of them anyway.
So change your strategy and select by more static attributes like id, or by structure like the tag name.
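As a rough illustration of selecting by a more stable attribute instead of the class list, the sketch below matches an id prefix with a regular expression in find_all. The elComment prefix is the one the CSS selector in this answer relies on; the HTML itself is a made-up stand-in for the real page.

```python
import re

from bs4 import BeautifulSoup

# Made-up markup: two comment articles plus one unrelated article.
HTML = """
<article id="elComment_101" class="cPost ipsComment"><h3>alice</h3></article>
<article id="elComment_102" class="cPost ipsComment ipsComment_parent"><h3>bob</h3></article>
<article id="sidebar" class="ipsBox"><h3>not a comment</h3></article>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Match on the id prefix rather than on the long, changeable class list.
comments = soup.find_all("article", id=re.compile(r"^elComment"))
usernames = [article.h3.get_text(strip=True) for article in comments]
print(usernames)  # ['alice', 'bob']
```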
In your specific case, simply use CSS selectors to shorten your selection:
soup.select('article')
or, a bit more specifically:
soup.select('article[id^="elComment"]')
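For comparison, here is hypothetical markup queried with the two selectors above. soup.select returns a list of every match, the ^= attribute selector keeps only articles whose id starts with elComment, and select_one is shown as the single-result counterpart (it returns None when nothing matches).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the forum page.
HTML = """
<article id="elComment_101"><h3>alice</h3></article>
<article id="elComment_102"><h3>bob</h3></article>
<article id="sidebar"><h3>not a comment</h3></article>
"""

soup = BeautifulSoup(HTML, "html.parser")

all_articles = soup.select("article")
# [id^="elComment"] means: the id attribute starts with "elComment".
comment_articles = soup.select('article[id^="elComment"]')
first_comment = soup.select_one('article[id^="elComment"]')

print(len(all_articles))                    # 3
print([a["id"] for a in comment_articles])  # ['elComment_101', 'elComment_102']
print(first_comment.h3.text)                # alice
```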
Example
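import requests
from bs4 import BeautifulSoup
from pprint import pprint
html = requests.get("https://www.foxestalk.co.uk/topic/127651-youri-tielemans/page/122/#comments").text
soup = BeautifulSoup(html, "lxml")
# number of <article> elements on the page
print(len(soup.select('article')))
data = []
# only articles whose id starts with "elComment", i.e. the comment posts
for e in soup.select('article[id^="elComment"]'):
    data.append({
        'username': e.h3.text.strip(),
        'postCount': e.select_one('ul ul li').text.strip(),
        'whatever': 'you like to scrape'
    })
# sort_dicts=False keeps the keys in insertion order (Python 3.8+)
pprint(data, sort_dicts=False)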
Output
[{'username': 'foxfanazer',
'postCount': '31,325',
'whatever': 'you like to scrape'},
{'username': 'CrispinLA in Texas',
'postCount': '2,197',
'whatever': 'you like to scrape'},
{'username': 'happy85',
'postCount': '1,077',
'whatever': 'you like to scrape'},
{'username': 'CrispinLA in Texas',
'postCount': '2,197',
'whatever': 'you like to scrape'},
{'username': "Sharpe's Fox",
'postCount': '7,867',
'whatever': 'you like to scrape'},
{'username': 'foxfanazer',
'postCount': '31,325',
'whatever': 'you like to scrape'},
{'username': "Sharpe's Fox",
'postCount': '7,867',
'whatever': 'you like to scrape'},
{'username': 'cropstonfox',
'postCount': '825',
'whatever': 'you like to scrape'},...]
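A variation on the example above, sketched under assumptions: it runs against a small hard-coded HTML fixture (made up, loosely mirroring the h3 / nested ul structure the selectors above expect) so it can be tried without network access, and it guards against posts where the username or post count element is missing, since .h3 and select_one return None when nothing matches. The helper name parse_comments is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up fixture loosely mirroring the structure the selectors expect:
# username in an <h3>, post count in an <li> inside a nested <ul>.
HTML = """
<article id="elComment_1">
  <h3> foxfanazer </h3>
  <ul><li><ul><li> 31,325 </li></ul></li></ul>
</article>
<article id="elComment_2">
  <h3>guest</h3>
</article>
"""


def parse_comments(html):
    """Return one dict per comment article; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.select('article[id^="elComment"]'):
        name_tag = article.h3
        count_tag = article.select_one("ul ul li")
        rows.append({
            "username": name_tag.get_text(strip=True) if name_tag is not None else None,
            "postCount": count_tag.get_text(strip=True) if count_tag is not None else None,
        })
    return rows


print(parse_comments(HTML))
# [{'username': 'foxfanazer', 'postCount': '31,325'}, {'username': 'guest', 'postCount': None}]
```

Returning None for a missing field keeps one malformed post from raising AttributeError and aborting the whole page.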