I am trying to scrape some bullet points from this website and would greatly appreciate help with a solution.
Website: https://underdognetwork.com/basketball/nba-news-and-fantasy-basketball-notes-6-10
For example, I would like to scrape only the following bullet points:
Stephen Curry (foot) — In
Robert Williams (knee) — Available
Otto Porter (foot) — In
Robert Williams (knee) — Available
Stephen Curry (foot) — In
Otto Porter (foot) — In
Andre Iguodala (knee) — Available
James Wiseman (knee) — Injured
Thank you!
Answer:
Try the following:
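import requests
from bs4 import BeautifulSoup
url = 'https://underdognetwork.com/basketball/nba-news-and-fantasy-basketball-notes-6-10'
resp = requests.get(url)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
# 'lxml' requires the lxml package; 'html.parser' works without extra installs
soup = BeautifulSoup(resp.content, 'lxml')
# 'li > p' selects <p> elements that are direct children of an <li>, i.e. the bullet text
info = [p.get_text() for p in soup.select('li > p')]
print(info)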
['Stephen Curry (foot) — In',
'Robert Williams (knee) — Available',
'Otto Porter (foot) — In',
'Robert Williams (knee) — Available',
'Stephen Curry (foot) — In',
'Otto Porter (foot) — In',
'Andre Iguodala (knee) — Available',
'James Wiseman (knee) — Injured']
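Because the live page can change or block requests, it may help to check the selector offline first. The sketch below assumes (this is not confirmed from the site) that each bullet is a <p> nested directly inside an <li>. The HTML snippet is hypothetical, and it uses the built-in 'html.parser' so lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, assumed to mirror the page's markup:
# each injury note is a <p> whose direct parent is an <li>.
html = """
<ul>
  <li><p>Stephen Curry (foot) — In</p></li>
  <li><p>Robert Williams (knee) — Available</p></li>
</ul>
<p>A regular paragraph outside any list item.</p>
"""

# html.parser ships with Python, so no extra parser package is required.
soup = BeautifulSoup(html, "html.parser")

# 'li > p' matches only <p> elements directly inside an <li>,
# so the stand-alone paragraph at the end is skipped.
info = [p.get_text(strip=True) for p in soup.select("li > p")]
print(info)
```

If this prints only the two list-item paragraphs, the selector behaves as expected, and any mismatch on the real page is more likely due to different markup than to the selector itself.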
To get rid of the duplicates, use list(set(info)). A set does not preserve order, though; if you want to keep the original order, have a look at the answer here.
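One common order-preserving approach (not necessarily the one in the linked answer) is dict.fromkeys: dictionary keys are unique and, since Python 3.7, keep insertion order. A minimal sketch, using the scraped list shown above as hard-coded input:

```python
# The scraped list from above, including its duplicates.
info = [
    'Stephen Curry (foot) — In',
    'Robert Williams (knee) — Available',
    'Otto Porter (foot) — In',
    'Robert Williams (knee) — Available',
    'Stephen Curry (foot) — In',
    'Otto Porter (foot) — In',
    'Andre Iguodala (knee) — Available',
    'James Wiseman (knee) — Injured',
]

# Each item becomes a dict key; repeats are dropped and every item
# keeps the position of its first occurrence.
unique_in_order = list(dict.fromkeys(info))
print(unique_in_order)
```

Unlike list(set(info)), this keeps Stephen Curry first and James Wiseman last, matching the order on the page.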