My project involves scraping random words from https://www.randomlists.com/random-vocabulary-words and building a multiple-choice question from the words scraped off the website.
The data I need to scrape is in a <div>, specifically in the <ol> tag under it; I need to obtain the words in the <li> tags under that <ol>. I've attached an image of this.
Currently, my code is as follows:
import bs4
import requests

url = 'https://www.randomlists.com/random-vocabulary-words'
r = requests.get(url)
html_content = r.content
soup = bs4.BeautifulSoup(html_content, 'html.parser')

result = []
for li in soup.find('ol', class_='rand_large').find_all('li'):
    result.append(list(li.stripped_strings))
print(result)
Now, I don't know how to scrape the content in the <ol> tag, or whether that is even the content I need to scrape in the first place, in order to get the randomized words (and their meanings, also in <li> tags).
Actually, when the code ran, it didn't display any output. Instead, it threw an error: AttributeError: 'NoneType' object has no attribute 'find_all'
CodePudding user response:
What happens?
The content is served dynamically by the website via JavaScript, so the elements you are searching for are not in the HTML that requests receives, and therefore not in your soup. The error appears because soup.find('ol', class_='rand_large') does not find the element in the response and returns None, which is why the chained find_all() fails:
AttributeError: 'NoneType' object has no attribute 'find_all'
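To see why the chained call blows up, here is a minimal offline sketch; the static HTML string is a stand-in for what requests actually receives, since the word list is injected client-side:

```python
import bs4

# Stand-in for the raw response: the <ol class="rand_large"> is added
# later by JavaScript, so it is absent from the HTML requests fetches.
html = "<html><body><div id='result'></div></body></html>"
soup = bs4.BeautifulSoup(html, 'html.parser')

ol = soup.find('ol', class_='rand_large')
print(ol)  # None -> ol.find_all(...) would raise AttributeError

# Guard before chaining calls so a missing element yields an empty list:
words = [list(li.stripped_strings) for li in ol.find_all('li')] if ol else []
print(words)  # []
```

The guard does not fix the underlying problem (the element is simply never in the response), but it turns the crash into a checkable empty result.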
How to fix?
Use a tool that renders the HTML, such as Selenium, or
use the API that provides the information directly (https://www.randomlists.com/data/vocabulary-words.json).
The following line will give you a random set of 3 dicts, each with a word (name) and its meaning (detail). Note that random.choices samples with replacement, so use random.sample instead if the picks must be distinct:
random.choices(r.json()['data'], k=3)
Example
import random
import requests

url = 'https://www.randomlists.com/data/vocabulary-words.json'
r = requests.get(url)
print(random.choices(r.json()['data'], k=3))
Output
[{'name': 'attenuate', 'detail': 'make thin. weaken enervate'},
{'name': 'savant', 'detail': 'person of great learning'},
{'name': 'fledged', 'detail': 'able to fly trained experienced'}]
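From there, the fetched entries can be turned into the multiple-choice question the project asks for. This is only a sketch: entries is a hardcoded stand-in for r.json()['data'] so it runs offline, and random.sample is used instead of random.choices to avoid duplicate options:

```python
import random

# Stand-in for r.json()['data'], in the same {'name': ..., 'detail': ...}
# shape the API returns.
entries = [
    {'name': 'attenuate', 'detail': 'make thin. weaken enervate'},
    {'name': 'savant', 'detail': 'person of great learning'},
    {'name': 'fledged', 'detail': 'able to fly trained experienced'},
    {'name': 'laconic', 'detail': 'brief. to the point'},
]

# Pick 3 distinct entries (sample, not choices, so no duplicates),
# then make one of them the correct answer.
options = random.sample(entries, k=3)
answer = random.choice(options)

print(f"What does '{answer['name']}' mean?")
# Shuffle the option order so the answer's position is random.
for i, opt in enumerate(random.sample(options, k=len(options)), 1):
    print(f"  {i}. {opt['detail']}")
```

Swapping the hardcoded entries for r.json()['data'] plugs this into the answer's API example directly.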