As an example I have code like this:
import requests
from bs4 import BeautifulSoup
def get_data(url):
    r = requests.get(url).text
    soup = BeautifulSoup(r, 'html.parser')
    word = soup.find(class_='mdl-cell mdl-cell--11-col')
    print(word)

get_data('http://savodxon.uz/izoh?sher')
I don't know why, but when I print word, the element comes back empty, like this:
<h2 id="definition_l_title"></h2>
But it should look like this:
<h2 id="definition_l_title" >acha</h2>
CodePudding user response:
You have a common problem with modern pages: this page uses JavaScript to add/update elements, but BeautifulSoup/lxml and requests/urllib can't run JavaScript.
You may need Selenium to control a real web browser, which can run JS. Or use DevTools manually in Firefox/Chrome (Network tab) to see whether JavaScript reads data from some URL, and then try that URL with requests. JS usually gets JSON, which can easily be converted to a Python dictionary (without BS). You can also check whether the page has a (free) API for programmers.
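To illustrate the JSON point: a JSON response body is just text, and Python's standard json module turns it into dictionaries and lists. This is only a minimal sketch. The string below is a made-up sample that mimics the shape of an API reply, not a captured response. With requests, response.json() does the same conversion for you.

```python
import json

# Hypothetical response body (made up for illustration), shaped like a JSON API reply.
raw = '{"success": true, "suggestions": ["sher", "sherboz", "sherdil"]}'

# JSON object -> dict, JSON array -> list, true/false -> True/False
data = json.loads(raw)

print(type(data).__name__)  # dict
for word in data["suggestions"]:
    print("-", word)
```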
Using DevTools I found that it reads data from other URLs (using POST), such as
http://savodxon.uz/api/get_definition
and they return the results as JSON, so BeautifulSoup isn't needed.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
    'X-Requested-With': 'XMLHttpRequest',
}

# --- suggestions ---

url = 'http://savodxon.uz/api/search'
payload = {
    'keyword': 'sher',
    'names': '[object HTMLInputElement]',
}

response = requests.post(url, data=payload, headers=headers)
data = response.json()  # parse the JSON body into a Python dict
#print(data)

print('--- suggestions ---')
for word in data['suggestions']:
    print('-', word)

# --- definitions ---

url = 'http://savodxon.uz/api/get_definition'
payload = {
    'word': 'sher',
}

response = requests.post(url, data=payload, headers=headers)
data = response.json()
#print(data.keys())

print('--- definitons ---')
for item in data['definition']:
    for meaning in item['meanings']:
        print(meaning['text'])
        for example in meaning['examples']:
            print('-', example['text'], f"({example['takenFrom']})")
Result:
--- suggestions ---
- sher
- sherboz
- sherdil
- sherik
- sherikchilik
- sheriklashmoq
- sheriklik
- sherlanmoq
- sherobodlik
- sherolgʻin
- sheroz
- sheroza
- sherqadamlik
- shershikorlik
- sherst
--- definitons ---
Mushuksimonlar oilasiga mansub, kalta va sargʻish yungli (erkaklari esa qalin yolli) yirik sutemizuvchi yirtqich hayvon; arslon.
- Ovchining zoʻri sher otadi, Dehqonning zoʻri yer ochadi. (Maqol)
- Oʻzingni er bilsang, oʻzgani sher bil. (Maqol)
- Bular [uch ogʻayni botirlar] tushgan toʻqayning narigi tomonida bir sherning makoni bor edi. (Ertaklar)
Shaxsni sherga nisbatlab ataydi (“azamat“, “botir“ polvon maʼnosida).
- Bu hujjatni butun rayonga tarqatmoqchimiz, sher, obroʻying oshib, choʻqqiga koʻtarilayotganingni bilasanmi? (I. Rahim, Ixlos)
- — Balli, sher, xatni qoʻlingizdan kim oldi? — Bir chol. (A. Qodiriy, Oʻtgan kunlar)
- Yoppa yov-lik otga mining, sherlarim. (Yusuf va Ahmad)
- Figʻon qilgan bunda sherlar, Yoʻlbars, qoplon, bunda erlar (Bahrom va Gulandom)
BTW:
You can also run it without the headers.
Here is an example video (without sound) showing how to use DevTools:
How to use DevTools in Firefox to find JSON data in EpicGames.com - YouTube
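If you want to reuse the definitions instead of only printing them, the nested loop above could be wrapped in a function that returns the lines. This is just a sketch: the name definition_lines is hypothetical, and the sample dict is a hand-shortened stand-in for the real response (the meaning text is abbreviated), so it runs without a network call.

```python
def definition_lines(data):
    """Build the same lines the printing loop produces, but return them as a list."""
    lines = []
    for item in data["definition"]:
        for meaning in item["meanings"]:
            lines.append(meaning["text"])
            for example in meaning["examples"]:
                lines.append(f"- {example['text']} ({example['takenFrom']})")
    return lines


# Hand-shortened sample in the shape the loop expects (not a full API response).
sample = {
    "definition": [
        {
            "meanings": [
                {
                    "text": "arslon",
                    "examples": [
                        {"text": "Oʻzingni er bilsang, oʻzgani sher bil.", "takenFrom": "Maqol"},
                    ],
                }
            ]
        }
    ]
}

for line in definition_lines(sample):
    print(line)
```

With the real response you would call definition_lines(response.json()) instead of passing the sample.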
CodePudding user response:
The data you see on the page is loaded via JavaScript from an external URL, so beautifulsoup cannot see it. To load the data you can use the requests module:
import requests
api_url = "https://savodxon.uz/api/get_definition"
data = requests.post(api_url, data={"word": "sher"}).json()
print(data)
Prints:
{
"core": "",
"definition": [
{
"meanings": [
{
"examples": [
{
"takenFrom": "Maqol",
"text": "Ovchining zoʻri sher otadi, Dehqonning zoʻri yer ochadi.",
},
{
"takenFrom": "Maqol",
"text": "Oʻzingni er bilsang, oʻzgani sher bil.",
},
{
"takenFrom": "Ertaklar",
"text": "Bular [uch ogʻayni botirlar] tushgan toʻqayning narigi tomonida bir sherning makoni bor edi.",
},
],
"reference": "",
"tags": "",
"text": "Mushuksimonlar oilasiga mansub, kalta va sargʻish yungli (erkaklari esa qalin yolli) yirik sutemizuvchi yirtqich hayvon; arslon.",
},
{
"examples": [
{
"takenFrom": "I. Rahim, Ixlos",
"text": "Bu hujjatni butun rayonga tarqatmoqchimiz, sher, obroʻying oshib, choʻqqiga koʻtarilayotganingni bilasanmi?",
},
{
"takenFrom": "A. Qodiriy, Oʻtgan kunlar",
"text": "— Balli, sher, xatni qoʻlingizdan kim oldi? — Bir chol.",
},
{
"takenFrom": "Yusuf va Ahmad",
"text": "Yoppa yov-lik otga mining, sherlarim.",
},
{
"takenFrom": "Bahrom va Gulandom",
"text": "Figʻon qilgan bunda sherlar, Yoʻlbars, qoplon, bunda erlar",
},
],
"reference": "",
"tags": "koʻchma",
"text": "Shaxsni sherga nisbatlab ataydi (“azamat“, “botir“ polvon maʼnosida).",
},
],
"phrases": [
{
"meanings": [
{
"examples": [
{
"takenFrom": "Gazetadan",
"text": "Ichkilikning zoʻridan sher boʻlib ketgan Yazturdi endi koʻcha harakati qoidasini unutib qoʻygan edi.",
},
{
"takenFrom": "H. Tursunqulov, Hayotim qissasi",
"text": "Balli, azamat, bugun jang vaqtida sher boʻlib ketding.",
},
],
"reference": "",
"tags": "ayn.",
"text": "Sherlanmoq.",
}
],
"tags": "",
"text": "Sher boʻlmoq",
}
],
"tags": "",
}
],
"isDerivative": False,
"tailStructure": "",
"type": "ot",
"wordExists": True,
}
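The response above also carries a "phrases" list and a "wordExists" flag. A rough sketch of reading them follows. The function name list_phrases is hypothetical, and treating a false or missing "wordExists" as "no entry" is an assumption, since the thread does not show a response for an unknown word. The sample is a trimmed copy of the output above, so the code runs offline.

```python
def list_phrases(data):
    """Return (phrase, meaning) pairs from a get_definition-style dict."""
    # Assumption: "wordExists" is false (or missing) when the word has no entry.
    if not data.get("wordExists"):
        return []
    pairs = []
    for item in data.get("definition", []):
        for phrase in item.get("phrases", []):
            for meaning in phrase.get("meanings", []):
                pairs.append((phrase["text"], meaning["text"]))
    return pairs


# Trimmed copy of the response printed above (examples omitted).
sample = {
    "definition": [
        {
            "meanings": [],
            "phrases": [
                {
                    "meanings": [{"examples": [], "tags": "ayn.", "text": "Sherlanmoq."}],
                    "text": "Sher boʻlmoq",
                }
            ],
        }
    ],
    "wordExists": True,
}

for phrase, meaning in list_phrases(sample):
    print(f"{phrase}: {meaning}")
```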
EDIT: To get the suggested words:
import requests
api_url = "https://savodxon.uz/api/search"
d = {"keyword": "sher", "names": "[object HTMLInputElement]"}
data = requests.post(api_url, data=d).json()
print(data)
Prints:
{
"success": True,
"matchFound": True,
"suggestions": [
"sher",
"sherboz",
"sherdil",
"sherik",
"sherikchilik",
"sheriklashmoq",
"sheriklik",
"sherlanmoq",
"sherobodlik",
"sherolgʻin",
"sheroz",
"sheroza",
"sherqadamlik",
"shershikorlik",
"sherst",
],
}
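Since the reply includes "success" and "matchFound" flags, you may want to check them before reading "suggestions". A small sketch, assuming both flags are true only when the list is usable (the thread shows only a successful reply, so this is a guess). The name suggestions_from and both sample dicts are made up for illustration.

```python
def suggestions_from(data):
    """Return the suggestion list, or an empty list if the reply signals no match."""
    # Assumption: both flags are true only when "suggestions" is usable.
    if data.get("success") and data.get("matchFound"):
        return list(data.get("suggestions", []))
    return []


# Made-up samples in the shape of the reply printed above.
found = {"success": True, "matchFound": True, "suggestions": ["sher", "sherboz"]}
missing = {"success": True, "matchFound": False, "suggestions": []}

print(suggestions_from(found))    # ['sher', 'sherboz']
print(suggestions_from(missing))  # []
```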