I am using Beautiful Soup to extract specific data from a webpage.
I tried to read the 'title' attribute of a specific tag, but it failed.
Here is the HTML tag I am trying to read the attribute from:
<span id="currwx_icon" style="display: block;" title="Cloudy"></span>
Here is the code I ran:
import requests
from bs4 import BeautifulSoup
data = requests.get('https://worldweather.wmo.int/en/city.html?cityId=206')
data = BeautifulSoup(data.text, 'html.parser')
today_weather = data.select('#currwx_icon')
for info in today_weather:
    print(info['title'])
In this case, I get an error:
KeyError: 'title'
import requests
from bs4 import BeautifulSoup
data = requests.get('https://worldweather.wmo.int/en/city.html?cityId=206')
data = BeautifulSoup(data.text, 'html.parser')
today_weather = data.select('#currwx_icon')
for info in today_weather:
    print(info.attrs)
In this case, it does return the tag's attributes as a dictionary, but the title attribute is missing:
{'id': 'currwx_icon', 'class': ['weather_icon1'], 'style': 'display: block;'}
This is the webpage I am trying to extract data from: https://worldweather.wmo.int/en/city.html?cityId=206
What am I missing here? Thank you in advance!
CodePudding user response:
The content is served dynamically: the browser makes an additional XHR request and JavaScript then manipulates the DOM, so the title attribute is not present in the HTML that requests receives. Try using the structured data behind that request instead. You will find a lot of information via:
- present data: https://worldweather.wmo.int/en/json/present.xml
- forecast data (additional): https://worldweather.wmo.int/en/json/206_en.xml
Example
import requests
data = requests.get('https://worldweather.wmo.int/en/json/present.xml').json()['present']
for k in data:
    if data[k]['cityId'] == 206:
        print(data[k])
Output
{'cityId': 206, 'stnId': '27612', 'stnName': 'MOSKVA VDNH', 'issue': '202209200900', 'temp': 12, 'rh': 78, 'wxdesc': 'Cloudy', 'wxImageCode': '23', 'wd': 'S', 'ws': '1.0', 'iconNum': '2301', 'sundate': '20220920', 'sunrise': '06:10', 'sunset': '18:34', 'moonrise': '', 'moonset': '', 'daynightcode': 'a'}
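If only the weather description is needed, the lookup above can be wrapped in a small helper. This is a minimal sketch, assuming the 'present' payload is a dictionary whose values are records shaped like the output shown (with 'cityId' and 'wxdesc' keys). The function name find_city, the trimmed sample record, and its key 'x' are illustrative placeholders and not part of the site's data.

```python
def find_city(present, city_id):
    """Return the first record whose 'cityId' matches, or None if absent."""
    for record in present.values():
        if record.get('cityId') == city_id:
            return record
    return None


# Trimmed sample based on the output above; the key 'x' is a made-up placeholder.
sample = {
    'x': {'cityId': 206, 'stnName': 'MOSKVA VDNH', 'temp': 12, 'wxdesc': 'Cloudy'},
}

record = find_city(sample, 206)
# Guard against a missing city instead of raising a KeyError or TypeError.
description = record['wxdesc'] if record else None
print(description)
```

With live data, the dictionary obtained from requests.get(...).json()['present'] in the example above would be passed in place of sample.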