How can I get an attribute out of a tag?

Time:09-20

I am using Beautiful Soup to extract specific data from a webpage. I tried to read an attribute of a specific tag, but it failed. I need to extract the 'title' attribute from the tag.

Here is the HTML tag I tried to get the attribute from:

<span id="currwx_icon" style="display: block;"  title="Cloudy"></span>

Here is the code I ran:

import requests
from bs4 import BeautifulSoup

data = requests.get('https://worldweather.wmo.int/en/city.html?cityId=206')
data = BeautifulSoup(data.text, 'html.parser')

today_weather = data.select('#currwx_icon')
for info in today_weather:
    print(info['title'])

In this case, I get an error:

KeyError: 'title'

import requests
from bs4 import BeautifulSoup

data = requests.get('https://worldweather.wmo.int/en/city.html?cityId=206')
data = BeautifulSoup(data.text, 'html.parser')

today_weather = data.select('#currwx_icon')
for info in today_weather:
    print(info.attrs)

In this case, it does return a dictionary of attributes, but the title attribute is missing:

{'id': 'currwx_icon', 'class': ['weather_icon1'], 'style': 'display: block;'}

This is the webpage I am trying to extract data from: https://worldweather.wmo.int/en/city.html?cityId=206

What am I missing here? Thank you in advance!

CodePudding user response:

The content is served dynamically: the browser makes an additional XHR request and JavaScript then manipulates the DOM, so the title attribute is not present in the HTML that requests downloads. Try using the structured data behind that request instead; it contains a lot of information:

Example

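import requests

# The endpoint returns JSON (despite the .xml extension); the records sit under 'present'
data = requests.get('https://worldweather.wmo.int/en/json/present.xml').json()['present']
for k in data:
    # 206 is the cityId from the page URL in the question
    if data[k]['cityId'] == 206:
        print(data[k])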

Output

{'cityId': 206, 'stnId': '27612', 'stnName': 'MOSKVA VDNH', 'issue': '202209200900', 'temp': 12, 'rh': 78, 'wxdesc': 'Cloudy', 'wxImageCode': '23', 'wd': 'S', 'ws': '1.0', 'iconNum': '2301', 'sundate': '20220920', 'sunrise': '06:10', 'sunset': '18:34', 'moonrise': '', 'moonset': '', 'daynightcode': 'a'}
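If only the weather description is needed (the value the question expected in the title attribute), the lookup can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name find_city_weather is hypothetical, the sample data is a trimmed copy of the output above, the key '1' is made up, and it assumes 'present' is a dictionary of records as the loop in the example suggests.

```python
def find_city_weather(present, city_id):
    """Return the first record whose 'cityId' matches city_id, or None."""
    for record in present.values():
        # .get() avoids a KeyError if a record has no 'cityId' field
        if record.get('cityId') == city_id:
            return record
    return None


# Trimmed sample shaped like the output above; the key '1' is an assumption.
sample = {
    '1': {'cityId': 206, 'stnName': 'MOSKVA VDNH', 'temp': 12, 'wxdesc': 'Cloudy'},
}

record = find_city_weather(sample, 206)
print(record['wxdesc'] if record else 'city not found')  # prints: Cloudy
```

With live data, the dictionary returned by requests.get(...).json()['present'] in the example above would be passed in place of sample. Returning None for an unknown city lets the caller decide how to handle a missing record instead of failing with a KeyError.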