Webdata Scraping Html Span Variable Python

Time:10-03

I am already able to log in successfully:

#!/usr/bin/python3-10-6
import mechanize
import requests
from bs4 import BeautifulSoup
import urllib.request as urllib2
import http.cookiejar as cookielib  # this module was named cookielib in Python 2

cj = cookielib.CookieJar() 
br = mechanize.Browser() 
br.set_cookiejar(cj) 
br.open("https://www.sunnyportal.com/Templates/Start.aspx?logout=true")

br.select_form(nr=0) 
br.form['ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName'] = '[email protected]' 
br.form['ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword'] = 'XXXXXXXX' 
br.submit()

url = 'https://www.sunnyportal.com/Templates/Start.aspx?logout=false'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
desired_data = soup.select("span.mainValueUnit")  # select() takes a CSS selector string, not a find_all()-style dict
for x in desired_data:
    print('TEXT VALUE:', x.get_text(strip=True), '|', 'DATA_PEAK:', x.get('data-peak'))

Now I would like to scrape the span tag (photovoltaic power):

<div  data-name="pvPower">
    <div >Aktuelle PV-Leistung</div>
    <div ></div>
    <div >
        <div >
            <img  src="/Images/Dashboard/gauge.png" alt="" />
            <img  src="/Images/Dashboard/currentPlantPowerPointer.png" alt="" />
            <span
                class="mainValueAmount"
                data-peak="4920"
                data-value="300"
                data-timestamp="2022-10-02T09:15:00">-</span>
            <span ></span>
        </div>
    </div>
    <div >
        <a id="ctl00_ContentPlaceHolder1_UserControlShowDashboard1_currentplantPowerWidget_FooterLink" href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolder1$UserControlShowDashboard1$currentplantPowerWidget$FooterLink&#39;,&#39;&#39;)">Energie und Leistung »</a>
    </div>
</div>

How can I scrape and print the span value? It is a dynamic value for photovoltaic_power. When inspected in the browser, it looks like this:

<span  data-peak="4920" data-value="908" data-timestamp="2022-10-02T13:00:00">1177</span>

The number "1177" is the dynamic value I am looking for. Thank you.

Answer:

Assuming your HTML is correct and is not loaded or enriched dynamically with JavaScript, this is how you can get that data:

from bs4 import BeautifulSoup as bs

html = '''
<div  data-name="pvPower">
    <div >Aktuelle PV-Leistung</div>
    <div ></div>
    <div >
        <div >
            <img  src="/Images/Dashboard/gauge.png" alt="" />
            <img  src="/Images/Dashboard/currentPlantPowerPointer.png" alt="" />
            <span
                class="mainValueAmount"
                data-peak="4920"
                data-value="300"
                data-timestamp="2022-10-02T09:15:00">-</span>
            <span ></span>
        </div>
    </div>
    <div >
        <a id="ctl00_ContentPlaceHolder1_UserControlShowDashboard1_currentplantPowerWidget_FooterLink" href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolder1$UserControlShowDashboard1$currentplantPowerWidget$FooterLink&#39;,&#39;&#39;)">Energie und Leistung »</a>
    </div>
</div>
'''
soup = bs(html, 'html.parser')

desired_data = soup.select('.mainValueAmount')
for x in desired_data:
    print('TEXT VALUE:', x.get_text(strip=True), '|', 'DATA_PEAK:', x.get('data-peak'))

Result in terminal:

TEXT VALUE: - | DATA_PEAK: 4920
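
The text comes back as "-" here because the pasted HTML contains the placeholder rather than a number. If you also want the data-* attributes as numbers, something along these lines may help. It is only a sketch: the helper name read_pv_power is made up for illustration, the sample string assumes the span you inspected sits inside the same data-name="pvPower" wrapper as in your HTML, and the selector relies on attributes only, so it does not depend on any class name.

```python
from bs4 import BeautifulSoup


def read_pv_power(html):
    """Return the PV power span's text and data-* attributes, or None if the span is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # First span carrying a data-value attribute inside the pvPower widget
    span = soup.select_one('div[data-name="pvPower"] span[data-value]')
    if span is None:
        return None
    text = span.get_text(strip=True)
    return {
        # The page shows "-" until a value is filled in, so only convert real digits
        'shown': int(text) if text.isdigit() else None,
        'data_value': int(span['data-value']),
        'data_peak': int(span['data-peak']),
        'timestamp': span['data-timestamp'],
    }


# Assumed sample: the inspected span from the question, wrapped in the pvPower div
sample = '''<div data-name="pvPower"><div>
<span data-peak="4920" data-value="908" data-timestamp="2022-10-02T13:00:00">1177</span>
<span></span></div></div>'''

print(read_pv_power(sample))
```

If 'shown' is None on the live page while the browser displays a number, that suggests the visible number is filled in by JavaScript after the page loads, and the data-value attribute may be the only thing available in the raw HTML.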

For BeautifulSoup documentation, visit https://beautiful-soup-4.readthedocs.io/en/latest/index.html
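
One more thing that may be worth checking in your script: the login happens in the mechanize browser, but the page is then fetched with a plain requests.get(), which does not know about the cookies in cj. I cannot test this against the portal, so treat the following as a sketch under that assumption. The helper name session_from_cookiejar is made up; it copies the cookies from the jar into a requests.Session so the second request is sent with the login cookies.

```python
import http.cookiejar as cookielib

import requests


def session_from_cookiejar(cj: cookielib.CookieJar) -> requests.Session:
    """Return a requests.Session that carries copies of the cookies stored in cj."""
    session = requests.Session()
    # RequestsCookieJar.update() accepts a standard CookieJar and copies each cookie
    session.cookies.update(cj)
    return session


# Usage, assuming cj is the jar passed to br.set_cookiejar() and br.submit() logged in:
# session = session_from_cookiejar(cj)
# r = session.get('https://www.sunnyportal.com/Templates/Start.aspx?logout=false')
# soup = BeautifulSoup(r.text, 'html.parser')
```

Alternatively, you could keep everything in mechanize and parse the body of the response returned by br.open(url), which avoids mixing two HTTP clients.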
