I am already able to log in successfully:
#!/usr/bin/env python3
# Python 3.10.6
import mechanize
from bs4 import BeautifulSoup
import http.cookiejar as cookielib  # this module was called cookielib in Python 2
cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.open("https://www.sunnyportal.com/Templates/Start.aspx?logout=true")
br.select_form(nr=0)
br.form['ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName'] = '[email protected]'
br.form['ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword'] = 'XXXXXXXX'
br.submit()
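url = 'https://www.sunnyportal.com/Templates/Start.aspx?logout=false'
# Reuse the logged-in mechanize browser; a separate requests.get() would not send the session cookies
response = br.open(url)
soup = BeautifulSoup(response.read(), 'html.parser')
# select() takes a CSS selector string, not a tag name plus an attribute dict
desired_data = soup.select('span.mainValueUnit')
for x in desired_data:
    print('TEXT VALUE:', x.get_text(strip=True), '|', 'DATA_PEAK:', x.get('data-peak'))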
Now I would like to scrape the span tag that holds the photovoltaic power:
<div data-name="pvPower">
<div >Aktuelle PV-Leistung</div>
<div ></div>
<div >
<div >
<img src="/Images/Dashboard/gauge.png" alt="" />
<img src="/Images/Dashboard/currentPlantPowerPointer.png" alt="" />
<span
data-peak="4920"
data-value="300"
data-timestamp="2022-10-02T09:15:00">-</span>
<span ></span>
</div>
</div>
<div >
<a id="ctl00_ContentPlaceHolder1_UserControlShowDashboard1_currentplantPowerWidget_FooterLink" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$UserControlShowDashboard1$currentplantPowerWidget$FooterLink','')">Energie und Leistung »</a>
</div>
</div>
How can I scrape and print the span value? It is a dynamic value for photovoltaic_power. When I inspect the element, it looks like this:
<span data-peak="4920" data-value="908" data-timestamp="2022-10-02T13:00:00">1177</span>
The number 1177 is the dynamic value I am looking for. Thank you.
Answer:
Assuming the HTML you posted is what the server actually returns, and the value is not filled in afterwards by JavaScript, this is how you can get that data:
from bs4 import BeautifulSoup as bs
html = '''
<div data-name="pvPower">
<div >Aktuelle PV-Leistung</div>
<div ></div>
<div >
<div >
<img src="/Images/Dashboard/gauge.png" alt="" />
<img src="/Images/Dashboard/currentPlantPowerPointer.png" alt="" />
<span
class="mainValueAmount"
data-peak="4920"
data-value="300"
data-timestamp="2022-10-02T09:15:00">-</span>
<span ></span>
</div>
</div>
<div >
<a id="ctl00_ContentPlaceHolder1_UserControlShowDashboard1_currentplantPowerWidget_FooterLink" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$UserControlShowDashboard1$currentplantPowerWidget$FooterLink','')">Energie und Leistung »</a>
</div>
</div>
'''
soup = bs(html, 'html.parser')
desired_data = soup.select('.mainValueAmount')
for x in desired_data:
    print('TEXT VALUE:', x.get_text(strip=True), '|', 'DATA_PEAK:', x.get('data-peak'))
Result in terminal:
TEXT VALUE: - | DATA_PEAK: 4920
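The "-" in the result is the text of the span in the sample HTML, whereas the question shows 1177 in the same span later in the day. A minimal sketch of turning the span text into a number while detecting that the value is missing, assuming the placeholder is always "-" or an empty string; parse_power is a hypothetical helper name, not part of BeautifulSoup:

```python
def parse_power(text):
    """Convert the span text to an int, or return None for a placeholder."""
    cleaned = text.strip()
    # Assumption: '-' or an empty string means the value has not been filled in yet.
    if cleaned in ('', '-'):
        return None
    return int(cleaned)


for raw in ['-', '1177', ' 908 ']:
    value = parse_power(raw)
    if value is None:
        print(repr(raw), '-> no value yet (possibly filled in later by JavaScript)')
    else:
        print(repr(raw), '->', value)
```

If parse_power keeps returning None for the page fetched by your script, the number is most likely written into the span by JavaScript after the page loads, and parsing the static HTML alone will not reveal it.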
For BeautifulSoup documentation, visit https://beautiful-soup-4.readthedocs.io/en/latest/index.html
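The span also carries data-value and data-timestamp next to data-peak. As a rough sketch, assuming the attributes always have the shape shown in the question, you can select the span by its attributes instead of its class and convert the values to Python types. SAMPLE and read_pv_span are hypothetical names used only for this illustration:

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Shortened copy of the markup from the question
SAMPLE = '''
<div data-name="pvPower">
  <span data-peak="4920" data-value="300"
        data-timestamp="2022-10-02T09:15:00">-</span>
  <span></span>
</div>
'''


def read_pv_span(html):
    """Return the pvPower span's text and data-* attributes as a dict, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # Attribute selectors: a span with data-peak inside the div marked data-name="pvPower"
    span = soup.select_one('div[data-name="pvPower"] span[data-peak]')
    if span is None:
        return None
    return {
        'text': span.get_text(strip=True),
        'peak': int(span['data-peak']),
        'value': int(span['data-value']),
        'timestamp': datetime.fromisoformat(span['data-timestamp']),
    }


print(read_pv_span(SAMPLE))
```

Selecting by attribute does not depend on the class names of the page, so it still works if those change, as long as the data-* attributes stay the same.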