Home > Back-end >  Python Search multiple html files in a variable
Python Search multiple html files in a variable

Time:03-16

I have used Selenium driver to crawl through many site pages. Every time I get a new page I append the html to a variable called "All_APP_Pages". The variable All_APP_Pages is a variable holding html for many pages. Did not post code because its long and no relevant to issue. Python list "All_APP_Pages" as being of type bytes.

from lxml import html
from lxml import etree 
import xml.etree.ElementTree as ET
from selenium.webdriver.common.by import By

dom = etree.HTML(All_APP_Pages)
xp = "//tr[.//span[contains(.,'Product Data Solutions (UHC MR)')] and .//td[contains(.,'SQLServer')] and .//td[contains(.,'MR')]]//a"
link = dom.xpath(xp)

print(link)

Once all pages have been scanned I need to get the link from this xpath

"//tr[.//span[contains(.,'Product Data Solutions (ABC MR)')] and .//td[contains(.,'SQLServer')] and .//td[contains(.,'MR')]]//a"

The xpath listed here works. However it only works with the selenium driver if driver is on the page where this link exists. That is why all page are in one variable since I dont know what page the link will be on. The print shows this result

[<Element a at 0x1c39dea1180>]

How do I get this value from link I so can check if value is correct?

CodePudding user response:

You need to iterate the list and get the href value

dom = etree.HTML(All_APP_Pages)
xp = "//tr[.//span[contains(.,'Product Data Solutions (UHC MR)')] and .//td[contains(.,'SQLServer')] and .//td[contains(.,'MR')]]//a"
link = dom.xpath(xp)
hrefs=[l.attrib["href"] for l in link]
print(hrefs)
  • Related