I am trying to pull information from a site that shows a table of "flags", where each flag has fields such as date, number, defect, comment, workcenter and so on. I need to get the information from each flag.
So far I have been able to isolate the HTML table block that holds all the flags and extract it. Now I need a way to get at the information itself in a logically ordered way, so that I can work with it later, for example by filtering on specific fields.
Basically:
data = driver.find_element_by_xpath('//*[@id="ctl00_Content_pnlGridView"]').find_elements_by_tag_name("tr")
returns a huge list with all the elements, such as:
<selenium.webdriver.remote.webelement.WebElement (session="2a54d4afd1355859872becefd5f06f3a")
Each of these contains the data I need (date, workcenter, comment, etc.).
How can I turn this list of elements into text? Adding .text() at the end doesn't work, since it's not a single element. Using soup.GetText() also doesn't work.
I've tried to iterate through the list with a for loop and turn each element into text, but I don't know how to separate specific pieces of data from the output.
CodePudding user response:
.text won't work on the result, since it is an attribute of a single web element. You are using find_elements, which returns a list in Python, so you have to iterate over that list and read .text from each element.
Something like this:
data = driver.find_element_by_xpath('//*[@id="ctl00_Content_pnlGridView"]').find_elements_by_tag_name("tr")
for d in data:
    print(d.text)
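If the goal is the individual fields rather than one string per row, one option is to read the td cells inside each tr. This is only a minimal sketch, assuming the table uses plain tr/td markup, which the question doesn't show. The helper name rows_as_cells is made up for illustration. It uses the find_elements(by, value) form with the locator string "tag name" (the value behind By.TAG_NAME), since recent Selenium 4 releases no longer ship the find_elements_by_* helpers.

```python
def rows_as_cells(table):
    """Return one list of cell texts per table row that has <td> cells.

    `table` is a Selenium WebElement wrapping the table (assumption:
    plain <tr>/<td> markup). Rows without any <td>, such as a header
    row built from <th> cells, are skipped.
    """
    rows = []
    for tr in table.find_elements("tag name", "tr"):
        cells = [td.text.strip() for td in tr.find_elements("tag name", "td")]
        if cells:
            rows.append(cells)
    return rows
```

It could be called as rows_as_cells(driver.find_element("xpath", '//*[@id="ctl00_Content_pnlGridView"]')). Each inner list then holds the fields of one flag in column order, so a field can be picked by index once the column order of the real table is known.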
CodePudding user response:
Well I've solved it. Might not be very elegant, but it works.
from datetime import datetime

# Assumes `driver` is a Selenium WebDriver created earlier.
driver.get("https://comit.app.pmi/report/default.aspx")

# Collect the text of every table row.
flag_text = []
data = driver.find_element_by_xpath('//*[@id="ctl00_Content_pnlGridView"]')
flag_line = data.find_elements_by_tag_name("tr")
for flag in flag_line:
    flag_text.append(flag.text)

# Sort the rows into one list per workcenter.
PA21_FLAGS = []
MA21_FLAGS = []
PA22_FLAGS = []
MA22_FLAGS = []
for flag in flag_text:
    if "PA21" in flag:
        PA21_FLAGS.append(flag)
    elif "MA21" in flag:
        MA21_FLAGS.append(flag)
    elif "PA22" in flag:
        PA22_FLAGS.append(flag)
    elif "MA22" in flag:
        MA22_FLAGS.append(flag)

# Print only the flags that carry today's date.
today = datetime.now()
new_date = today.strftime("%d/%m/%Y")
for flag in PA21_FLAGS:
    if new_date in flag:
        print(flag)
for flag in MA21_FLAGS:
    if new_date in flag:
        print(flag)
for flag in PA22_FLAGS:
    if new_date in flag:
        print(flag)
for flag in MA22_FLAGS:
    if new_date in flag:
        print(flag)
Now I just need to figure out how to pull specific data from it.
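One possible next step, sketched under assumptions: keep the row strings in a dictionary keyed by workcenter code instead of four separate lists, and do the date filter in one pass. The function names group_by_workcenter and flags_for_date are invented for illustration, and the code only relies on what the snippet above already relies on, namely that the workcenter code and the date appear somewhere in each row's text.

```python
WORKCENTERS = ("PA21", "MA21", "PA22", "MA22")


def group_by_workcenter(flag_text, workcenters=WORKCENTERS):
    """Group row strings by the first workcenter code found in each row."""
    groups = {wc: [] for wc in workcenters}
    for flag in flag_text:
        for wc in workcenters:
            if wc in flag:
                groups[wc].append(flag)
                break  # first match wins, like the if/elif chain
    return groups


def flags_for_date(groups, date_str):
    """Keep only the rows whose text contains date_str."""
    return {
        wc: [flag for flag in flags if date_str in flag]
        for wc, flags in groups.items()
    }
```

With the variables from the snippet above, flags_for_date(group_by_workcenter(flag_text), new_date) would return today's flags per workcenter. Pulling out a single field such as the comment still depends on how the columns are laid out in each row, which isn't shown here.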