HTML filter and turn to text [closed]

Time:09-17

I am trying to pull info from a site that shows a table of "flags", each of which has info such as date, number, defect, comment, workcenter and so on. I need to get the info from each flag.

So far I was able to isolate the HTML table block that has all the flags, and extract them. Now I need a way to get to the info itself in a logically ordered way, so that I can work with it further on, filtering out specific info and such.

Basically:

data = driver.find_element_by_xpath('//*[@id="ctl00_Content_pnlGridView"]').find_elements_by_tag_name("tr")

returns a huge list of elements, such as:

<selenium.webdriver.remote.webelement.WebElement (session="2a54d4afd1355859872becefd5f06f3a")

Each of these contains the data I need (date, workcenter, comment, etc.).

How can I turn this list of elements into text? Adding .text() at the end doesn't work, since it's not a single element.

Using soup.GetText() also doesn't work.

I've tried iterating through the list with a for loop and turning each element into text, but I don't know how to separate specific pieces of data from the output.

CodePudding user response:

.text won't work on the result, since it is an attribute of a single web element.

You are using find_elements, which returns a Python list of web elements.

So you have to iterate over that list and read .text from each element.

Something like this:

data = driver.find_element_by_xpath('//*[@id="ctl00_Content_pnlGridView"]').find_elements_by_tag_name("tr")
for d in data:
  print(d.text)
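
If you need the individual cells rather than one string per row, one option is to parse the table's HTML yourself. This is only a minimal sketch using the standard library: it assumes you already have the table markup as a string (for example from get_attribute("outerHTML") on the table element), and the names TableTextParser, table_to_rows and the sample markup are made up for illustration. It does not handle nested tables.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _flush_cell(self):
        # Store the current cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()
            self.rows.append(self._row)
            self._row = None


def table_to_rows(html):
    """Return the table as a list of rows, each a list of cell texts."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup; the real table's columns may differ.
sample_html = """
<table>
  <tr><th>Date</th><th>Workcenter</th><th>Comment</th></tr>
  <tr><td>17/09/2021</td><td>PA21</td><td>Loose cover</td></tr>
</table>
"""
rows = table_to_rows(sample_html)
for row in rows:
    print(row)
```

Each row then comes back as a list, so row[0], row[1] and so on give you the separate pieces of data, provided the column order on the real page is what you expect.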

CodePudding user response:

Well, I've solved it. It might not be very elegant, but it works.

from datetime import datetime

# driver is an already created Selenium WebDriver instance
driver.get("https://comit.app.pmi/report/default.aspx")

flag_text = []
data = driver.find_element_by_xpath('//*[@id="ctl00_Content_pnlGridView"]')
flag_line = data.find_elements_by_tag_name("tr")
for flag in flag_line:
    flag_text.append(flag.text)

PA21_FLAGS = []
MA21_FLAGS = []
PA22_FLAGS = []
MA22_FLAGS = []

for flag in flag_text:
    if "PA21" in flag:
        PA21_FLAGS.append(flag)
    elif "MA21" in flag:
        MA21_FLAGS.append(flag)
    elif "PA22" in flag:
        PA22_FLAGS.append(flag)
    elif "MA22" in flag:
        MA22_FLAGS.append(flag)

today = datetime.now()
new_date = today.strftime("%d/%m/%Y")

for flag in PA21_FLAGS:
    if new_date in flag:
        print(flag)

for flag in MA21_FLAGS:
    if new_date in flag:
        print(flag)

for flag in PA22_FLAGS:
    if new_date in flag:
        print(flag)

for flag in MA22_FLAGS:
    if new_date in flag:
        print(flag)
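
The four lists and four loops above could probably be folded into one dictionary keyed by workcenter code. This is just a sketch of that idea, not a tested replacement: group_flags, flags_for_date and the sample lines are hypothetical, and it assumes each flag line mentions at most one of the four codes (the first match wins, like the if/elif chain).

```python
from datetime import datetime

WORKCENTERS = ("PA21", "MA21", "PA22", "MA22")


def group_flags(flag_text, workcenters=WORKCENTERS):
    """Map each workcenter code to the flag lines that mention it."""
    groups = {code: [] for code in workcenters}
    for flag in flag_text:
        for code in workcenters:
            if code in flag:
                groups[code].append(flag)
                break  # first match wins, like the if/elif chain
    return groups


def flags_for_date(groups, date_text):
    """Keep only the flag lines that contain the given date string."""
    return {
        code: [flag for flag in flags if date_text in flag]
        for code, flags in groups.items()
    }


# Hypothetical sample lines; the real row text may be laid out differently.
sample_flags = [
    "17/09/2021 PA21 Loose cover",
    "16/09/2021 MA22 Wrong label",
    "17/09/2021 MA21 Scratch on panel",
    "Date Workcenter Comment",
]

groups = group_flags(sample_flags)
today = datetime.now().strftime("%d/%m/%Y")
for code, flags in flags_for_date(groups, today).items():
    for flag in flags:
        print(code, flag)
```

Adding a fifth workcenter is then a one-word change to WORKCENTERS instead of another list and another loop.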

Now I just need to figure out how to pull specific data from it.
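
One possible way to pull specific pieces out of each flag string is a couple of regular expressions. This is only a sketch under assumptions: it relies on the dd/mm/yyyy date format and the four workcenter codes used above, parse_flag and the sample line are hypothetical, and other fields (defect, comment, number) would need patterns matched to the real page.

```python
import re

# Date in dd/mm/yyyy form, as produced by strftime("%d/%m/%Y").
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
# The four workcenter codes from the answer above.
CODE_RE = re.compile(r"\b(?:PA21|MA21|PA22|MA22)\b")


def parse_flag(flag):
    """Return the first date and workcenter code found in a flag line."""
    date = DATE_RE.search(flag)
    code = CODE_RE.search(flag)
    return {
        "date": date.group() if date else None,
        "workcenter": code.group() if code else None,
    }


# Hypothetical sample line; the real row text may be laid out differently.
info = parse_flag("17/09/2021 PA21 Loose cover")
print(info["date"], info["workcenter"])
```

Fields that are missing from a line come back as None, so header rows or malformed lines can be skipped with a simple check.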