Unable to access text from a class using selenium on python

Time:02-25

I want to parse https://2gis.kz, but I get an error when I use .text or any other method of extracting text from an element that I located by class name.

I type in a search query such as "fitness".

My window variable is set like this:

all_cards = driver.find_elements(By.CLASS_NAME,"_1hf7139")
for card_ in all_cards:
    card_.click()
    window = driver.find_element(By.CLASS_NAME, "_18lzknl")

This is a simplified version of how I open the mini-window that holds all of the essential information. Below is the piece of code where I try to extract the text from the phone number holder.

    texts = window.find_elements(By.CLASS_NAME,'_b0ke8')

    print(texts) # this prints out something from where I am concluding that this thing is accessible
    try:
        print(texts.text)
    except:
        print(".text")
    try:
        print(texts.text())
    except:
        print(".text()")
    try:
        print(texts.get_attribute("innerHTML"))
    except:
        print('getAttribute("innerHTML")')
    try:
        print(texts.get_attribute("textContent"))
    except:
        print('getAttribute("textContent")')
    try:
        print(texts.get_attribute("outerHTML"))
    except:
        print('getAttribute("outerHTML")')

Hi all, I solved the issue. The .text attribute was not working for some reason; my guess is that the site's developers somehow protect the data against this method. I used

get_attribute("innerHTML") # as far as I know, this returns the HTML markup inside the matched element

and now it works like a charm.

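import io
import re

texts = window.find_elements(By.TAG_NAME, "bdo")

# Append the digits of each phone number to t.txt, one number per line
with io.open("t.txt", "a", encoding="utf-8") as f:
    for text in texts:
        # Keep only the digits found in the element's inner HTML
        nums = re.sub("[^0-9]", "", text.get_attribute("innerHTML"))
        f.write(nums + "\n")
# No f.close() needed: the with block closes the file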
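
The digit-stripping step above can be tried without a browser. This is a minimal sketch, assuming the inner HTML of a phone element looks roughly like the sample string below; the markup and the helper name digits_only are made up for illustration and are not taken from 2gis.

```python
import re


def digits_only(html):
    """Return only the digit characters found in an HTML fragment."""
    return re.sub(r"[^0-9]", "", html)


# Hypothetical inner HTML of one phone-number element
sample = "+7 (701) <span>123</span>-45-67"
print(digits_only(sample))
```

Note that digits inside tag attributes would be kept as well, so reading get_attribute("textContent") instead of innerHTML may be the safer input for this kind of filter.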

So the problem was that:

  1. I was trying to print a whole list of elements with print(texts) instead of printing the text of each element.
  2. Even when I printed each element of texts in a for loop, I got an error because the text was encoded in UTF-8.
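
The first point is easy to reproduce without a browser, because find_elements returns an ordinary Python list. The sketch below uses a stand-in class (FakeElement is hypothetical and only mimics the .text attribute of a Selenium element). It also returns the exception message, which the bare except: blocks in the question were hiding.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is mimicked."""

    def __init__(self, text):
        self.text = text


def describe_failure(elements):
    """Try to read .text from the list itself and return the error message."""
    try:
        return elements.text
    except AttributeError as exc:
        return str(exc)


texts = [FakeElement("+7 701 123 45 67")]
print(describe_failure(texts))      # the list itself has no .text attribute
print([el.text for el in texts])    # each element in the list does
```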

I hope someone finds this useful and does not lose a lot of time on such a simple bug.

CodePudding user response:

The find_elements method returns a list of web elements, so this

texts = window.find_elements(By.CLASS_NAME,'_b0ke8')

makes texts a list of web elements.
You cannot use .text directly on a list.
To get the text of each element, iterate over the list and read .text from each element, like this:

text_elements = window.find_elements(By.CLASS_NAME,'_b0ke8')
for element in text_elements:
    print(element.text)
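
If element.text comes back empty, one possible reason is that Selenium's .text only returns text that is rendered as visible, while the textContent attribute has no such restriction. Here is a minimal sketch of a helper that falls back to textContent; the name element_texts is made up here, and whether this is what actually happens on 2gis is an assumption.

```python
def element_texts(elements):
    """Return the text of each element, using textContent when .text is empty."""
    results = []
    for element in elements:
        text = element.text
        if not text:
            # .text is empty for elements that are not displayed
            text = element.get_attribute("textContent") or ""
        results.append(text.strip())
    return results
```

It could be called as element_texts(window.find_elements(By.CLASS_NAME, '_b0ke8')).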

Also, I'm not sure about the locators you are using.
The class names _1hf7139, _18lzknl and _b0ke8 look like dynamically generated names, i.e. they may change between browsing sessions.
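
One way to cope with class names that may change is to try several locators in order of preference and use the first one that matches. This is only a sketch: find_first_match and FakeContainer are hypothetical names, the stand-in container exists so the example runs without a browser, and the bdo tag is taken from the asker's own solution rather than verified against the site.

```python
def find_first_match(container, locators):
    """Return the elements for the first (by, value) locator that matches anything."""
    for by, value in locators:
        found = container.find_elements(by, value)
        if found:
            return found
    return []


class FakeContainer:
    """Hypothetical stand-in for a driver or WebElement; maps locators to results."""

    def __init__(self, results):
        self._results = results

    def find_elements(self, by, value):
        return self._results.get((by, value), [])


# "class name" and "tag name" are the string values of By.CLASS_NAME and By.TAG_NAME.
# Here the class-name lookup finds nothing, as if the generated name had changed.
window = FakeContainer({("tag name", "bdo"): ["<bdo element>"]})
print(find_first_match(window, [("class name", "_b0ke8"), ("tag name", "bdo")]))
```

With a real driver the same call would take the By constants, for example find_first_match(window, [(By.TAG_NAME, "bdo"), (By.CLASS_NAME, "_b0ke8")]).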
