Keep only an element of a webpage while web-scraping

Time:02-03

I am trying to extract a table from a webpage with Python. I managed to get all the contents of that table, but since I am very new to web scraping I don't know how to keep only the elements I am looking for.

I know that I should look for the <a> elements that carry a particular class, which marks the items in the table.

So how can I keep only those elements and then extract their title attribute?

<a  title="r/Python" href="/r/Python/">r/Python</a>
<a  title="r/Java" href="/r/Java/">r/Java</a>

I failed miserably at writing code for this. I don't know how to extract only these elements, so any input would be highly appreciated.

CodePudding user response:

To extract the values of the title attributes, you can use a list comprehension with either of the following locator strategies:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a._3BFvyrImF3et_ZF21Xd8SC[title]")])
    
  • Using XPATH:

    print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.XPATH, "//a[@class='_3BFvyrImF3et_ZF21Xd8SC' and @title]")])
    
  • Note: add the following imports (only By is used by the snippets above; WebDriverWait and EC are needed only if you add an explicit wait):

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
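
If the page's HTML is already available as a string (for example from `driver.page_source`), the same filtering can be sketched without a browser using only the standard library. This is a minimal illustration, not the answerer's method: the names `AnchorTitleCollector` and `extract_titles` are hypothetical, and the sample markup assumes the class value used in the selectors above, since the class attribute is missing from the markup shown in the question.

```python
from html.parser import HTMLParser


class AnchorTitleCollector(HTMLParser):
    """Collect the title attribute of <a> tags, optionally filtered by class."""

    def __init__(self, required_class=None):
        super().__init__()
        # Hypothetical filter: None means "accept any <a> that has a title".
        self.required_class = required_class
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        title = attr_map.get("title")
        if title is None:
            return  # skip anchors without a title value
        classes = (attr_map.get("class") or "").split()
        if self.required_class is None or self.required_class in classes:
            self.titles.append(title)


def extract_titles(html, required_class=None):
    """Return the titles of matching <a> tags in document order."""
    parser = AnchorTitleCollector(required_class)
    parser.feed(html)
    parser.close()
    return parser.titles


# Sample markup modelled on the question; the class value is an assumption
# borrowed from the selectors in the answer.
SAMPLE_HTML = """
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Python" href="/r/Python/">r/Python</a>
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Java" href="/r/Java/">r/Java</a>
<a class="other" title="ignored" href="/x/">x</a>
"""

print(extract_titles(SAMPLE_HTML, "_3BFvyrImF3et_ZF21Xd8SC"))
# prints ['r/Python', 'r/Java']
```

Matching on a single class token (rather than the whole attribute string) mirrors what the CSS selector does; the XPath form above compares the full class attribute instead.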
    

CodePudding user response:

Okay, I found a very simple approach that worked.

Basically, I pasted the page source into VS Code and then selected all the occurrences of that class. Then I just had to copy and paste them into another file. I'm not sure why the shortcut Ctrl+Shift+L did not work, but I managed to get what I needed.

Select all occurrences of selected word in VSCode
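
The manual copy-and-paste step can also be approximated with a short script. The sketch below assumes the page source has been saved as text; the function name `titles_from_source` and the regular expressions are illustrative assumptions, and a text-level match like this is cruder than a real HTML parser.

```python
import re

# Opening <a ...> tags, plus the title and class attributes inside one tag.
OPEN_A_RE = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
TITLE_RE = re.compile(r'\stitle="([^"]*)"')
CLASS_RE = re.compile(r'\sclass="([^"]*)"')


def titles_from_source(page_source, class_name=None):
    """Return title values of <a> tags, optionally limited to one class."""
    titles = []
    for tag in OPEN_A_RE.findall(page_source):
        if class_name is not None:
            class_match = CLASS_RE.search(tag)
            classes = class_match.group(1).split() if class_match else []
            if class_name not in classes:
                continue  # tag does not carry the requested class
        title_match = TITLE_RE.search(tag)
        if title_match:
            titles.append(title_match.group(1))
    return titles


# The two lines from the question, as they appear there (no class attribute).
PASTED = (
    '<a  title="r/Python" href="/r/Python/">r/Python</a>\n'
    '<a  title="r/Java" href="/r/Java/">r/Java</a>\n'
)

print(titles_from_source(PASTED))
```

Passing a `class_name` restricts the result to anchors that carry that class, which is roughly what selecting every occurrence of the class in the editor achieves.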
