I am trying to extract a table from a webpage with Python. I managed to get all the contents of that table, but since I am very new to web scraping, I don't know how to keep only the elements I am looking for.
I know that I should look for a particular class on the <a elements in the code, which identifies the items in the table.
So how can I keep only the elements with that class and then extract their titles? The items look like this:
<a title="r/Python" href="/r/Python/">r/Python</a>
<a title="r/Java" href="/r/Java/">r/Java</a>
My attempts at writing code for this failed. I don't know how to extract only these elements, so any input would be highly appreciated.
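If you already have the table's HTML as a string, a minimal sketch using only the standard library's html.parser module could look like the following. It assumes the items are <a> tags whose class attribute contains _3BFvyrImF3et_ZF21Xd8SC (the class name used in the first answer; the real page may use a different one, and generated class names like this can change). The TitleCollector and extract_titles names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <a> tag that carries a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute can hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if self.wanted_class in classes and title is not None:
            self.titles.append(title)


def extract_titles(html, wanted_class):
    """Return the titles of all <a> tags in html whose class list contains wanted_class."""
    parser = TitleCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical sample modelled on the two links above, plus one link that should be skipped.
SAMPLE_HTML = """
<table>
  <tr><td><a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Python" href="/r/Python/">r/Python</a></td></tr>
  <tr><td><a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Java" href="/r/Java/">r/Java</a></td></tr>
  <tr><td><a class="some-other-class" title="Not wanted" href="/other/">Other</a></td></tr>
</table>
"""

print(extract_titles(SAMPLE_HTML, "_3BFvyrImF3et_ZF21Xd8SC"))
# ['r/Python', 'r/Java']
```

Splitting the class attribute on whitespace means a tag such as class="foo _3BFvyrImF3et_ZF21Xd8SC" still matches, which is the same behaviour as a CSS class selector.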
Answer:
To extract the values of the title attributes, you can use a list comprehension with either of the following locator strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a._3BFvyrImF3et_ZF21Xd8SC[title]")])
Using XPATH:
print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.XPATH, "//a[@class='_3BFvyrImF3et_ZF21Xd8SC' and @title]")])
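Note: you have to add the following imports (By is used by the snippets above; WebDriverWait and expected_conditions are only needed if you add an explicit wait for the elements to load):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC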
Answer:
Okay, I did something very simple that worked.
Basically, I pasted the page source into VS Code, selected all occurrences of that class, and then copied and pasted them into another file. I'm not sure why the Ctrl+Shift+L shortcut did not work, but I managed to get what I needed.
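The manual workflow above can also be scripted. This rough sketch mimics "select all occurrences, then copy them into another file" with plain regular expressions over a saved copy of the page source. It assumes double-quoted attributes and no > characters inside attribute values, so it is more fragile than a real HTML parser. The find_titles and copy_titles helpers are hypothetical, and the class name is the one from the first answer.

```python
import html
import re
from pathlib import Path

# Each opening <a ...> tag (assumes no ">" inside attribute values).
A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
# Double-quoted class/title attributes; the lookbehind skips names such as data-title.
CLASS_ATTR = re.compile(r'(?<![\w-])class\s*=\s*"([^"]*)"', re.IGNORECASE)
TITLE_ATTR = re.compile(r'(?<![\w-])title\s*=\s*"([^"]*)"', re.IGNORECASE)


def find_titles(page, wanted_class):
    """Return the title of every <a> tag in page whose class list contains wanted_class."""
    titles = []
    for tag_match in A_TAG.finditer(page):
        tag = tag_match.group(0)
        class_match = CLASS_ATTR.search(tag)
        title_match = TITLE_ATTR.search(tag)
        if class_match and title_match and wanted_class in class_match.group(1).split():
            # Turn entities such as &amp; back into plain characters.
            titles.append(html.unescape(title_match.group(1)))
    return titles


def copy_titles(source_file, output_file, wanted_class):
    """Read a saved page source, write the matching titles one per line, and return them."""
    page = Path(source_file).read_text(encoding="utf-8")
    titles = find_titles(page, wanted_class)
    Path(output_file).write_text("".join(f"{title}\n" for title in titles), encoding="utf-8")
    return titles
```

For example, copy_titles("page.html", "titles.txt", "_3BFvyrImF3et_ZF21Xd8SC"), with hypothetical file names, would write one title per line to titles.txt and also return them as a list.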