How to extract the text, link and other properties of all elements from HTML?

Time:12-04

Note: I can extract these values for a single element, but I need to extract them for all the elements together.

Hi, I am trying to extract the text and the link of each item in a list on a page using Selenium and Java. I am able to extract all the link texts, but I cannot figure out how to get the links (href values) themselves. The HTML looks like this:

<div >
        <a href="/category/agricultural-products-service">
                <img src="/assets/images/icon/1.jpg" alt="icon" >
                    <h5 >Agricultural </h5>
        </a>
 </div>
<div >
        <a href="/category/products-service">
                <img src="/assets/images/icon/7.jpg" alt="icon" >
                    <h5 >Products</h5>
        </a>
 </div>

Using h5 I can extract the text of all the elements, but I also need to extract the href of each of those elements.

CodePudding user response:

To extract the text, the link or any other attribute value from several web elements, collect those elements in a list and then iterate over the list, reading the desired value from each WebElement object.
For example, this prints the text of every h5 element:

List<WebElement> elements = driver.findElements(By.tagName("h5"));
for(WebElement element : elements){
    String value = element.getText();
    System.out.println(value);
}

And this will give you the href of every link:

List<WebElement> links = driver.findElements(By.cssSelector(".top_cat a"));
for(WebElement link : links){
    String value = link.getAttribute("href");
    System.out.println(value);
}

On this specific page the structure is as follows:
There are several blocks (matched below by the .all_cat selector), and each block contains several links and titles. Each a element sits inside a block, and each title (h5) sits inside its a element. So the links and titles can be extracted together like this:

List<WebElement> blocks = driver.findElements(By.cssSelector(".all_cat"));
for (WebElement block : blocks) {
    // All anchors inside this block
    List<WebElement> links = block.findElements(By.xpath(".//a"));
    for (WebElement link : links) {
        String linkValue = link.getAttribute("href");
        System.out.println("The link is " + linkValue);
        // The title is the h5 nested inside this anchor, so search from the link
        WebElement title = link.findElement(By.xpath(".//h5"));
        String titleValue = title.getText();
        System.out.println("The title is " + titleValue);
    }
}
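
If the values are needed together rather than just printed, one option is to collect them into a map keyed by title. The sketch below is only an illustration: the class name LinkTitleCollector and the collect helper are hypothetical, and the helper is written generically so it runs without a browser. The stand-in data in main mirrors the two anchors from the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Selenium.
class LinkTitleCollector {

    // Builds a title -> href map that keeps the order of the input list.
    // E is the element type; with Selenium it would be WebElement.
    public static <E> Map<String, String> collect(List<E> anchors,
                                                  Function<E, String> titleOf,
                                                  Function<E, String> hrefOf) {
        Map<String, String> result = new LinkedHashMap<>();
        for (E anchor : anchors) {
            // trim() drops stray whitespace such as the trailing space in "Agricultural "
            result.put(titleOf.apply(anchor).trim(), hrefOf.apply(anchor));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in data for the question's markup: {href, h5 text}
        List<String[]> anchors = List.of(
            new String[] {"/category/agricultural-products-service", "Agricultural "},
            new String[] {"/category/products-service", "Products"});

        Map<String, String> byTitle = collect(anchors, a -> a[1], a -> a[0]);
        byTitle.forEach((title, href) -> System.out.println(title + " -> " + href));
    }
}
```

With Selenium, the same helper could presumably be called as collect(driver.findElements(By.cssSelector(".all_cat a")), a -> a.findElement(By.tagName("h5")).getText(), a -> a.getAttribute("href")), assuming every matched a element contains an h5. Note that a map keeps only the last href when two titles are identical, so a list of pairs may be a better fit if titles can repeat.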