I have a list of keywords (['a', 'b', 'c']) and I'd like to check which of them appear on a given page using Selenium (ideally each with its number of occurrences).
The naive way would be to search for each keyword separately using XPath (//*[contains(text(),'a')]), or in the body text, page source, etc., but it seems like overkill to traverse the entire page again and again for each of the strings.
I have quite a few sites to go over, so I'd like to do this efficiently. Should I just get all the text from the entire <html> (so including the title and the description on top of the <body>) and then do all the searching myself outside of Selenium (e.g. Rabin-Karp etc.), or is there a reasonable out-of-the-box solution?
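To illustrate the "search on my own" option: once the page text has been extracted, a per-keyword scan with indexOf is enough for small keyword lists (for many keywords, a single-pass algorithm such as Aho-Corasick would avoid rescanning). This is only a sketch; the class name, text, and keywords are placeholders:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeywordCounter {
    // Count non-overlapping, case-insensitive occurrences of each keyword
    // in a single string, scanning the text once per keyword with indexOf.
    public static Map<String, Integer> count(String text, List<String> keywords) {
        Map<String, Integer> counts = new HashMap<>();
        String haystack = text.toLowerCase();
        for (String keyword : keywords) {
            String needle = keyword.toLowerCase();
            int count = 0;
            int from = 0;
            while ((from = haystack.indexOf(needle, from)) != -1) {
                count++;
                from += needle.length(); // skip past this match
            }
            counts.put(keyword, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // e.g. "a" occurs twice, "b" twice, "c" once in "abcab"
        System.out.println(count("abcab", List.of("a", "b", "c")));
    }
}
```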
CodePudding user response:
You can search for elements containing any of the given strings with a single XPath, like
//*[contains(text(),'a') or contains(text(),'b') or contains(text(),'c')]
and then check which specific keyword is present in each matched element and update the counters accordingly.
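If the keyword list varies, the union XPath can be built dynamically instead of hard-coded. A small sketch (the class and method names are my own, and the keywords are assumed to contain no quote characters):

```java
import java.util.List;
import java.util.stream.Collectors;

public class XpathBuilder {
    // Build one XPath matching elements that contain any of the keywords,
    // so Selenium traverses the DOM once instead of once per keyword.
    public static String anyKeyword(List<String> keywords) {
        String conditions = keywords.stream()
                .map(k -> "contains(text(),'" + k + "')")
                .collect(Collectors.joining(" or "));
        return "//*[" + conditions + "]";
    }

    public static void main(String[] args) {
        System.out.println(anyKeyword(List.of("a", "b", "c")));
        // //*[contains(text(),'a') or contains(text(),'b') or contains(text(),'c')]
    }
}
```

The resulting string can then be passed to driver.findElements(By.xpath(...)).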
CodePudding user response:
If you do not need to account for the structure of the content, it is completely fine to take the entire text of the page and count the keyword occurrences yourself.
Here is a short demo:
import java.io.IOException;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public static void main(String[] args) throws IOException {
    WebDriver driver = null;
    List<String> keyWords = Arrays.asList("selenium", "http", "something");
    try {
        driver = new RemoteWebDriver(
                new URL("http://selenium-hub:4444"),
                new ChromeOptions()
        );
        driver.get("https://www.webelement.click/en/welcome");
        // Grab the visible text of the whole page once
        String total = driver.findElement(By.tagName("body")).getText();
        for (String keyWord : keyWords) {
            // Pattern.quote makes the keyword match literally, not as a regex
            Pattern p = Pattern.compile(Pattern.quote(keyWord), Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(total);
            int i = 0;
            while (m.find()) {
                i++;
            }
            System.out.println("Keyword [" + keyWord + "] has [" + i + "] occurrences");
        }
    } finally {
        if (driver != null) {
            driver.quit();
        }
    }
}