I'm working on a program that collects all the links from a website and searches them for an input word. It then enters each of those links and searches again, and so on. The program does this 3 levels deep (that's why n is 3). The code below does it recursively and seems to work just fine.
However, I would like to speed up this process by using threads. How can I implement this? From what I've heard, I can probably use fork/join for that.
public static void getLinks(String url, Set<String> urls, String word, int n) {
    if (url.contains(word)) {
        System.out.println("Found: " + url);
    }
    if (urls.contains(url)) {
        return;
    }
    urls.add(url);
    if (n < 3) {
        try {
            Document doc = Jsoup.connect(url).get();
            Elements elements = doc.select("a[href]");
            for (Element element : elements) {
                System.out.println(element.absUrl("href"));
                getLinks(element.absUrl("href"), urls, word, n + 1);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
public static void main(String[] args) {
    Set<String> links = new HashSet<>();
    String word = "root";
    getLinks("https://example.com", links, word, 0);
}
PS: in the final version of the program, links matching the input word will be printed in a GUI.
CodePudding user response:
You can use a work queue to which you submit runnables to be executed. As you discover links, you submit tasks for the underlying pages to crawl.
Basically, have a producer of work and a consumer of work.
https://www.baeldung.com/java-blocking-queue
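Here is a minimal sketch of that producer/consumer pattern with a `BlockingQueue`. To keep it self-contained and runnable, the Jsoup fetch is replaced by a hypothetical in-memory map of pages (the `PAGES` map, URLs, and the `Task` record are all assumptions for illustration); in your code the body of the worker would call `Jsoup.connect(url).get()` instead:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlQueueDemo {

    // Hypothetical in-memory "web" standing in for Jsoup.connect(url).get():
    // maps a URL to the links found on that page.
    static final Map<String, List<String>> PAGES = Map.of(
            "https://example.com", List.of("https://example.com/root-a", "https://example.com/b"),
            "https://example.com/root-a", List.of("https://example.com/root-c"),
            "https://example.com/b", List.of(),
            "https://example.com/root-c", List.of());

    record Task(String url, int depth) {}

    public static Set<String> crawl(String start, String word, int maxDepth)
            throws InterruptedException {
        BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
        Set<String> visited = ConcurrentHashMap.newKeySet(); // thread-safe HashSet replacement
        Set<String> matches = ConcurrentHashMap.newKeySet();
        AtomicInteger pending = new AtomicInteger(0);        // tasks enqueued but not finished

        pending.incrementAndGet();
        queue.put(new Task(start, 0));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        Task t = queue.take();               // consumer side
                        try {
                            if (visited.add(t.url())) {      // add() is atomic: first caller wins
                                if (t.url().contains(word)) matches.add(t.url());
                                if (t.depth() < maxDepth) {
                                    for (String link : PAGES.getOrDefault(t.url(), List.of())) {
                                        pending.incrementAndGet();
                                        queue.put(new Task(link, t.depth() + 1)); // producer side
                                    }
                                }
                            }
                        } finally {
                            pending.decrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();      // shutdownNow() wakes us up
                }
            });
        }

        while (pending.get() > 0) Thread.sleep(10);          // crude "all work done" check
        pool.shutdownNow();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return matches;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(crawl("https://example.com", "root", 3));
    }
}
```

Note the `pending` counter: with a work queue you need some way to detect that no task is running and none is queued, otherwise the consumers block on `take()` forever.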
CodePudding user response:
The simple way is to submit `getLinks` to a thread pool while iterating through `Elements`:
static ExecutorService executorService = Executors.newCachedThreadPool();

public static void getLinks(String url, Set<String> urls, String word, int n) {
    if (url.contains(word)) {
        System.out.println("Found: " + url);
    }
    if (!urls.add(url)) {  // add() returns false if already present
        return;
    }
    if (n < 3) {
        try {
            Document doc = Jsoup.connect(url).get();
            for (Element element : doc.select("a[href]")) {
                String next = element.absUrl("href");
                executorService.submit(() -> getLinks(next, urls, word, n + 1));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Note that `urls` must now be a thread-safe set (e.g. `ConcurrentHashMap.newKeySet()` instead of `HashSet`), since multiple pool threads mutate it concurrently.