I literally don't know how to describe my problem other than that jsoup actively skips over the one value I need. I'm trying to grab the average engagement, likes, and comments on a selected user's Instagram posts, but let's just stick with engagement for now.
So far in my testing, I've seen it skip the value both in <span id=... and in <span class=... elements.
I have two versions of my code, and neither gives a helpful result.
Just for reference, this is what I see when I inspect the element on the page: <span>4,300</span>
(https://analisa.io/profile/officialrickastley)
Imports (shared by both versions):
import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
Code Ver 1.
String accountUsername = "officialrickastley";
String url = "https://analisa.io/profile/" + accountUsername;
Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36").get();
Elements engagement = doc.getElementsByClass("js-summary-whole-engagement");
System.out.println(engagement);
The above outputs: <span ><i ></i></span>
I believe the latter half is irrelevant and probably belongs to an element further down the page. But in the first half, where I would expect the number, there's nothing at all.
Code Ver 2.
String accountUsername = "officialrickastley";
String url = "https://analisa.io/profile/" + accountUsername;
Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36").get();
Elements engagement = doc.getElementsByClass("js-summary-whole-engagement");
System.out.println(engagement.text());
The above outputs nothing, not even a space or anything.
I've also tried doc.select and quite a few other things like .value, but nothing actually addresses the issue. I have also seen people parse HTML stored directly in the class, but if that is a possible solution, I'm unsure how to connect to the website and save the page to be parsed, since I want the data to update every day.
Any help or suggestions would be greatly appreciated, thanks!
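On the daily-update point: the HTML does not need to be pasted into the class, since the page can be fetched live with Jsoup.connect(url).get() each time the job runs. Below is a minimal stdlib sketch of running such a job once a day. It assumes a hypothetical fetchAndPrintStats() placeholder where the jsoup code would go and an arbitrary 06:00 run time; DailyRefresh, delayUntil and startDaily are illustrative names, not part of any library.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DailyRefresh {

    // Time from `now` until the next occurrence of `runAt` (later today, or tomorrow).
    static Duration delayUntil(LocalTime runAt, LocalDateTime now) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Hypothetical placeholder: put the Jsoup.connect(url)...get() call and the
    // parsing code here. Catch exceptions inside it, because an exception thrown
    // from a scheduleAtFixedRate task cancels all later runs.
    static void fetchAndPrintStats() {
        System.out.println("Fetching stats at " + LocalDateTime.now());
    }

    // Schedules `task` roughly once every 24 hours, starting at the next `runAt`.
    // The executor's worker thread is non-daemon, so it keeps the JVM alive
    // until scheduler.shutdown() is called.
    static ScheduledExecutorService startDaily(LocalTime runAt, Runnable task) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = delayUntil(runAt, LocalDateTime.now()).getSeconds();
        scheduler.scheduleAtFixedRate(task, initialDelay,
                TimeUnit.DAYS.toSeconds(1), TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Only prints the delay; call startDaily(...) to actually schedule the job.
        System.out.println("Next run in: " + delayUntil(LocalTime.of(6, 0), LocalDateTime.now()));
    }
}
```

In a real program, calling DailyRefresh.startDaily(LocalTime.of(6, 0), DailyRefresh::fetchAndPrintStats) from main keeps it running and refetches the page each day. LocalDateTime ignores time zones, so the run time can drift by an hour around daylight-saving changes.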
Answer:
getElementsByClass returns an Elements object, which is a list of elements (not a Java array, so engagement[0] won't compile). Select the first one and print its text:
System.out.println(engagement.get(0).text());
Also, it's good practice to name lists or arrays in plural: Elements engagement -> Elements engagements
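A minimal offline sketch of the indexing fix, assuming jsoup is on the classpath. The HTML string is hypothetical and mirrors the empty span from Code Ver 1; FirstElementDemo and firstTextByClass are illustrative names. Elements.first() returns null for an empty result, whereas get(0) would throw.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FirstElementDemo {

    // Returns the text of the first element with the given class,
    // or null when nothing matches.
    static String firstTextByClass(Document doc, String className) {
        Elements engagements = doc.getElementsByClass(className);
        // Elements is a List<Element>: use first() or get(0), not engagements[0].
        Element first = engagements.first(); // null if the list is empty
        return first == null ? null : first.text();
    }

    public static void main(String[] args) {
        // Hypothetical snippet: the span exists but is empty, as the
        // question's output suggests it is in the HTML jsoup receives.
        String html = "<span class=\"js-summary-whole-engagement\"></span>"
                + "<span class=\"js-summary-whole-engagement\"><i></i></span>";
        Document doc = Jsoup.parse(html);
        System.out.println("[" + firstTextByClass(doc, "js-summary-whole-engagement") + "]");
    }
}
```

With an empty span, the first element's text is still an empty string, consistent with the empty output of Code Ver 2; indexing only yields the number once it is actually present in the fetched HTML.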
Answer:
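jsoup only parses the HTML the server sends and does not run JavaScript, so if the engagement span is filled in by a script after the page loads, it will be empty in the parsed document even though the browser's inspector shows a value. The same stats appear to be available in the page's meta tags, so you could try reading them from there instead (read the comments):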
// Requires import java.io.IOException; in addition to the jsoup imports above.
try {
    String accountUsername = "officialrickastley";
    String url = "https://analisa.io/profile/" + accountUsername;
    Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36").get();
    // Get the name and Analisa handle from the keywords meta tag:
    String keyWords = doc.select("meta[name=\"keywords\"]").first().attr("content");
    String[] contParts = keyWords.split(",\\s");
    String name = contParts[0];
    String handle = contParts[1];
    // Get the desired stats from the og:description meta tag
    // (each part starts with the number, followed by its label):
    keyWords = doc.select("meta[property=\"og:description\"]").first().attr("content");
    contParts = keyWords.split(",\\s");
    String engagementRate = contParts[0].split("\\s+")[0];
    String avgLikes = contParts[1].split("\\s+")[0];
    String avgComments = contParts[2].split("\\s+")[0];
    String followers = contParts[3].split("\\s+")[0];
    System.out.println("Name: " + name + " (" + handle + ")");
    System.out.println("Engagement Rate: " + engagementRate);
    System.out.println("Avg Likes: " + avgLikes);
    System.out.println("Avg Comments: " + avgComments);
    System.out.println("Followers: " + followers);
} catch (IOException ex) {
    // Handle the exception however you want, just don't leave it blank:
    System.err.println(ex);
}
The code above should output the following to the console:
Name: Rick Astley (@officialrickastley Analisa)
Engagement Rate: 2.44%
Avg Likes: 2.37
Avg Comments: 0.07
Followers: 176,125
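The splitting logic above can be pulled into a small, network-free helper that is easier to test and does not throw when a field is missing. This is a sketch under an assumption: the sample description string is hypothetical, inferred from the answer's split calls and printed output, and the real og:description text may be worded differently. OgDescriptionParser and leadingValues are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OgDescriptionParser {

    // Splits a description like "2.44% Engagement Rate, 2.37 Average Likes, ..."
    // on ", " and keeps the first whitespace-separated token of each part.
    // "176,125" survives because its comma is not followed by whitespace.
    static List<String> leadingValues(String description) {
        return Arrays.stream(description.split(",\\s+"))
                .map(String::trim)
                .map(part -> part.split("\\s+")[0])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical content string; the live page's wording may differ.
        String sample = "2.44% Engagement Rate, 2.37 Average Likes, "
                + "0.07 Average Comments, 176,125 Followers";
        List<String> values = leadingValues(sample);
        String[] labels = {"Engagement Rate", "Avg Likes", "Avg Comments", "Followers"};
        // Bounded by values.size(), so a shorter description prints fewer lines
        // instead of throwing ArrayIndexOutOfBoundsException.
        for (int i = 0; i < labels.length && i < values.size(); i++) {
            System.out.println(labels[i] + ": " + values.get(i));
        }
    }
}
```

In the scraper, the argument would be doc.select("meta[property=\"og:description\"]").first().attr("content") instead of the sample string.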