I want to get some information from tweets posted on the platform StockTwits. Here you can see an example tweet:
This is what I have so far:
library(rvest)
read_html("https://stocktwits.com/SunAndStorm/message/499613811") |>
html_nodes()
The final result should be a data frame that looks like this:
# A tibble: 1 × 5
  Reply Reshare Like  Share Search
  <lgl> <lgl>   <lgl> <lgl> <lgl>
  5     0       1     0     0
CodePudding user response:
I do not use html_nodes() here. Instead I locate the element by its XPath with RSelenium. The following code gives you the information you need:
library(RSelenium)

url <- "https://stocktwits.com/SunAndStorm/message/499613811"
# Set up driver
driver <- rsDriver(browser = "firefox", chromever = NULL)
remDr <- driver[["client"]]
# Go to site
remDr$navigate(url)
# Extract information using xpath
info <- remDr$findElement(using = "xpath", "/html/body/div[2]/div/div[2]/div[2]/div[2]/div/div/div/div[1]/div[1]/div/div[2]/article/div/div[5]")
Then you can use getElementText() to retrieve the text:
> info$getElementText()
[[1]]
[1] "4Comments\n0Reshares\n7Likes"
If you need help converting this string to a data frame, let me know and I can help you out, but I assume this is not the main problem.
Kind regards
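In case it is useful, here is a minimal sketch of that conversion in base R. It assumes the text always has the form shown above (a count immediately followed by a label, one pair per line); the variable names are only for illustration.

```r
# Text as returned by info$getElementText()[[1]] in the output above
txt <- "4Comments\n0Reshares\n7Likes"

# One element per line of the text
parts <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# The leading digits are the counts, the remainder is the label
counts <- as.integer(sub("^(\\d+).*$", "\\1", parts))
labels <- sub("^\\d+", "", parts)

# One-row data frame with one column per label
result <- as.data.frame(as.list(setNames(counts, labels)))
result
```

This gives a single row with the columns Comments, Reshares and Likes. If the page layout changes, the pattern would need adjusting.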
CodePudding user response:
Look at the Network tab in your browser's developer tools and you will find their API. Call it with the ID of the tweet you are interested in.
I put together a starting point for you below. I couldn't find reshares and search, but I am sure they are in there somewhere. Since you have thousands of tweets to gather information on, this method is more efficient.
library(tidyverse)
library(httr2)

get_stockwits <- function(id) {
  # Fetch the conversation for one message ID and parse the JSON body
  data <-
    str_c("https://api.stocktwits.com/api/2/messages/", id, "/conversation.json?limit=21") %>%
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE)

  # One row per comment; the tweet-level values are recycled
  tibble(
    tweet = data %>%
      getElement("message") %>%
      getElement("body"),
    reply = data %>%
      getElement("message") %>%
      getElement("conversation") %>%
      getElement("replies"),
    likes = data %>%
      getElement("message") %>%
      getElement("likes") %>%
      getElement("total"),
    comments = data %>%
      getElement("children") %>%
      getElement("messages") %>%
      getElement("body")
  ) %>%
    # Collapse the comments into a list-column so each tweet is one row
    nest(comments = comments)
}

get_stockwits(469518468)
# A tibble: 1 x 4
  tweet                             reply likes comments
  <chr>                             <int> <int> <list>
1 $GME going back in all this month     5     1 <tibble [2 x 1]>
Unnest the comments column to see the individual comments:
get_stockwits(469518468) %>%
  unnest(comments)
# A tibble: 2 x 4
  tweet                             reply likes comments
  <chr>                             <int> <int> <chr>
1 $GME going back in all this month     5     1 @okkenny yeah with options
2 $GME going back in all this month     5     1 @okkenny playing monthly only
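For thousands of IDs you will probably want to loop and skip the ones that fail. Below is a rough sketch of one way to do that. collect_tweets() and fake_fetch() are hypothetical names I made up for illustration, and the one-second pause is only a guess at being polite to the server, not a documented rate limit. In real use you would pass get_stockwits as the fetch argument.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical helper: apply a per-ID fetch function to many IDs,
# skip IDs whose request fails, and bind the results into one tibble
collect_tweets <- function(ids, fetch, pause = 1) {
  # possibly() returns NULL instead of stopping when fetch() errors
  safe_fetch <- possibly(fetch, otherwise = NULL)
  results <- map(ids, function(id) {
    Sys.sleep(pause)  # assumed delay between requests
    safe_fetch(id)
  })
  # bind_rows() ignores the NULL entries left by failed IDs
  bind_rows(results)
}

# Stand-in fetcher so the sketch runs without network access
fake_fetch <- function(id) {
  if (id == 2) stop("message not found")
  tibble(id = id, likes = 0L)
}

res <- collect_tweets(c(1, 2, 3), fake_fetch, pause = 0)
res
```

With the stand-in fetcher, the ID that fails is dropped and the other two are returned as rows. Replacing fake_fetch with get_stockwits should work the same way, as long as the API keeps returning the structure shown above.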