R Web scraping data from StockTwits website

Time:12-05

I want to get some information from tweets posted on the StockTwits platform (an example tweet is the one linked in the code below).

This is what I have so far:

library(rvest)

read_html("https://stocktwits.com/SunAndStorm/message/499613811") |> 
  html_nodes()

The final result should be a dataframe, which should look like this:

# A tibble: 1 × 5
  Reply Reshare  Like Share Search
  <int>   <int> <int> <int>  <int>
1     5       0     1     0      0

CodePudding user response:

I did not use html_nodes(); instead, I locate the element by its XPath with RSelenium. The following code gives you the information you need:

library(RSelenium)

url <- "https://stocktwits.com/SunAndStorm/message/499613811"

# Set up driver
driver <- rsDriver(browser = "firefox", chromever = NULL)
remDr <- driver[["client"]]

# Go to site
remDr$navigate(url)

# Extract information using xpath
info <- remDr$findElement(using = "xpath", "/html/body/div[2]/div/div[2]/div[2]/div[2]/div/div/div/div[1]/div[1]/div/div[2]/article/div/div[5]")

Then you can call getElementText() to extract the text:

> info$getElementText()
[[1]]
[1] "4Comments\n0Reshares\n7Likes"

If you need help converting this string to a dataframe let me know and I can help you out, but I assume this is not the main problem.
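For the conversion step, here is a minimal base R sketch. It assumes the text always has the shape shown above, that is, one "<digits><label>" entry per line; abbreviated counts such as "1.2k" are not handled. The variable names are only illustrative.

```r
# Text as returned by info$getElementText()[[1]] in the output above
txt <- "4Comments\n0Reshares\n7Likes"

# One entry per line, e.g. "4Comments"
parts <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# Leading digits are the count; whatever follows is the label
counts <- as.integer(sub("^([0-9]+).*$", "\\1", parts))
labels <- sub("^[0-9]+", "", parts)

# One-row data frame with one column per label
result <- as.data.frame(as.list(setNames(counts, labels)))
result
```

Splitting on the newline first keeps each count paired with its own label, so the order of the entries on the page does not matter.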

Kind regards

CodePudding user response:

Look at the Network tab in your browser's developer tools and you will find their API. Call it with the ID of the tweet you are interested in.

I put together a starting point for you here. I couldn't find reshares and search, but I am sure they are there somewhere. Since you have thousands of tweets to gather information on, this method is more efficient.

library(tidyverse)
library(httr2)

get_stockwits <- function(id) {
  data <-
    str_c("https://api.stocktwits.com/api/2/messages/", id, "/conversation.json?limit=21") %>%
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE)

  tibble(
    tweet = data %>%
      getElement("message") %>%
      getElement("body"),
    reply = data %>%
      getElement("message") %>%
      getElement("conversation") %>%
      getElement("replies"),
    likes = data %>%
      getElement("message") %>%
      getElement("likes") %>%
      getElement("total"),
    comments = data %>%
      getElement("children") %>%
      getElement("messages") %>% 
      getElement("body")
  ) %>%
    nest(comments = comments)
}

get_stockwits(469518468)

# A tibble: 1 x 4
  tweet                             reply likes comments        
  <chr>                             <int> <int> <list>          
1 $GME going back in all this month     5     1 <tibble [2 x 1]>

Unnest the comments column to see the individual comments:

get_stockwits(469518468) %>% 
  unnest(comments)

# A tibble: 2 x 4
  tweet                             reply likes comments                     
  <chr>                             <int> <int> <chr>                        
1 $GME going back in all this month     5     1 @okkenny yeah with options   
2 $GME going back in all this month     5     1 @okkenny playing monthly only
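To run this over thousands of tweet IDs, one possible approach is a small loop that skips failed requests and pauses between calls. The sketch below uses base R only; collect_messages and fake_fetch are hypothetical names, and fake_fetch is a stand-in so the example runs offline. In real use you would pass get_stockwits as the fetch argument and choose a pause suited to whatever rate limits the API applies, which are not documented here.

```r
# Hypothetical helper: apply a fetch function to many message IDs,
# skip IDs whose request fails, and pause between calls.
collect_messages <- function(ids, fetch, pause = 1) {
  rows <- lapply(ids, function(id) {
    out <- tryCatch(fetch(id), error = function(e) NULL)
    Sys.sleep(pause)
    out
  })
  # Drop failed requests, then stack the per-message rows
  rows <- Filter(Negate(is.null), rows)
  do.call(rbind, rows)
}

# Stand-in fetcher so the sketch runs without network access;
# replace it with get_stockwits for real requests.
fake_fetch <- function(id) {
  if (id == 2) stop("simulated request failure")
  data.frame(id = id, likes = id * 10)
}

collected <- collect_messages(c(1, 2, 3), fake_fetch, pause = 0)
collected
```

Wrapping each call in tryCatch means a single deleted or private message does not abort the whole run. With the nested tibbles returned by get_stockwits, dplyr::bind_rows may be a more convenient way to stack the rows than rbind.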