How to scrape data from a chart with R

I would like to scrape the data from the tweet volume chart on https://bitinfocharts.com into a data file using R. I'm new to this, and after searching a lot on the web I have no choice but to ask for your help. I found the same question on the forum, but for Python (How to Scrape data from chart on https://bitinfocharts.com).

The chart in question is the following: https://bitinfocharts.com/comparison/decred-tweets.html#alltime

I'm looking for a data table with two columns: the date and the number of tweets on that day.

I hope your experience can help me.

Answer:

This piece of code should help extract the data that you need:

library('rvest')
library('stringr')

url <- 'https://bitinfocharts.com/comparison/decred-tweets.html#alltime'
webpage <- read_html(url)
# The chart's data is passed inline to new Dygraph(...); capture that array
res <- str_match(as.character(webpage), 'new Dygraph\\(document\\.getElementById\\("container"\\),\\s*(.*?)\\s*, \\{labels')
res[,2]

After that, parse res[,2] and transform it to suit your needs.
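To see what res[,2] holds, here is a minimal sketch that runs the same regular expression (with the dot in getElementById escaped) on a made-up one-line script. The script string and its values are hypothetical, shaped the way the regex expects the page's Dygraph call to look. str_match() returns one row per input string, with the full match in column 1 and the capture group in column 2.

```r
library(stringr)

# Hypothetical inline script with made-up values, in the shape the regex expects
script <- 'g = new Dygraph(document.getElementById("container"), [[new Date("2016/02/08"),12],[new Date("2016/02/09"),15]], {labels: ["Date", "Tweets"]});'

res <- str_match(script, 'new Dygraph\\(document\\.getElementById\\("container"\\),\\s*(.*?)\\s*, \\{labels')

# Column 1: the whole match; column 2: the captured data array
res[, 2]
```

The lazy (.*?) group stops at the first ", {labels" that follows, so only the data array is captured.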

Answer:

The new Dygraph part comes from the page source. If you search for it in the page source (view-source:https://bitinfocharts.com/comparison/decred-tweets.html in your browser), you will find it: the site draws the chart from this inline data using the Dygraphs JavaScript library. To parse the matrix, first remove the new Date( prefix and the closing ) around each date, then parse the whole string with a JSON library.
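A minimal offline sketch of that cleaning step, assuming each date appears as new Date("yyyy/mm/dd") inside the captured array. The input string and its numbers are made up for illustration; the two gsub() calls are the same ones used in the complete code.

```r
library(jsonlite)

# Hypothetical captured array with made-up values
raw <- '[[new Date("2016/02/08"),12],[new Date("2016/02/09"),15]]'

# Remove the JavaScript Date wrappers so the string becomes valid JSON
cleaned <- gsub("new Date\\(", "", raw)
cleaned <- gsub("\\)", "", cleaned)
# cleaned is now: [["2016/02/08",12],["2016/02/09",15]]

# fromJSON() simplifies the equal-length inner arrays into a matrix
m <- fromJSON(txt = cleaned)
print(m)
```

Because the dates are strings, the resulting matrix is character, so the counts still need as.numeric() before any arithmetic.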

Here is the complete code that should help you with that:

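library('rvest')
library('stringr')
library('jsonlite')

url <- 'https://bitinfocharts.com/comparison/decred-tweets.html#alltime'
webpage <- read_html(url)
# Capture the data array passed to the Dygraph constructor
res <- str_match(as.character(webpage), 'new Dygraph\\(document\\.getElementById\\("container"\\),\\s*(.*?)\\s*, \\{labels')
# Remove every "new Date(" prefix and every ")" so the array is valid JSON
res[,2] <- gsub("new Date\\(", "", res[,2])
res[,2] <- gsub("\\)", "", res[,2])
# Parse into a matrix: column 1 = date string, column 2 = tweet count
document <- fromJSON(txt=res[,2])
document
print(document[1, 1])  # first date
print(document[1, 2])  # tweet count on that date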
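The question asks for a two-column table saved to a data file. One way to get there from document is sketched below. It uses a hypothetical stand-in matrix with made-up values in place of the scraped one, assumes the dates are in yyyy/mm/dd form, and the CSV file name is only an example.

```r
# Hypothetical stand-in for the parsed matrix: dates in column 1, counts in column 2
document_example <- matrix(c("2016/02/08", "2016/02/09", "12", "15"), ncol = 2)

# Build the two-column table; the "%Y/%m/%d" date format is an assumption
tweets <- data.frame(
  date   = as.Date(document_example[, 1], format = "%Y/%m/%d"),
  tweets = as.numeric(document_example[, 2])
)

# Write the table to a CSV file (temporary directory used here as an example)
out_file <- file.path(tempdir(), "decred_tweets.csv")
write.csv(tweets, out_file, row.names = FALSE)
print(tweets)
```

With real data, replace document_example with document and point out_file at the path where you want the data file.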