I am scraping several projects from the following website: https://indiainvestmentgrid.gov.in/opportunities/nip-project/606803. The page has a progress bar that shows the project stage (ranging from under conceptualisation to completed). Do you have any suggestions on how I can scrape it?
I am using RSelenium, extracting the page source and looking through it in the following way:
remDr$navigate('https://indiainvestmentgrid.gov.in/opportunities/nip-project/606803')
url <- read_html(remDr$getPageSource()[[1]])
project_title <- url %>%
html_nodes(".prj-name") %>%
html_text()
However, I am not sure how to scrape this progress bar. Selector Gadget shows that the completed circles/bars are marked as ".active-stage", but I cannot find that class in the page source I extract. For this project, the scraped value should be "Under Implementation".
CodePudding user response:
It seems like you are using both RSelenium and rvest. Also, note that html_nodes is superseded as of rvest 1.0.0; html_elements is its replacement.
The coloring of the bars is (I think) determined by the projectStageId value. The following should work for most of those pages.
library(rvest)
library(magrittr)
url <- "https://indiainvestmentgrid.gov.in/opportunities/nip-project/606801"
out <- read_html(url)
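out %>%
  # Select the element that carries the stage ID
  html_elements(css = "#projectStageId") %>%
  # Serialize the element to a string
  as.character %>%
  # Keep only the stage ID, cutting the fixed-length markup around it
  substr(start = 49, stop = nchar(.)-2) %>%
  # Map the stage ID to a readable label; NA for unknown IDs
  switch(
    "500020" = "Under Conceptualization",
    "600037" = "Under Development",
    "500021" = "Under Implementation",
    "500023" = "Completed",
    NA
  )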
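The fixed character offsets in the substr call above would break if the markup around the ID changed. As a minimal sketch of an alternative, assuming #projectStageId is an element (for example a hidden input) that stores the ID in its value attribute, the ID could be read with html_attr instead. The helper name get_project_stage, the stage_labels vector and the demo markup are made up for illustration; only the ID-to-label mapping comes from the answer above.

```r
library(rvest)

# Stage ID -> label mapping, copied from the switch() call in the answer above.
stage_labels <- c(
  "500020" = "Under Conceptualization",
  "600037" = "Under Development",
  "500021" = "Under Implementation",
  "500023" = "Completed"
)

# Hypothetical helper: takes a parsed page and returns the stage label,
# or NA if the element is missing or the ID is not in the mapping.
get_project_stage <- function(page) {
  stage_id <- page %>%
    html_element("#projectStageId") %>%
    html_attr("value")  # ASSUMPTION: the ID lives in the "value" attribute
  unname(stage_labels[stage_id])
}

# Offline demo on stand-in markup (assumed structure, not copied from the site).
demo_page <- read_html(
  '<html><body><input type="hidden" id="projectStageId" value="500021"></body></html>'
)
get_project_stage(demo_page)
```

For a live page, the same helper could be given read_html(url), or read_html(remDr$getPageSource()[[1]]) when the page is loaded through RSelenium as in the question.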