Web scrape a progress bar in R

I am scraping different projects from the following website: https://indiainvestmentgrid.gov.in/opportunities/nip-project/606803. There is a progress bar on this webpage that shows the project stage (from "Under Conceptualisation" to "Completed"). Do you have any suggestions on how I can scrape this?

I am using RSelenium, extracting the page source and looking through it in the following way:

remDr$navigate('https://indiainvestmentgrid.gov.in/opportunities/nip-project/606803')
url <- read_html(remDr$getPageSource()[[1]])

project_title <- url %>% 
    html_nodes(".prj-name") %>%
    html_text()

However, I am not sure how to scrape this progress bar. Selector Gadget shows that the completed circles/bars are marked with the class ".active-stage", but I cannot find that class in the page source I extracted. For this project, the scraped value should be "Under Implementation".

CodePudding user response:

It seems like you are using both RSelenium and rvest; for this value, rvest alone appears to be enough. Also, note that html_nodes() has been superseded by html_elements() as of rvest 1.0.0. The coloring of the bars is (I think) determined by the value of projectStageId. The following should work for most of those pages.

library(rvest)
library(magrittr)

url <- "https://indiainvestmentgrid.gov.in/opportunities/nip-project/606801"

out <- read_html(url)

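# switch() needs a single string, so this assumes exactly one #projectStageId match
out %>%
  html_elements(css = "#projectStageId") %>%
  as.character() %>%
  # drop the first 48 characters and the last two, which should leave only the stage ID
  substr(start = 49, stop = nchar(.) - 2) %>%
  switch(
    "500020" = "Under Conceptualization",
    "600037" = "Under Development",
    "500021" = "Under Implementation",
    "500023" = "Completed",
    NA  # fallback for an unrecognized stage ID
  )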
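
The fixed offset in substr() will break if the markup around the ID ever changes. Here is a minimal sketch of a less position-dependent variant. It assumes the stage ID is stored in the value attribute of the #projectStageId element (which the offsets above suggest, but I have not verified it against the live page). get_project_stage and stage_labels are names I made up for illustration, and the sample fragment is hand-written, not copied from the site.

```r
library(rvest)

# Stage IDs and labels copied from the switch() call above.
stage_labels <- c(
  "500020" = "Under Conceptualization",
  "600037" = "Under Development",
  "500021" = "Under Implementation",
  "500023" = "Completed"
)

# Hypothetical helper: look up the stage label for a parsed page.
# Assumption: the stage ID sits in the `value` attribute of #projectStageId.
get_project_stage <- function(page) {
  stage_id <- page %>%
    html_element("#projectStageId") %>%
    html_attr("value")
  # Missing elements and unknown IDs both come back as NA.
  unname(stage_labels[stage_id])
}

# Offline check against a hand-written fragment (not the real page).
sample_page <- read_html('<input type="hidden" id="projectStageId" value="500021">')
get_project_stage(sample_page)
```

Because the helper takes an already parsed page, it should work the same way whether the page comes from read_html(url) or from read_html(remDr$getPageSource()[[1]]) as in the question.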