web scrape a progress bar in R

Time:10-16

I am scraping different projects from the following website: https://indiainvestmentgrid.gov.in/opportunities/nip-project/606803. There is a progress bar on this webpage that shows the project stage (from "Under Conceptualisation" to "Completed"). Do you have any suggestions for how I can scrape it?

I am using RSelenium, extracting the page source and looking through it in the following way:

remDr$navigate('https://indiainvestmentgrid.gov.in/opportunities/nip-project/606803')
url <- read_html(remDr$getPageSource()[[1]])

project_title <- url %>% 
    html_nodes(".prj-name") %>%
    html_text()

However, I am not sure how to scrape this progress bar. Selector Gadget shows that the completed circles/bars are marked with the class ".active-stage", but I cannot find that class in the page source I extract. For this project, the scraped value should be "Under Implementation".

CodePudding user response:

It seems like you are using both RSelenium and rvest; the code below does not need RSelenium. Also, note that html_nodes() has been superseded by html_elements() since rvest 1.0.0. The coloring of the bars is (I think) determined by the projectStageId value. The following should work for most of those pages.

library(rvest)
library(magrittr)

url <- "https://indiainvestmentgrid.gov.in/opportunities/nip-project/606801"

out <- read_html(url)

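# Grab the element holding the stage ID, cut the ID out of its markup
# by character position, and map the ID to a stage label.
out %>%
  html_elements(css = "#projectStageId") %>%
  as.character() %>%
  substr(start = 49, stop = nchar(.) - 2) %>%
  switch(
    "500020" = "Under Conceptualization",
    "600037" = "Under Development",
    "500021" = "Under Implementation",
    "500023" = "Completed",
    NA  # fallback for any other ID
  )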
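
The fixed character offsets in substr() break if the markup of that element changes. A possibly sturdier variant is sketched below. It assumes the stage ID sits in the value attribute of the #projectStageId element, which is only inferred from the offsets above and not checked against the live site. The helper name get_project_stage and the stand-in HTML built with minimal_html() are made up for illustration; the ID-to-label mapping is copied from the switch() above.

```r
library(rvest)

# Stand-in for the real page: a hidden input carrying the stage ID.
# This markup is an assumption, not copied from the website.
page <- minimal_html('<input type="hidden" id="projectStageId" value="500021">')

# Stage IDs and labels, taken from the switch() in the answer.
stage_labels <- c(
  "500020" = "Under Conceptualization",
  "600037" = "Under Development",
  "500021" = "Under Implementation",
  "500023" = "Completed"
)

# Hypothetical helper: read the stage ID from a parsed page and
# look up its label. Missing or unknown IDs give NA.
get_project_stage <- function(page) {
  stage_id <- page %>%
    html_element("#projectStageId") %>%
    html_attr("value")
  unname(stage_labels[stage_id])
}

get_project_stage(page)
```

For a real project, replace the stand-in with read_html(url), or with read_html(remDr$getPageSource()[[1]]) if you stay with RSelenium. A named vector is used instead of switch() so that a missing element simply returns NA instead of raising an error.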