How to extract some specific values from HTML?

Time:10-28

I'm trying to use rvest in R to extract some values from an ASP.NET HTML page.

I'm using SelectorGadget to identify the elements I need, but I'm not sure I'm doing it correctly:

library(rvest)

main = read_html(html_detallepersona)

Name = main %>% html_elements("fieldset fieldset > ul ~ ul ul li:nth-child(1)")

paste(Name)
[1] "<li>\r\n                <span id=\"ctl00_cphMain_lblPrimerNombre\" class=\"label\">Primer Nombre(*)</span>\r\n                <input name=\"ctl00$cphMain$txtPrimerNombre\" type=\"text\" value=\"Veronica\" maxlength=\"30\" id=\"ctl00_cphMain_txtPrimerNombre\" disabled class=\"aspNetDisabled comboBox\" style=\"text-transform: capitalize;\">\n</li>"

Name %>%  html_attr("value")
[1] NA

I need the value itself (Veronica).

I'm not sure rvest is the right approach for the source HTML I have. I rely on the input name to locate the desired value, which sits next to it.

EDIT 1: What about a dropdown menu? I'm looking for the "Contributivo" value.

Regimen = detallepersona %>% html_elements("#ctl00_cphMain_upAseguradora ul:nth-child(1) li:nth-child(1) option")

paste(Regimen)
[1] "<option value=\"0\">-Seleccione-</option>\n"           "<option selected value=\"58\">Contributivo</option>\n"
[3] "<option value=\"61\">Especial</option>\n"              "<option value=\"60\">Pobre no afiliado</option>\n"    
[5] "<option value=\"59\">Subsidiado</option>"             

Regimen %>% html_text()
[1] "-Seleccione-"      "Contributivo"      "Especial"          "Pobre no afiliado" "Subsidiado"       

CodePudding user response:

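Right now you are selecting the <li> element, which has no value attribute of its own, so html_attr("value") returns NA. The value= attribute belongs to the <input> nested inside the <li>, so select that element first and then call html_attr():

Name %>% html_element("input") %>% html_attr("value")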
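
Here is a minimal, self-contained sketch covering both the text input and the dropdown from EDIT 1. It assumes markup like the snippets printed in the question; the HTML fragment is a cut-down, hypothetical stand-in for the real page, and the `<select>` wrapper in particular is a guess, since the question only shows the `<option>` nodes. For the dropdown, the idea is to match only the `<option>` that carries the `selected` attribute instead of collecting every option.

```r
library(rvest)

# Hypothetical fragment modelled on the output shown in the question;
# the real page will contain far more markup around these nodes.
page <- minimal_html('
  <ul><li>
    <span id="ctl00_cphMain_lblPrimerNombre" class="label">Primer Nombre(*)</span>
    <input name="ctl00$cphMain$txtPrimerNombre" type="text" value="Veronica"
           id="ctl00_cphMain_txtPrimerNombre" disabled>
  </li></ul>
  <select>
    <option value="0">-Seleccione-</option>
    <option selected value="58">Contributivo</option>
    <option value="61">Especial</option>
  </select>
')

# Target the <input> directly by its id, then read its value attribute.
first_name <- page %>%
  html_element("#ctl00_cphMain_txtPrimerNombre") %>%
  html_attr("value")

# Keep only the option marked as selected, then read its visible text.
regimen <- page %>%
  html_element("option[selected]") %>%
  html_text2()

print(first_name)
print(regimen)
```

On the real page you would probably want to scope the second selector to the right dropdown, for example by prefixing it with the `#ctl00_cphMain_upAseguradora` container from the question, since a bare `option[selected]` matches the first selected option anywhere in the document.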