I am downloading an HTML page into R:
library(RCurl)
curl = getCurlHandle()
curlSetOpt(cookiejar = 'cookies.txt', followlocation = TRUE, autoreferer = TRUE, curl = curl)
html4 <- getURL('http://website/Busqueda_persona.aspx', curl = curl)
viewstate <- as.character(sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html4))
viewstategenerator <- as.character(sub('.*id="__VIEWSTATEGENERATOR" value="([0-9a-zA-Z+/=]*).*', '\\1', html4))
eventvalidation <- as.character(sub('.*id="__EVENTVALIDATION" value="([0-9a-zA-Z+/=]*).*', '\\1', html4))
params <- list(
'__VIEWSTATE' = viewstate,
'__VIEWSTATEGENERATOR' = viewstategenerator,
'__EVENTVALIDATION' = eventvalidation,
'ctl00$cphMain$ddlTipoIdentificacion' = "296" ,
'ctl00$cphMain$txtNumeroIdentificacion' = "1109927000",
'ctl00$cphMain$ddlTipoIdentificacionPersonaACargo' = "0",
'ctl00$cphMain$btnBuscar' = "Buscar"
)
html5 = postForm('http://website/Busqueda_persona.aspx', .params = params, curl = curl)
Part of the resulting html5 includes this:
onclick='javascript:Direccionar(1682000,296,"1109927000",1);'
I need to extract the 1682000 and store it in a separate object.
EDIT1: After trying @akrun's advice, I get this:
sub("\\D+(\\d+).*", "\\1", html5)
[1] "3"
attr(,"Content-Type")
                charset
"text/html"     "utf-8"
I have uploaded the entire html5 R object here: https://controlc.com/b9d622ff
CodePudding user response:
We may use str_extract_all from stringr:
library(stringr)
str_extract_all(str2, "(?<=onclick\\='javascript\\:Direccionar\\()\\d+")[[1]]
[1] "1682000" "1682000"
Or use it in combination with parse_number:
readr::parse_number(str_extract_all(str1, "onclick='javascript\\:Direccionar\\([0-9]+")[[1]])
[1] 1682000
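If stringr and readr are not available, roughly the same extraction can be sketched in base R. This is only an illustration: `snippet` is a hypothetical variable holding the onclick fragment quoted in the question, not the full downloaded page.

```r
# Hypothetical sample: the onclick fragment quoted in the question.
snippet <- "onclick='javascript:Direccionar(1682000,296,\"1109927000\",1);'"

# regexec() returns the full match plus each capture group;
# the group here is the run of digits right after "Direccionar(".
m <- regmatches(snippet, regexec("Direccionar\\(([0-9]+)", snippet))

# Element 1 is the whole match, element 2 is the captured number.
id <- as.numeric(m[[1]][2])
id
```

The same call should work on the full page text in place of `snippet`, although regexec() only reports the first match.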
There are two instances of the substring:
> substr(str2, 51380, 51418)
[1] "onclick='javascript:Direccionar(1682000"
> substr(str2, 51536, 51574)
[1] "onclick='javascript:Direccionar(1682000"
They were found with str_locate_all:
> str_locate_all(str2, "(?<=onclick='javascript:Direccionar\\()[0-9]+")
[[1]]
     start   end
[1,] 51412 51418
[2,] 51568 51574
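Because both matches carry the same number, it may be convenient to collapse them into a single value before storing it. A minimal base R sketch follows, assuming a hypothetical `page` string that stands in for the downloaded HTML and contains the fragment twice:

```r
# Hypothetical stand-in for the downloaded page: the fragment appears twice.
page <- paste0(
  "<a onclick='javascript:Direccionar(1682000,296,\"1109927000\",1);'>x</a>",
  "<a onclick='javascript:Direccionar(1682000,296,\"1109927000\",1);'>y</a>"
)

# perl = TRUE enables the lookbehind; gregexpr() finds every match.
hits <- regmatches(page, gregexpr("(?<=Direccionar\\()[0-9]+", page, perl = TRUE))[[1]]

# Convert to numeric and drop the repeated value.
ids <- unique(as.numeric(hits))
ids
```

Here `hits` keeps one entry per occurrence, while `ids` keeps only the distinct numbers, so a page listing several different people would yield one id per person.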