I have a dataset of 464 Toronto addresses. The addresses look like this:
raw_data <- data.frame(address = c("570 BLOOR ST W TORONTO ON M6G1K1", "10 STAYNER AVE NORTH YORK ON M6B1N4", "1200 WOODBINE AVE EAST YORK ON M4C4E3", "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"))
address
1 570 BLOOR ST W TORONTO ON M6G1K1
2 10 STAYNER AVE NORTH YORK ON M6B1N4
3 1200 WOODBINE AVE EAST YORK ON M4C4E3
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3
I want to add a variable recording which city ward each address belongs to. The city website has an application that lets you check which ward an address is in, so I could enter each of the 464 addresses manually and record the ward. However, I'm wondering if there's a way to automate this task in R. I'd really appreciate any input!
For reference, the desired output for the addresses I listed would be:
cleaned_data <- data.frame(
  address = c("570 BLOOR ST W TORONTO ON M6G1K1", "10 STAYNER AVE NORTH YORK ON M6B1N4", "1200 WOODBINE AVE EAST YORK ON M4C4E3", "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"),
  ward = c("University-Rosedale", "Eglinton-Lawrence", "Beaches-East York", "Scarborough")
)
address ward
1 570 BLOOR ST W TORONTO ON M6G1K1 University-Rosedale
2 10 STAYNER AVE NORTH YORK ON M6B1N4 Eglinton-Lawrence
3 1200 WOODBINE AVE EAST YORK ON M4C4E3 Beaches-East York
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3 Scarborough
One extra challenge here is that some of the addresses in my dataset don't correspond to a unique address on the city website (e.g. row 4 of my example data). Having an automated solution to this would be great, but if it's too challenging, I should be able to do the few that are like this manually in a reasonable amount of time.
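For rows like row 4, one option before falling back to manual lookup is to simplify the address so the search has a better chance of matching. The sketch below is a hypothetical helper, `simplify_address()`, written in base R; it assumes that keeping only the first civic number of a range and dropping unit designators (UNIT, SUITE, APT) is acceptable for ward assignment, which is reasonable since a unit rarely changes the ward but is not guaranteed.

```r
# Hypothetical helper (not part of any package): normalise addresses so that
# ranges and unit numbers do not prevent a match in the ward lookup.
simplify_address <- function(x) {
  x <- toupper(trimws(x))
  # "2480-2490 GERRARD ..." -> "2480 GERRARD ..." (keep the first civic number)
  x <- sub("^(\\d+)-\\d+\\s", "\\1 ", x, perl = TRUE)
  # Drop unit designators such as "UNIT 20A", "SUITE 300", "APT 4"
  x <- gsub("\\s(UNIT|SUITE|APT)\\s+\\S+", "", x, perl = TRUE)
  # Collapse any repeated whitespace left behind
  gsub("\\s+", " ", x, perl = TRUE)
}

simplify_address("2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3")
```

Addresses without a range or unit pass through unchanged, so the helper can be applied to the whole column before querying.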
User response:
Yes, this can be automated with RSelenium by driving the ward lookup on the city's website from R. It should look roughly like this:
library(RSelenium)
library(tidyverse)

# Create the driver (starts a Selenium server and opens a Firefox session)
remDr0 <- rsDriver(browser = "firefox", port = 4089L)
remDr <- remDr0$client

# Open the ward profiles page
url <- "https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/ward-profiles/"
remDr$navigate(url)

wardlooker <- function(adresse) {
  # Type the address into the search box (clearing any previous query first)
  Recherche <- remDr$findElement('css selector', '#js_input__address')
  Recherche$clearElement()
  Recherche$sendKeysToElement(list(adresse))
  # Click the search button
  buttons <- remDr$findElements("css selector", '.btn-lg')
  buttons[[1]]$clickElement()
  Sys.sleep(2)  # give the result time to appear
  # Placeholder: the CSS selector of the result element was not identified
  art <- remDr$findElements('css selector', 'here the css of where the result should pop up that I could not find')
  ward <- unlist(lapply(art, function(x) x$getElementText()))
  ward
}
You can then apply this function to all your addresses with purrr::map().
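With 464 addresses, some lookups will probably fail or return nothing, and one error should not stop the whole run. The sketch below is a hypothetical base R wrapper, `lookup_all()`, that takes any single-address lookup function (such as `wardlooker` or `get_ward` from this thread), queries each distinct address only once, turns errors and empty results into `NA`, and optionally pauses between requests. The pause length is an assumption; pick whatever the site tolerates.

```r
# Hypothetical helper: run a single-address lookup over many addresses.
# Each distinct address is queried once; errors or empty results become NA.
lookup_all <- function(addresses, lookup, pause = 0) {
  unique_addr <- unique(addresses)
  results <- vapply(unique_addr, function(a) {
    out <- tryCatch(lookup(a), error = function(e) NULL)
    Sys.sleep(pause)  # optional delay between requests
    if (length(out) == 0) NA_character_ else as.character(out[[1]])
  }, character(1), USE.NAMES = FALSE)
  # Map the per-unique-address results back onto the original order
  results[match(addresses, unique_addr)]
}
```

Usage would be along the lines of `raw_data$ward <- lookup_all(raw_data$address, wardlooker, pause = 1)`; rows left as `NA` are the ones to check by hand.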
Another way to do it would be a spatial join in QGIS, using the city's ward boundary maps.
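The spatial-join approach assumes the addresses have first been geocoded to coordinates (a separate step not covered here); each point then gets the ward whose boundary polygon contains it. The base R sketch below illustrates that core point-in-polygon test with the standard ray-casting rule on toy square "wards". The function names and the `list(name, x, y)` ward format are hypothetical; with real boundary files, `sf::st_join()` on an `sf` points layer and a wards layer does the same job.

```r
# Ray-casting point-in-polygon test: TRUE if (px, py) lies inside the polygon
# whose vertices are listed in order in vx, vy.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Edge (i, j) straddles the horizontal line y = py and crosses it right of px
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical: assign each point to the first ward polygon containing it, or NA.
# `wards` is a list of list(name = ..., x = ..., y = ...) vertex lists.
assign_ward <- function(px, py, wards) {
  vapply(seq_along(px), function(k) {
    hit <- Filter(function(w) point_in_polygon(px[k], py[k], w$x, w$y), wards)
    if (length(hit) == 0) NA_character_ else hit[[1]]$name
  }, character(1))
}

toy_wards <- list(
  list(name = "A", x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  list(name = "B", x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)
assign_ward(c(0.5, 1.5, 3), c(0.5, 0.5, 0.5), toy_wards)
```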
User response:
A solution without RSelenium, which sends each address straight to the map.toronto.ca search endpoint with httr2. By the way, the last address you provided does not exist according to the website.
library(tidyverse)
library(httr2)

df <- tibble(
  address = c(
    "570 BLOOR ST W TORONTO ON M6G1K1",
    "10 STAYNER AVE NORTH YORK ON M6B1N4",
    "1200 WOODBINE AVE EAST YORK ON M4C4E3",
    "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"
  )
)

# Query the city's ranked-search service and return the ward of the best
# match, or NULL if the address is not found
get_ward <- function(query) {
  response <- paste0(
    "https://map.toronto.ca/geoservices/rest/search/rankedsearch?searchArea=1&matchType=1&projectionType=1&retRowLimit=10&areaTypeCode1=CITW&areaTypeCode2=WD03&searchString=",
    URLencode(query)  # percent-encode spaces and other special characters
  ) %>%
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    pluck("result", "bestResult", "detail") %>%
    str_extract("(?<=[:]).*") %>%  # keep the text after the first colon
    str_squish()

  if (length(response) == 0) NULL else response
}

df %>%
  mutate(ward = map(address, get_ward) %>% as.character())
# A tibble: 4 x 2
address ward
<chr> <chr>
1 570 BLOOR ST W TORONTO ON M6G1K1 University-Rosedale (11)
2 10 STAYNER AVE NORTH YORK ON M6B1N4 Eglinton-Lawrence (8)
3 1200 WOODBINE AVE EAST YORK ON M4C4E3 Beaches-East York (19)
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3 NULL
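The service returns the ward with its number in parentheses, e.g. "University-Rosedale (11)", while the desired output in the question has the name only, and unmatched rows come back as the string "NULL". A small hypothetical base R helper, `split_ward()`, can separate the two parts and turn unmatched rows into `NA`; it assumes the number always appears as a trailing "(n)".

```r
# Hypothetical helper: split "University-Rosedale (11)" into name and number.
# The strings "NULL" and "" (unmatched lookups) become NA.
split_ward <- function(x) {
  x[x %in% c("NULL", "")] <- NA
  has_num <- grepl("\\(\\d+\\)\\s*$", x, perl = TRUE)
  ward_number <- rep(NA_integer_, length(x))
  ward_number[has_num] <- as.integer(sub(".*\\((\\d+)\\)\\s*$", "\\1", x[has_num], perl = TRUE))
  data.frame(
    ward_name = trimws(sub("\\s*\\(\\d+\\)\\s*$", "", x, perl = TRUE)),
    ward_number = ward_number,
    stringsAsFactors = FALSE
  )
}

split_ward(c("University-Rosedale (11)", "Beaches-East York (19)", "NULL"))
```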