I want to find the coordinates for a list of addresses.
I am using a data set that can be found here: "https://www.data.gv.at/katalog/dataset/kaufpreissammlung-liegenschaften-wien"
I've inputed this using the read_csv
function as "data". I'm using the tidyverse and jsonlite libraries. The only relevant columns are "Straße" which is the street name and "ON" which is the street number. The city for all of these is Vienna, Austria.
I'm using OpenStreetMap and have formatted my address data like the format requires:
data$formatted_address <- paste(ifelse(is.na(data$ON), "", data$ON), " ", tolower(data$Straße), ", vienna", sep = "")
This formats the adresses in this column as 1 milanweg, vienna
and 12 granergasse, vienna
. When I manually input this into the API format, it all works out and I get the coordinates: https://nominatim.openstreetmap.org/search?q=1 milanweg, vienna&format=json&polygon=1&addressdetails=1
Since I now want to do this for my entire row, I am using jsonlite to create requests in R.
data$coordinates <- data.frame(lat = NA, lon = NA)
for (i in 1:nrow(data)) {
result <- try(readLines(paste0("https://nominatim.openstreetmap.org/search?q=",
URLencode(data$formatted_address[i]), "&format=json&polygon=1&addressdetails=1")),
silent = TRUE)
if (!inherits(result, "try-error")) {
if (length(result) > 0) {
result <- fromJSON(result)
if (length(result) > 0 && is.list(result[[1]])) {
data$coordinates[i, ] <- c(result[[1]]$lat, result[[1]]$lon)
}
}
}
}
This should theoretically create the exact same API request, however, the lat and lon columns are always empty.
How can I fix this script to create a list of coordinates for each address in the data set?
CodePudding user response:
Data setup
library(tidyverse)
library(httr2)
df <- df %>%
mutate(
formatted_address = str_c(
if_else(is.na(on), "", on), " ", str_to_lower(strasse), " vienna"
) %>% str_remove_all(" ")
)
# A tibble: 57,912 × 7
kg_code katastralgemeinde ez plz strasse on formatted_address
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 1617 Strebersdorf 1417 1210 Mühlweg 13 13 mühlweg vienna
2 1607 Groß Jedlersdorf II 193 1210 Bahnsteggasse 4 4 bahnsteggasse vienna
3 1209 Ober St.Veit 3570 1130 Jennerplatz 34/20 34/20 jennerplatz vienna
4 1207 Lainz 405 1130 Sebastian-Brunner-Gasse 6 6 sebastian-brunner-gasse vienna
5 1101 Favoriten 3831 1100 Laxenburger Straße 2C -2 D 2C-2D laxenburgerstraße vienna
6 1101 Favoriten 3827 1100 Laxenburger Straße 2 C 2C laxenburgerstraße vienna
7 1101 Favoriten 3836 1100 hinter Laxenburger Straße 2 C 2C hinterlaxenburgerstraße vienna
8 1201 Auhof 932 1130 Keplingergasse 10 10 keplingergasse vienna
9 1213 Speising 135 1130 Speisinger Straße 29 29 speisingerstraße vienna
10 1107 Simmering 2357 1100 BATTIGGASSE 44 44 battiggasse vienna
# … with 57,902 more rows
# ℹ Use `print(n = ...)` to see more rows
API call and getting coordinates. I gathered the display name matched by the API, and the lat & lon data.
get_coords <- function(address) {
cat("Getting coordinates", address, "\n")
str_c(
"https://nominatim.openstreetmap.org/search?q=",
address,
"&format=json&polygon=1&addressdetails=1"
) %>%
request() %>%
req_perform() %>%
resp_body_json(simplifyVector = TRUE) %>%
as_tibble() %>%
select(api_name = display_name,
lat, lon) %>%
slice(1)
}
df %>%
slice_sample(n = 10) %>%
mutate(coordinates = map(
formatted_address, possibly(get_coords, tibble(
api_name = NA_character_,
lat = NA_character_,
lon = NA_character_
))
)) %>%
unnest(coordinates)
# A tibble: 10 × 10
kg_code katastralgemeinde ez plz strasse on formatted_…¹ api_n…² lat lon
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 1651 Aspern 3374 1220 ERLENWEG 8 8 erlenweg … 8, Erl… 48.2… 16.4…
2 1613 Leopoldau 6617 1210 Oswald-Redlich-Straße 31 31 oswald-r… 31, Os… 48.2… 16.4…
3 1006 Landstraße 2425 1030 HAGENMÜLLERGASSE 45018 45018 hagen… Hagenm… 48.1… 16.4…
4 1101 Favoriten 541 1100 HERNDLGASSE 7 7 herndlgas… 7, Her… 48.1… 16.3…
5 1607 Groß Jedlersdorf II 221 1210 Prager Straße 70 70 pragerst… Prager… 48.2… 16.3…
6 1006 Landstraße 1184 1030 PAULUSGASSE 2 2 paulusgas… 2, Pau… 48.1… 16.3…
7 1654 Eßling 2712 1220 KAUDERSSTRASSE 61 61 kauderss… 61, Ka… 48.2… 16.5…
8 1401 Dornbach 2476 1170 Alszeile NA alszeile v… Alszei… 48.2… 16.2…
9 1654 Eßling 745 1220 Kirschenallee 19 19 kirschen… 19, Ki… 48.2… 16.5…
10 1204 Hadersdorf 3139 1140 MITTLERE STRASSE NA mittlerest… Mittle… 48.2… 16.1…
# … with abbreviated variable names ¹formatted_address, ²api_name