I am trying to scrape the city name and address of every Apple store in the UK using rvest:
library(rvest)
library(xml2)
library(tidyverse)
my_url <- read_html("https://www.apple.com/uk/retail/storelist/")
# extract city name
city_name <- my_url %>% html_elements("h2") %>% html_text2()
length(city_name)
# 27 cities
address <- my_url %>% html_elements("address") %>% html_text2()
length(address)
# 38 addresses
I am getting more addresses than city names because some cities have multiple stores. How do I get a city name for every address, so that both vectors have the same length and I can put them in a data frame?
Answer:
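Each city's heading and its store addresses sit inside the same div with class "state", so you can loop over those blocks and build one data frame per block. The single city name is recycled to match however many addresses the block contains: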
library(rvest)
library(xml2)
library(tidyverse)
read_html("https://www.apple.com/uk/retail/storelist/") %>%
  html_elements(xpath = "//div[@class='state']") %>%
  lapply(function(x) {
    data.frame(city = html_element(x, "h2") %>% html_text(),
               address = html_elements(x, "address") %>% html_text2())
  }) %>%
  do.call(rbind, .) %>%
  as_tibble()
#> # A tibble: 38 x 2
#> city address
#> <chr> <chr>
#> 1 Aberdeen "27/28 Ground Level Mall\nUnion Square\nAberdeen , AB11 ~
#> 2 Antrim "Upper Ground Floor\n1 Victoria Square\nBelfast , BT1 4Q~
#> 3 Berkshire "The Oracle Shopping Centre\nUpper Level\nReading , RG1 ~
#> 4 Bristol "11 Philadelphia Street\nQuakers Friars\nBristol , BS1 3~
#> 5 Bristol "Upper Mall\nThe Mall at Cribbs Causeway\nBristol , BS34~
#> 6 Buckinghamshire "26 Midsummer Place\nMidsummer Boulevard\nMilton Keynes ~
#> 7 Cambridgeshire "Grand Arcade Shopping Centre\nCambridge , CB2 3AX\n0122~
#> 8 Cardiff "63-66 Grand Arcade\nSt David’s Dewi Sant\nCardiff , CF1~
#> 9 Central London "No. 1-7 The Piazza\nLondon , WC2E 8HB\n020 7447 1400"
#> 10 Central London "235 Regent Street\nLondon , W1B 2EL\n020 7153 9000"
#> # ... with 28 more rows
Created on 2022-04-12 by the reprex package (v2.0.1)
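The same grouping idea can be tried offline, without hitting the live page. The sketch below is only an illustration: the HTML snippet is a made-up stand-in that assumes the structure used above (a div with class "state" holding one h2 and one or more address elements), and the names page, groups and stores are hypothetical.

```r
library(rvest)

# Hypothetical stand-in for the store list page: each div.state holds
# one h2 (the city) and one or more address elements.
page <- minimal_html('
  <div class="state"><h2>Aberdeen</h2>
    <address>Union Square</address></div>
  <div class="state"><h2>Bristol</h2>
    <address>Quakers Friars</address>
    <address>Cribbs Causeway</address></div>
')

# CSS equivalent of selecting the per-city container blocks.
groups <- html_elements(page, "div.state")

# One data frame per block; the single city value is recycled across
# however many addresses that block contains.
stores <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    city    = html_text2(html_element(g, "h2")),
    address = html_text2(html_elements(g, "address"))
  )
}))

stores
```

Here Bristol appears twice in the city column, once per address. The CSS selector "div.state" matches any div that carries the class "state", while the XPath [@class='state'] only matches when the class attribute is exactly "state", so the CSS form may be more forgiving if the page adds further classes to those divs.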