Objective
Scrape a vector of file paths to retail store locations while ignoring the hyperlinked telephone numbers. I am new to working with HTML elements.
What I have tried
library(rvest)
library(tidyverse)
library(xml2)
store.paths <- read_html("https://www.walmart.com/store/directory/al/alabaster") %>%
  html_nodes(xpath = '//*[@class="store-directory-container"]') %>%
  html_nodes("a") %>%
  html_attr('href')
which yields
[1] "/store/4756" "tel:205-624-6229" "/store/423" "tel:205-620-0360"
while my desired output is
[1] "/store/4756" "/store/423"
I have tried replacing store-directory-container with storeBanner, and the result is empty.
Thanks!
CodePudding user response:
It looks like the a tags you want also have the class storeBanner, while the telephone links do not. It would be easy to grab them with
store.paths <- read_html("https://www.walmart.com/store/directory/al/alabaster") %>%
  html_elements("a.storeBanner") %>%
  html_attr('href')
I also used the CSS selector syntax in this case because it's easier, and I used the recommended html_elements function because html_nodes is soft-deprecated (superseded as of rvest 1.0.0). You can't just replace "store-directory-container" with "storeBanner" because the "a" tag is a descendant of the "store-directory-container" element, whereas in the case of "storeBanner" the "a" tag is that element itself, not a child of it. Searching for "a" inside a "storeBanner" element therefore finds nothing.
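To make the difference concrete, here is a minimal offline sketch. The HTML below is a hypothetical snippet that I am assuming mirrors the structure of the real page (it is not copied from it), built with rvest's minimal_html so nothing is downloaded. The names page, nested and direct are only for illustration.

```r
library(rvest)

# Hypothetical markup, assumed to resemble the real page structure
page <- minimal_html('
  <div class="store-directory-container">
    <a class="storeBanner" href="/store/4756">Store 4756</a>
    <a href="tel:205-624-6229">205-624-6229</a>
    <a class="storeBanner" href="/store/423">Store 423</a>
    <a href="tel:205-620-0360">205-620-0360</a>
  </div>
')

# Looks for <a> elements *inside* elements with class storeBanner.
# The storeBanner elements are the links themselves, so nothing is found.
nested <- page %>%
  html_elements(".storeBanner") %>%
  html_elements("a") %>%
  html_attr("href")

# Looks for <a> elements that carry the class themselves.
direct <- page %>%
  html_elements("a.storeBanner") %>%
  html_attr("href")

length(nested)  # 0
direct          # "/store/4756" "/store/423"
```

With this toy markup, nested comes back empty, which matches what you saw after swapping the class name, while direct returns only the store paths.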
CodePudding user response:
You can add one more XPath step that selects storeBanner after the a tag step:
store.paths <- read_html("https://www.walmart.com/store/directory/al/alabaster") %>%
  html_nodes(xpath = '//*[@class="store-directory-container"]') %>%
  html_nodes("a") %>%
  html_nodes(xpath = '//*[@class="storeBanner"]') %>%
  html_attr('href')
store.paths
[1] "/store/4756" "/store/423"
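One caveat with chaining like this: as far as I understand xml2, an XPath that starts with // is evaluated against the whole document, even when it is applied to an already-selected node set, so the earlier steps do not actually narrow the search. The sketch below uses hypothetical markup (my assumption, not the real page) with an extra storeBanner link placed outside the container to show the effect, and a single scoped XPath as one possible alternative. The exact-match test @class="storeBanner" also assumes the element has no other classes.

```r
library(rvest)

# Hypothetical markup with one storeBanner link outside the container
page <- minimal_html('
  <a class="storeBanner" href="/store/0000">Outside the container</a>
  <div class="store-directory-container">
    <a class="storeBanner" href="/store/4756">Store 4756</a>
    <a href="tel:205-624-6229">205-624-6229</a>
  </div>
')

# The last step starts with "//", so it searches the whole document
# and also picks up the link outside the container.
chained <- page %>%
  html_elements(xpath = '//*[@class="store-directory-container"]') %>%
  html_elements("a") %>%
  html_elements(xpath = '//*[@class="storeBanner"]') %>%
  html_attr("href")

# A single expression keeps the search scoped to the container.
scoped <- page %>%
  html_elements(xpath = '//*[@class="store-directory-container"]//a[@class="storeBanner"]') %>%
  html_attr("href")

"/store/0000" %in% chained  # TRUE
scoped                      # "/store/4756"
```

On the real page this probably makes no difference if every storeBanner link sits inside the container, but the scoped form is safer if the layout changes.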