I am learning how to web scrape with rvest in R, and I want to extract the table embedded in the website below:
https://perfectunion.us/map-where-are-starbucks-workers-unionizing/
If you scroll about halfway down the page, you will see an embedded table of Starbucks stores and their unionization status.
When I use the CSS selector tool and highlight the table body, I get the selector "td".
However, when I use the rvest code below, I get:
{xml_nodeset (0)}
I have also used the browser's inspect feature to find the table's selector (below), and I get the same empty result.
"table#wpgmza_table_1.responsive.wpgmza_table.dataTable.no-footer.dtr-inline.collapsed"
Can anyone help me extract that table into R? I am trying to do a science practice project.
pacman::p_load(tidyverse,rvest)
url <- "https://perfectunion.us/map-where-are-starbucks-workers-unionizing/"
sb <- rvest::read_html(url)
# method 1:
sb %>%
  rvest::html_elements("td")
# method 2:
sb %>%
  rvest::html_elements("table#wpgmza_table_1.responsive.wpgmza_table.dataTable.no-footer.dtr-inline.collapsed")
I would appreciate any help extracting that table from the website and bringing it into R as a table.
CodePudding user response:
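It looks like the table data is stored as JSON and loaded separately from the page, which would explain why the static HTML returned by read_html() contains no td elements. You can find the link in the Network tab of your browser's developer tools.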
url<-"https://perfectunion.us/wp-json/wpgmza/v1/datatables/base64eJy10zFrwzAQBeD-8mYV6rZJQFvo0CWBDIFC4lKu1sUWlRVzkkPA L9HcVLo1qVa795907sBXdO9OgoBGu bt-VuWZZrkm WlQ3R rosl ZEvmKzpS-HUAiRJEI-Kjj2dWygHwqFlrpPa5JSpEh1dH3rk7kfYCjSlPbUctpfBSapmonTUXpWOIph T24RaAHnMj19zvhms-QB3KBx1H92EVG ymj-ZzRfslozzLa84z24v-tj-vZ1PRb66euG5tGoFDhGvkbUjhYF1nSw21IqE2vM4zjBWiiMh0"
jsonlite::fromJSON(url)
I'm not sure how stable this link is; it may change on a regular basis.
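Once the JSON is parsed, you still need to turn it into a table. Here is a minimal sketch of that step. It assumes the response is shaped like a typical DataTables payload, with a "data" element holding one array per row. I have not confirmed the real field names, so the payload and the column names below are hypothetical; run str() on the real result first and adjust.

```r
library(jsonlite)

# Hypothetical payload standing in for the real response.
# The actual endpoint may use different fields and more columns.
payload <- '{
  "recordsTotal": 2,
  "data": [
    ["Store A", "Buffalo, NY", "Won"],
    ["Store B", "Mesa, AZ", "Filed"]
  ]
}'

res <- fromJSON(payload)

# Inspect the top-level structure before relying on any field name.
str(res, max.level = 1)

# fromJSON() simplifies equal-length string arrays into a character matrix,
# which as.data.frame() turns into one column per position.
tbl <- as.data.frame(res$data, stringsAsFactors = FALSE)

# Assumed column names, chosen only for this illustration.
names(tbl) <- c("store", "location", "status")

tbl
```

With the real link, you would replace the payload with fromJSON(url) and pick the column names after looking at the actual data.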