In R with rvest, the code below runs html_text() without problems. But when I try to get the link that goes with each text, using web %>% html_node("div.p13n-desktop-grid") %>% html_attr(name='href'), it fails. Can anyone help? Thanks!
library(rvest)
url <- "https://www.amazon.com/Best-Sellers-Industrial-Scientific-3D-Printers/zgbs/industrial/6066127011/ref=zg_bs_pg_1?_encoding=UTF8&pg=1"
web <- rvest::read_html(url)
web %>% html_node("div.p13n-desktop-grid") %>% html_text() %>% strsplit("#") # ok
web %>% html_node("div.p13n-desktop-grid") %>% html_attr(name='href') # want the link that goes with each text, but this fails
CodePudding user response:
The href attribute belongs to the a tags, not to the div, so you need to select the a elements inside the div first. It is not clear which link you want; 119 href values are found:
web %>%
  html_node("div.p13n-desktop-grid") %>%
  html_elements("a") %>%
  html_attr(name = 'href')
# [1] "/Comgrow-Creality-Ender-Aluminum-220x220x250mm/dp/B07BR3F9N6/ref=zg_bs_6066127011_1/132-1194669-0063960?pd_rd_i=B07BR3F9N6&psc=1"
# [2] "/Comgrow-Creality-Ender-Aluminum-220x220x250mm/dp/B07BR3F9N6/ref=zg_bs_6066127011_1/132-1194669-0063960?pd_rd_i=B07BR3F9N6&psc=1"
# [3] "/product-reviews/B07BR3F9N6/ref=zg_bs_6066127011_cr_1/132-1194669-0063960?pd_rd_i=B07BR3F9N6"
# [4] ......
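To see the difference without hitting the live site, here is a minimal offline sketch. The markup passed to minimal_html() is hypothetical and only imitates the shape of the grid; the hrefs and the base URL are made up for illustration. It shows that html_attr() on the div gives NA, while descending to the a tags first returns the links, which xml2::url_absolute() can then turn into full URLs.

```r
library(rvest)

# Hypothetical markup standing in for the Amazon grid (not the real page)
page <- minimal_html('
  <div class="p13n-desktop-grid">
    <a href="/dp/AAA111/ref=x">First item</a>
    <a href="/dp/BBB222/ref=y">Second item</a>
  </div>')

grid <- html_element(page, "div.p13n-desktop-grid")

# The <div> itself has no href attribute, so this is NA
div_href <- html_attr(grid, "href")

# Select the <a> elements inside the div, then read href from each
a_hrefs <- grid %>% html_elements("a") %>% html_attr("href")

# The hrefs are relative; resolve them against an assumed base URL
full_urls <- xml2::url_absolute(a_hrefs, "https://www.amazon.com")
```

The same pattern applies to the real page: pick the elements that actually carry the attribute before calling html_attr().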
CodePudding user response:
For (shortened) product links and link texts:
library(rvest)
library(dplyr)
url <- "https://www.amazon.com/Best-Sellers-Industrial-Scientific-3D-Printers/zgbs/industrial/6066127011/ref=zg_bs_pg_1?_encoding=UTF8&pg=1"
web <- rvest::read_html(url)
# "div.p13n-desktop-grid a[tabindex] + a" :
# text links are adjacent siblings of image links & image links have a tabindex attribute,
# so "+" (the adjacent sibling combinator) picks the text link right after each image link
prod_links <- web %>% html_elements("div.p13n-desktop-grid a[tabindex] + a")
tibble(
  # shorten links, keep only the /dp/item_id/ part
  href = prod_links %>% html_attr(name='href') %>% sub('.*(/dp/\\w*/).*','www.amazon.com\\1', .),
  descr = prod_links %>% html_text2()
)
#> # A tibble: 30 × 2
#>    href                          descr
#>    <chr>                         <chr>
#>  1 www.amazon.com/dp/B07BR3F9N6/ Official Creality Ender 3 3D Printer Fully Ope…
#>  2 www.amazon.com/dp/B07FFTHMMN/ Official Creality Ender 3 V2 3D Printer Upgrad…
#>  3 www.amazon.com/dp/B09QGTTQKG/ ANYCUBIC Kobra 3D Printer Auto Leveling, FDM 3…
#>  4 www.amazon.com/dp/B07GYRQVYV/ Official Creality Ender 3 Pro 3D Printer with …
#>  5 www.amazon.com/dp/B083GTS8XJ/ ANYCUBIC Wash and Cure Station, Newest Upgrade…
#>  6 www.amazon.com/dp/B09FXYSFBV/ ANYCUBIC Photon Mono 4K 3D Printer, 6.23'' Mon…
#>  7 www.amazon.com/dp/B07J9QGP7S/ ANYCUBIC Mega-S New Upgrade 3D Printer with Hi…
#>  8 www.amazon.com/dp/B07Z9C9T42/ ELEGOO 5PCs FEP Release Film Mars LCD 3D Print…
#>  9 www.amazon.com/dp/B08SPXYND4/ Voxelab Aquila 3D Printer with Full Alloy Fram…
#> 10 www.amazon.com/dp/B07DYL9B2S/ Official Creality Ender 3 S1 3D Printer with D…
#> # … with 20 more rows
Created on 2022-06-16 by the reprex package (v2.0.1)
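The selector and the link shortening can also be checked offline. This is a sketch on hypothetical markup, assuming (as described above) that each product has an image link with a tabindex attribute immediately followed by its text link; the item id AAA111 and the link texts are made up.

```r
library(rvest)

# Hypothetical markup: image link (with tabindex), then the text link,
# then a reviews link that should NOT be matched
page <- minimal_html('
  <div class="p13n-desktop-grid">
    <div>
      <a tabindex="-1" href="/Some-Name/dp/AAA111/ref=zg_1"><img alt=""></a>
      <a href="/Some-Name/dp/AAA111/ref=zg_1">First item</a>
      <a href="/product-reviews/AAA111/ref=zg_cr_1">123 ratings</a>
    </div>
  </div>')

# "a[tabindex] + a" matches only the <a> directly after an <a tabindex=...>
text_links <- html_elements(page, "div.p13n-desktop-grid a[tabindex] + a")

hrefs <- html_attr(text_links, "href")

# Keep only the /dp/<item id>/ part and prefix the host name
short <- sub(".*(/dp/\\w*/).*", "www.amazon.com\\1", hrefs)
descr <- html_text2(text_links)
```

Because the reviews link is not directly preceded by a link with tabindex, it is skipped, which is why the result has one row per product rather than one per a tag.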