I'm trying to scrape a picture using rvest, with this code:
library(rvest)
url <- "https://fr.wikipedia.org/wiki/Robert_Jardillier"
webpage <- html_session(url)
link.titles <- webpage %>% html_nodes(".noarchive .image img")
img.url <- link.titles %>% html_attr("src")
download.file(img.url, "test.png", mode = "wb")
But when I try to download the image, I get the following message:
trying URL '//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg'
Error in download.file(img.url, "test.png", mode = "wb") :
cannot open URL '//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg'
In addition: Warning message:
In download.file(img.url, "test.png", mode = "wb") :
URL '//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg': status was 'URL using bad/illegal format or missing URL'
Any help? :)
CodePudding user response:
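The src attribute here is a protocol-relative URL (it starts with //), so download.file() has no scheme to work with. Prepend one:
download.file(paste0("https:", img.url), "test.png", mode = "wb")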
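A minimal sketch of the same idea as a reusable helper, for cases where the scraped value may or may not already carry a scheme. The name add_scheme is hypothetical (it is not part of rvest), and the download call is commented out so the snippet runs offline:

```r
# Hypothetical helper: prepend a scheme only to protocol-relative URLs ("//host/path").
# URLs that already have a scheme are returned unchanged.
add_scheme <- function(x, scheme = "https") {
  ifelse(startsWith(x, "//"), paste0(scheme, ":", x), x)
}

# The value reported in the error message from the question
img.url <- "//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg"

full.url <- add_scheme(img.url)
print(full.url)

# Uncomment to actually fetch the file (needs network access):
# download.file(full.url, "test.jpg", mode = "wb")
```

The file on the server is a JPEG, so a .jpg destination name probably describes the result better than test.png.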
CodePudding user response:
This worked for me.
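suppressPackageStartupMessages({
  library(rvest)
  library(dplyr)
})
url <- "https://fr.wikipedia.org/wiki/Robert_Jardillier"
page <- read_html(url)
page %>%
  # collect the href of every link on the page
  html_elements("a") %>%
  html_attr("href") %>%
  # keep only links that point at the Robert_Jardillier .jpg file
  grep("Robert_Jardillier.*\\.jpg", ., value = TRUE) %>%
  unique() %>%
  # keep the file name and append it to the page URL as a media fragment
  basename() %>%
  paste0(url, "#/media/", .) %>%
  download.file(destfile = "test.jpg", mode = "wb")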
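To see what the string-handling steps of that pipeline produce without touching the network, here is a small sketch in base R. The hrefs vector is made up for illustration and only stands in for what html_attr("href") might return on that page:

```r
url <- "https://fr.wikipedia.org/wiki/Robert_Jardillier"

# Hypothetical hrefs, assumed for illustration (not scraped from the real page)
hrefs <- c(
  "/wiki/Fichier:Robert_Jardillier_1932.jpg",
  "/wiki/Dijon",
  "/wiki/Fichier:Robert_Jardillier_1932.jpg"
)

# Same steps as the pipeline, written out one at a time
matches <- grep("Robert_Jardillier.*\\.jpg", hrefs, value = TRUE)  # keep matching links
files   <- basename(unique(matches))                               # drop duplicates, keep file name
targets <- paste0(url, "#/media/", files)                          # build the URL passed to download.file

print(targets)
```

Since the pipeline passes a single destfile, it assumes exactly one link matches; checking length(targets) before downloading is a cheap safeguard.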