Scraping image URL using R over multiple pages

Time:12-01

I'm trying to scrape image URLs across multiple pages, but my code seems to scrape only the URLs from the first page.

My goal is to vary the page number in the site URL (the part after "page=") from 1 to 100 and scrape the image URLs from each page.

I would appreciate some assistance. Thank you!

My code is below:

library("rvest")
library("ralger")

  for(page_result in 1:100){
    link = paste0("https://www.istockphoto.com/search/2/image?alloweduse=availableforalluses&mediatype=photography&phrase=man&page=", page_result)
    male <- images_preview(link)
  }

CodePudding user response:

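Your loop assigns to `male` on every iteration, so each page overwrites the previous one and only a single page's results are left when the loop finishes. Create a list to hold the output of each iteration, then flatten it at the end: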

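library("rvest")
library("ralger")

num_pages <- 5
# Pre-allocate one list slot per page so no result is overwritten
male <- vector("list", num_pages)

for (page_result in seq_len(num_pages)) {
    link <- paste0("https://www.istockphoto.com/search/2/image?alloweduse=availableforalluses&mediatype=photography&phrase=man&page=", page_result)
    # Store this page's image URLs in its own slot
    male[[page_result]] <- images_preview(link)
}

# Flatten the per-page results into a single vector
male <- unlist(male)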
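
The same collect-then-flatten pattern can be wrapped in a small function. This is only a sketch: `scrape_pages`, `fake_fetch`, the `delay` argument and the example.com URL are hypothetical names introduced here, not part of ralger or rvest. The `fetch` argument stands for any function that takes a URL and returns a character vector, which is the role `images_preview()` plays in the loop above. A page that raises an error is skipped with a warning instead of stopping the whole run.

```r
# Hypothetical helper: collect results from several pages into one vector.
# `fetch` is any function that takes a URL and returns a character vector.
scrape_pages <- function(base_url, pages, fetch, delay = 0) {
  results <- lapply(pages, function(p) {
    Sys.sleep(delay)  # optional pause between requests
    tryCatch(
      fetch(paste0(base_url, p)),
      error = function(e) {
        # Report the failed page and contribute nothing for it
        warning("page ", p, " failed: ", conditionMessage(e))
        character(0)
      }
    )
  })
  unlist(results)
}

# Offline demonstration with a stub in place of a real scraper.
fake_fetch <- function(url) {
  if (grepl("page=2$", url)) stop("simulated failure")
  paste0(url, c("#img1", "#img2"))
}

demo <- suppressWarnings(
  scrape_pages("https://example.com/search?page=", 1:3, fake_fetch)
)
length(demo)
```

With the real scraper, the call would presumably look like `scrape_pages(base, 1:5, images_preview, delay = 1)`, where `base` is the search URL ending in "page=".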

CodePudding user response:

library(tidyverse)
library(rvest)

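# Download one results page and return its thumbnail URLs as a tibble
get_url <- function(n_page) {
  page <- str_c(
    "https://www.istockphoto.com/search/2/image?alloweduse=availableforalluses&mediatype=photography&phrase=man&page=",
    n_page
  ) %>% read_html()
  
  tibble(
    index_page = n_page, 
    # Select the thumbnail elements by CSS class and read their src attribute
    url = page %>% 
      html_elements(".MosaicAsset-module__thumb___klD9E") %>%
      html_attr("src")
  )
} 

# Run get_url() for pages 1 to 10 and row-bind the results
map_dfr(1:10, get_url)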

# A tibble: 600 x 2
   index_page url                                                                                                                            
        <int> <chr>                                                                                                                          
 1          1 https://media.istockphoto.com/id/1336324740/photo/having-fun-at-a-garden-party.jpg?s=612x612&w=0&k=20&c=r5iNGwCyH-6ENsCVz7FuyD~
 2          1 https://media.istockphoto.com/id/1373887893/photo/sugarland-run-stream-valley-trail-hike-in-herndon-fairfax-county-in-virginia~
 3          1 https://media.istockphoto.com/id/164853384/photo/father-and-son-admiring-each-other-at-table.jpg?s=612x612&w=0&k=20&c=gxgST5dQ~
 4          1 https://media.istockphoto.com/id/931693472/photo/handsome-attractive-glad-positive-funny-guy-in-glasses-with-wide-open-mouth-f~
 5          1 https://media.istockphoto.com/id/890112744/photo/young-man-having-fun-cleaning-house-with-vacuum-cleaner-dancing-like-guitaris~
 6          1 https://media.istockphoto.com/id/1391217782/photo/full-length-portrait-of-a-mature-man-wearing-a-denim-shirt-and-jeans.jpg?s=6~
 7          1 https://media.istockphoto.com/id/640312968/photo/you-can-never-watch-too-many-cat-videos.jpg?s=612x612&w=0&k=20&c=x_aZqpQM0sor~
 8          1 https://media.istockphoto.com/id/1334702614/photo/young-man-with-laptop-and-coffee-working-indoors-home-office-concept.jpg?s=6~
 9          1 https://media.istockphoto.com/id/1337995262/photo/portrait-of-adult-bald-smiling-attractive-man-forty-years-with-beard-in-blue~
10          1 https://media.istockphoto.com/id/1220736928/photo/a-highly-emotional-man.jpg?s=612x612&w=0&k=20&c=RS2E06G34Ja36vCn-1Cjdh7zkdYE~
# ... with 590 more rows
# i Use `print(n = ...)` to see more rows
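
The selector used above looks like a build-generated class name, so it may stop matching if the site's markup changes. To check how the `html_elements()` plus `html_attr()` step behaves without hitting the site, here is a minimal sketch on inline HTML. The markup, the class names `thumb` and `logo`, and the example.com URLs are made up for illustration and are not taken from iStock.

```r
library(rvest)

# Inline HTML standing in for a downloaded search page (made-up markup).
page <- minimal_html('
  <img class="thumb" src="https://example.com/a.jpg">
  <img class="thumb" src="https://example.com/b.jpg">
  <img class="logo"  src="https://example.com/logo.png">
')

# Keep only the <img> elements with class "thumb", then read their src.
urls <- page %>%
  html_elements("img.thumb") %>%
  html_attr("src")

urls
```

Only the two `thumb` images are returned; the `logo` image is filtered out by the selector. If the real scrape starts returning zero rows, inspecting the page and updating the selector is the first thing to try.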