I have a URL from which I need to scrape all images using the mechanize gem, but some image URLs are in rel=icon link tags.
I need to get the image URL from this tag:
<link rel="icon" href="https://mywebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png" sizes="32x32">
This is the code I tried, which scrapes only the images from img tags. How can I get both in one script?
require 'mechanize'
url = "https://mywebsite.com/"
agent = Mechanize.new
page = agent.get(url)
page.images.each do |image|
  puts image # prints every image found in the img tags
end
CodePudding user response:
I looked over Mechanize::Page::Link, but it returns only the anchors.
I tried it with XPath instead:
page.xpath('//link[contains(@rel, "icon")]').each do |icon|
  p icon.attr('href')
end
And received:
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png"
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-192x192.png"
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-180x180.png"
Here is a Replit that returns all the images.
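To get both working as one, as the question asks, the img sources and the icon hrefs can be merged into a single list. Here is a minimal sketch, not a definitive implementation: merge_image_urls is a hypothetical helper name, and it assumes every value is either an absolute URL or a path relative to the page URL. It uses only Ruby's standard uri library, so the Mechanize part appears as comments.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): merges <img> sources and
# rel="icon" hrefs into one list of absolute URLs without blanks or duplicates.
def merge_image_urls(base_url, img_srcs, icon_hrefs)
  (img_srcs + icon_hrefs)
    .map { |u| u.to_s.strip }               # nil becomes "", whitespace is trimmed
    .reject(&:empty?)                       # drop tags without a usable value
    .map { |u| URI.join(base_url, u).to_s } # resolve relative paths against the page URL
    .uniq
end

# With Mechanize (needs the gem and network access), the inputs could be built like this:
#   page       = Mechanize.new.get(url)
#   img_srcs   = page.images.map(&:src)
#   icon_hrefs = page.xpath('//link[contains(@rel, "icon")]').map { |l| l['href'] }
#   puts merge_image_urls(url, img_srcs, icon_hrefs)
```

Resolving against the page URL matters because favicon hrefs are often written as relative paths such as /favicon.ico. Note that URI.join raises an error on malformed URLs, so real pages may need a rescue around it.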
CodePudding user response:
page.search('link').each do |link|
  href = link['href'].to_s
  # Print only hrefs that contain an image file extension
  if %w[.gif .png .jpg .jpeg].any? { |ext| href.include?(ext) }
    puts href
  end
end
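A substring check like the one above can also match hrefs where the extension only shows up in a query string, and it misses uppercase extensions. A possible stricter variant, offered as a sketch: image_href? and IMAGE_EXTENSIONS are hypothetical names, and the extension list is copied from the snippet above. It compares only the extension of the URL path.

```ruby
require 'uri'

# Extensions treated as images (same list as in the snippet above).
IMAGE_EXTENSIONS = %w[.gif .png .jpg .jpeg].freeze

# Hypothetical helper: true when the path part of the href ends in an image extension.
def image_href?(href)
  path = URI.parse(href.to_s).path.to_s            # ignores query string and fragment
  IMAGE_EXTENSIONS.include?(File.extname(path).downcase)
rescue URI::InvalidURIError
  false                                            # unparsable hrefs are skipped
end

# Possible use with the Mechanize page from above:
#   page.search('link').each do |link|
#     puts link['href'] if image_href?(link['href'])
#   end
```

This is only a filter on the file name. It says nothing about what the server actually returns, so an icon served from an extensionless URL would still be missed.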