Mechanize::ResponseCodeError (404 => Net::HTTPNotFound unhandled response):

Time:11-25

I am trying to scrape images from https://en.wikipedia.org/ using the mechanize gem. When I try to calculate an image's size, I get Mechanize::ResponseCodeError (404 => Net::HTTPNotFound for https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/FP2A3620_%2823497688248%29.jpg/119px-FP2A3620_%2823497688248%29.jpg -- unhandled response).

Here is my code:

    def images
      agent = Mechanize.new
      page = agent.get("https://en.wikipedia.org/")
      page.images.each do |image|
        puts image.url
        size = agent.head( image )["content-length"].to_i/1000
      end
    end

Any help is appreciated.

CodePudding user response:

I looked up that image on Wikipedia and it renders just fine. I opened it in a new tab and compared the URL in the browser with the one Mechanize has: the browser shows literal parentheses, while Mechanize has them percent-encoded as %28 and %29.

Unescaping the URL did the trick.

    require "cgi"

    image_url = CGI.unescape(image.url.to_s)  # decode %28/%29 back to ( and )
    size = agent.head(image_url)["content-length"].to_i / 1000
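
To see what the unescaping step changes without making any network request, here is a small sketch that runs it on the URL from the error message. The helper name `unescape_image_url` is made up for this illustration and is not part of Mechanize. The underlying cause of the 404 is presumably that the already-escaped URL gets escaped a second time somewhere along the way, but that is an assumption I have not verified.

```ruby
require "cgi"

# Hypothetical helper (not a Mechanize method): accepts a String or a URI
# and returns the URL with its percent-escapes decoded.
def unescape_image_url(url)
  CGI.unescape(url.to_s)
end

# URL copied from the error message in the question.
escaped = "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/" \
          "FP2A3620_%2823497688248%29.jpg/119px-FP2A3620_%2823497688248%29.jpg"

unescaped = unescape_image_url(escaped)

puts unescaped
# The file name now contains literal parentheses:
# .../FP2A3620_(23497688248).jpg/119px-FP2A3620_(23497688248).jpg
```

One caveat: `CGI.unescape` also turns a `+` into a space, so a URL that contains a literal `+` would need different handling.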

Here is a working Replit.
