Watir - scraping a grid of items

Time:09-12

I'm trying to scrape the app URLs from a directory that's laid out in a grid:

<div id="mas-apps-list-tile-grid">
  <div>
    <div>
      <a href="url.com/app/345">...</a>
    </div>
  </div>
  <div>
    <div>
      <a href="url.com/app/567">...</a>
    </div>
  </div>
  ... and so on
</div>

Here are my two lines of Watir code, which are supposed to create an array of all the URLs on the page:

company_listings = browser.div(id: 'mas-apps-list-tile-grid')
companies = company_listings.map { |div| div.a.href }

But instead of an array of URLs, companies is set to:

#<Watir::Map: located: false; {:id=>"mas-apps-list-tile-grid", :tag_name=>"div"} --> {:tag_name=>"map"}>

What am I doing wrong?

Answer:

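Calling #map on a single Watir::Element (in this case a Watir::Div) returns a Watir::Map element: #map is the locator for <map> tags, just as #div is the locator for <div> tags. The output in the question is exactly that, a not-yet-located <map> element inside the grid div.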

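In contrast, #map on a Watir::ElementCollection iterates over each of the matching elements. That collection is what your code is missing: browser.div returns one element, not a collection, so there is nothing for #map to iterate over.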
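To see why the block in the question never runs, here is a minimal plain-Ruby sketch that needs no browser. FakeElement, FakeCollection and MapLocator are hypothetical stand-ins, not Watir classes: the first mimics a single element whose #map locates a <map> tag, and the second mixes in Enumerable so that its #map iterates.

```ruby
# Hypothetical stand-ins, NOT Watir classes: they only mimic the name clash
# between an element's #map (a <map> tag locator) and Enumerable#map.

# Stand-in for a located <map> element: it only remembers where it was looked for.
MapLocator = Struct.new(:parent_selector)

# Stand-in for a single element, such as the result of browser.div(id: ...).
class FakeElement
  attr_reader :selector, :href

  def initialize(selector, href = nil)
    @selector = selector
    @href = href
  end

  # Like an element-level #map: returns a locator for a <map> child.
  # Any block passed to it is ignored, so nothing is iterated.
  def map
    MapLocator.new(selector)
  end
end

# Stand-in for an element collection, such as the result of grid.links.
class FakeCollection
  include Enumerable

  def initialize(elements)
    @elements = elements
  end

  # Enumerable derives #map (and its other methods) from #each.
  def each(&block)
    @elements.each(&block)
  end
end

grid = FakeElement.new({ id: 'mas-apps-list-tile-grid' })
links = FakeCollection.new([
  FakeElement.new({ tag_name: 'a' }, 'url.com/app/345'),
  FakeElement.new({ tag_name: 'a' }, 'url.com/app/567')
])

from_element = grid.map { |div| div.href }  # a MapLocator, not an Array
from_collection = links.map(&:href)         # an Array of href strings
p from_element
p from_collection
```

Ruby accepts a block on any method call, so grid.map { ... } raises no error even though the block is silently ignored. That is why the mistake in the question returns an unexpected object instead of failing loudly.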

You have a couple of options. If you want all the links in the grid, the most straightforward approach is to build a #links (or #as) element collection from the grid div:

company_grid = browser.div(id: 'mas-apps-list-tile-grid')
company_hrefs = company_grid.links.map { |a| a.href }
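If you want to check the selection logic offline, without launching a browser, the same "every link under the grid" query can be sketched with Ruby's REXML library (shipped with Ruby, as a bundled gem since Ruby 3.0) and an XPath expression. This is not Watir: the XPath query is a swapped-in stand-in for company_grid.links, and GRID_HTML is a hypothetical, simplified copy of the markup from the question. One difference to keep in mind: a browser may report href as the resolved absolute URL, while REXML returns the attribute text exactly as written.

```ruby
require 'rexml/document'

# Hypothetical, simplified version of the question's markup: the real page's
# inner-div attributes were not shown, so they are left bare here.
GRID_HTML = <<~HTML
  <div id="mas-apps-list-tile-grid">
    <div><div><a href="url.com/app/345">App 345</a></div></div>
    <div><div><a href="url.com/app/567">App 567</a></div></div>
  </div>
HTML

# Offline analogue of company_grid.links.map { |a| a.href }:
# collect every <a> at any depth under the grid div, then read each href.
def grid_hrefs(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//div[@id='mas-apps-list-tile-grid']//a")
              .map { |a| a.attributes['href'] }
end

p grid_hrefs(GRID_HTML)
```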

If you only care about some of the links, use their parent elements to narrow the selection. For example, maybe you only want links located inside a "solution-tile-content-container" div:

company_grid = browser.div(id: 'mas-apps-list-tile-grid')
company_listings = company_grid.divs(class: 'solution-tile-content-container')
company_hrefs = company_listings.map { |div| div.a.href }
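The narrowing step can likewise be sketched offline with REXML and XPath as a stand-in for Watir (again, not Watir's API). TILES_HTML is hypothetical markup in which only the first link sits inside a "solution-tile-content-container" div; the "tile-footer" class and the help link are invented purely to show a link being skipped.

```ruby
require 'rexml/document'

# Hypothetical markup: the first link is inside a
# "solution-tile-content-container" div; the second ("tile-footer" is an
# invented class) should be skipped.
TILES_HTML = <<~HTML
  <div id="mas-apps-list-tile-grid">
    <div><div class="solution-tile-content-container"><a href="url.com/app/345">App</a></div></div>
    <div><div class="tile-footer"><a href="url.com/help">Help</a></div></div>
  </div>
HTML

# Offline analogue of
#   company_grid.divs(class: 'solution-tile-content-container').map { |div| div.a.href }
# Caveat: @class='...' is an exact string match, whereas Watir's class: locator
# also matches elements that carry additional classes.
def container_hrefs(html, container_class)
  doc = REXML::Document.new(html)
  containers = REXML::XPath.match(
    doc, "//div[@id='mas-apps-list-tile-grid']//div[@class='#{container_class}']"
  )
  # Like div.a.href: take the first <a> at any depth inside each container.
  containers.map { |div| REXML::XPath.first(div, './/a').attributes['href'] }
end

p container_hrefs(TILES_HTML, 'solution-tile-content-container')
```

As with div.a.href in Watir, a container with no link inside would make this raise rather than silently skip it.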