I'm trying to scrape the app URLs from a directory that's laid out in a grid:
<div id="mas-apps-list-tile-grid" >
<div >
<div >
<a href="url.com/app/345">
<div >
<div >
<a href="url.com/app/567">
... and so on
Here are my two lines of Watir code, which are supposed to build an array of all the URLs on the page:
company_listings = browser.div(id: 'mas-apps-list-tile-grid')
companies = company_listings.map { |div| div.a.href }
But instead of an array of URLs, 'companies' ends up as:
#<Watir::Map: located: false; {:id=>"mas-apps-list-tile-grid", :tag_name=>"div"} --> {:tag_name=>"map"}>
What am I doing wrong?
Answer:
The #map method for a Watir::Element (or specifically Watir::Div in this case) returns a Watir::Map element. This is used for locating <map> tags/elements on the page.
In contrast, the #map method for a Watir::ElementCollection iterates over each of the matching elements. That is what your code is missing: browser.div(...) returns a single element, not a collection, so the block is never run.
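To make the difference concrete without needing a browser, here is a minimal plain-Ruby sketch. FakeElement and FakeCollection are hypothetical stand-ins, not Watir's real classes; they only mimic the shape of the two #map methods described above, on the assumption that the single-element version ignores any block it is given.

```ruby
# Hypothetical stand-ins, NOT Watir's real classes. They only mimic
# the two different meanings of #map discussed above.
class FakeElement
  attr_reader :href

  def initialize(href = nil)
    @href = href
  end

  # On a single element, #map acts as a locator for a nested <map> tag.
  # It never yields, so any block passed to it is ignored.
  def map
    :map_element_locator
  end
end

class FakeCollection
  include Enumerable # supplies the iterating #map

  def initialize(elements)
    @elements = elements
  end

  def each(&block)
    @elements.each(&block)
  end
end

single = FakeElement.new
links = FakeCollection.new([
  FakeElement.new('url.com/app/345'),
  FakeElement.new('url.com/app/567')
])

single_result = single.map { |el| el.href }     # block is never called
collection_result = links.map { |el| el.href }  # block runs once per element

p single_result
p collection_result
```

The first call hands back a locator-like object, which corresponds to the Watir::Map output in the question, while the second returns the array of hrefs.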
You have a couple of options. If you want all the links in the grid, the most straightforward approach is to create a #links or #as element collection:
company_grid = browser.div(id: 'mas-apps-list-tile-grid')
company_hrefs = company_grid.links.map { |a| a.href }
If there are only some links you care about, you'll need to use the link's parents to narrow it down. For example, maybe it's just links located in a "solution-tile-content-container" div:
company_grid = browser.div(id: 'mas-apps-list-tile-grid')
company_listings = company_grid.divs(class: 'solution-tile-content-container')
company_hrefs = company_listings.map { |div| div.a.href }
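Another possibility, offered only as a sketch: once the hrefs are plain strings, you can also narrow them down in ordinary Ruby instead of through the DOM. The sample array below is made-up data standing in for the result of company_grid.links.map, and the /app/ pattern is an assumption based on the URLs shown in the question.

```ruby
# Sample data standing in for the hrefs collected from the grid.
# 'url.com/about' is an invented non-app link for illustration.
all_hrefs = [
  'url.com/app/345',
  'url.com/about',
  'url.com/app/567',
  'url.com/app/345'
]

# Keep only hrefs that contain "/app/" followed by digits,
# then drop duplicates while preserving the original order.
app_hrefs = all_hrefs.grep(%r{/app/\d+}).uniq

p app_hrefs
```

This is only useful when the URL itself distinguishes the links you want; if it doesn't, narrowing by parent element as shown above is the more reliable route.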