I'd like to strip all the data attributes from img tags while looping through a document. I've tried a few options using has_attribute? and xpath, but none have returned true.
article.css('img').each do |img|
  # There is a `data-` attribute on the tag
  img.has_attribute?("data-lazy-srcset") # true
  # But I only get `false` or empty arrays when trying wildcards
  img.has_attribute?('data-*') # false
  img.has_attribute?("//*[@*[contains(., 'data-')]]") # false
  img.has_attribute?("//*[contains(., 'data-')]") # false
  img.has_attribute?("//@*[starts-with(name(), 'data-')]") # false
  img.xpath("//*[@*[contains(., 'data-')]]") # []
  img.xpath("//*[contains(., 'data-')]") # []
end
How do I select all data- attributes on these img tags?
CodePudding user response:
You can search for img tags with an attribute that starts with "data-" using the following:
//img[@*[starts-with(name(),'data-')]]
To break this down:
- // - Anywhere in the document
- img - img tag
- @* - All Attributes
- starts-with(name(),'data-') - Attribute's name starts with "data-"
Example:
require 'nokogiri'
doc = Nokogiri::HTML(<<-END_OF_HTML)
<img src='' />
<img data-method='a' src= ''>
<img data-info='b' src= ''>
<img data-type='c' src= ''>
<img src= ''>
END_OF_HTML
imgs = doc.xpath("//img[@*[starts-with(name(),'data-')]]")
puts imgs
# <img data-method="a" src="">
# <img data-info="b" src="">
# <img data-type="c" src="">
Or, using your desired loop:
doc.css('img').select do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").any?
end
#[#<Nokogiri::XML::Element:0x384 name="img" attributes=[#<Nokogiri::XML::Attr:0x35c name="data-method" value="a">, #<Nokogiri::XML::Attr:0x370 name="src">]>,
# #<Nokogiri::XML::Element:0x3c0 name="img" attributes=[#<Nokogiri::XML::Attr:0x398 name="data-info" value="b">, #<Nokogiri::XML::Attr:0x3ac name="src">]>,
# #<Nokogiri::XML::Element:0x3fc name="img" attributes=[#<Nokogiri::XML::Attr:0x3d4 name="data-type" value="c">, #<Nokogiri::XML::Attr:0x3e8 name="src">]>]
UPDATE: To remove the attributes:
doc.css('img').each do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").each(&:remove)
end
puts doc.to_s
#<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#<html>
#<body>
# <img src="">
# <img src="">
# <img src="">
# <img src="">
# <img src="">
#</body>
#</html>
This can be simplified to:
doc.xpath("//img/@*[starts-with(name(),'data-')]").each(&:remove)