Nokogiri to Find All Data Attributes Using a Wildcard

Time:12-31

I'd like to strip all the data-* attributes from img tags while looping through a document. I've tried a few options using has_attribute? and xpath, but none of them have returned true.

article.css('img').each do |img|
  # The img does have a data- attribute
  img.has_attribute?("data-lazy-srcset") # true
  # But I only get `false` or empty arrays when trying wildcards
  img.has_attribute?('data-*') # false
  img.has_attribute?("//*[@*[contains(., 'data-')]]") # false
  img.has_attribute?("//*[contains(., 'data-')]") # false
  img.has_attribute?("//@*[starts-with(name(), 'data-')]") # false
  img.xpath("//*[@*[contains(., 'data-')]]") # []
  img.xpath("//*[contains(., 'data-')]") # []
end

How do I select all data- attributes on these img tags?

Answer:

You can select img tags that have at least one attribute whose name starts with "data-" using this XPath expression:

//img[@*[starts-with(name(),'data-')]]

To break this down:

  • // - anywhere in the document
  • img - an img element
  • [@*[...]] - that has at least one attribute (@* matches any attribute) satisfying the inner condition
  • starts-with(name(),'data-') - the attribute's name starts with "data-"

Example:

require 'nokogiri'

doc = Nokogiri::HTML(<<-END_OF_HTML)
  <img src='' />
  <img data-method='a' src= ''> 
  <img data-info='b' src= ''> 
  <img data-type='c' src= ''> 
  <img src= ''> 
END_OF_HTML

imgs = doc.xpath("//img[@*[starts-with(name(),'data-')]]")

puts imgs 
# <img data-method="a" src="">
# <img data-info="b" src="">
# <img data-type="c" src="">

Or, using the loop from your question:

doc.css('img').select do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").any?
end
#[#<Nokogiri::XML::Element:0x384 name="img" attributes=[#<Nokogiri::XML::Attr:0x35c name="data-method" value="a">, #<Nokogiri::XML::Attr:0x370 name="src">]>, 
# #<Nokogiri::XML::Element:0x3c0 name="img" attributes=[#<Nokogiri::XML::Attr:0x398 name="data-info" value="b">, #<Nokogiri::XML::Attr:0x3ac name="src">]>, 
# #<Nokogiri::XML::Element:0x3fc name="img" attributes=[#<Nokogiri::XML::Attr:0x3d4 name="data-type" value="c">, #<Nokogiri::XML::Attr:0x3e8 name="src">]>]

UPDATE: To remove the attributes:

doc.css('img').each do |img|
  img.xpath(".//@*[starts-with(name(),'data-')]").each(&:remove)
end

puts doc.to_s
#<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#<html>
#<body>
#    <img src="">
#    <img src="">
#    <img src="">
#    <img src="">
#    <img src="">
#</body>
#</html>

This can be simplified to a single expression that selects the data- attributes directly and removes them:

doc.xpath("//img/@*[starts-with(name(),'data-')]").each(&:remove)
