The Short:
How can I retrieve only tag names with .xpath() in Scrapy?
The Long:
I am currently using a scrapy.Spider and calling response.selector.remove_namespaces() in the parse() function to keep things simple.
I am trying to do something like this, but with Scrapy:
Iterate on XML tags and get elements' xpath in Python
However, I can't seem to figure out how to retrieve only the name of the tags. What is the .xpath() command to grab just the tag names?
Answer:
As far as I am aware, the scrapy.Selector class has no dedicated method for extracting just the tag name. The xpath method is meant for extracting data from the markup when the XPath is already known, not for recovering the XPath itself.
That being said, every selector has an re method, so you can apply a regular expression to the element's serialized markup and capture the tag name.
For example:
for selector in response.xpath("//*"):
    # re_first() returns only the first match: the element's own opening tag
    print(selector.re_first(r'<([\w.-]+)'))