Retrieving only XML Tag Names in Scrapy

Time:10-03

The Short:

How can I retrieve only tag names with .xpath() in Scrapy?

The Long:

I am currently using a scrapy.Spider and calling response.selector.remove_namespaces() in the parse() function to keep things simple.

I am trying to do something like this, but with Scrapy:

Iterate on XML tags and get elements' xpath in Python

However, I can't seem to figure out how to retrieve only the name of the tags. What is the .xpath() command to grab just the tag names?

CodePudding user response:

As far as I am aware, there is no built-in way to extract just the tag name from a Scrapy Selector. The xpath method is meant for extracting data from the markup when the XPath is already known, not for discovering the XPath itself.

That said, every selector has regex methods (re and re_first), so you can pass a regular expression that captures the tag name from the element's markup.

For example:

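for selector in response.xpath("//*"):
    # re_first() returns only the first match: the element's own opening tag,
    # not the tags of its children. The pattern also works for tags without attributes.
    print(selector.re_first(r"<([\w.-]+)"))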
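
To see what such a pattern captures without running a spider, here is a minimal sketch that uses only the standard library. The helper first_tag_name and the sample markup are hypothetical and not part of Scrapy; the sketch assumes the selector's regex methods are applied to the element's serialized markup, and that namespaces have already been removed so tag names contain no colon.

```python
import re
from typing import Optional

# Captures the name of the first opening tag: letters, digits, "_", "." and "-".
# Closing tags ("</x>"), comments ("<!--") and declarations ("<?xml") do not match.
TAG_NAME = re.compile(r"<([\w.-]+)")


def first_tag_name(markup: str) -> Optional[str]:
    """Return the name of the first opening tag in `markup`, or None if there is none."""
    match = TAG_NAME.search(markup)
    return match.group(1) if match else None


# Hypothetical serialized elements, similar to what each selector would hold.
samples = [
    '<item id="1"><title>Example</title></item>',  # tag with an attribute
    "<title>Example</title>",                      # tag without attributes
    "<line-item/>",                                # self-closing, hyphenated name
]

names = [first_tag_name(sample) for sample in samples]
print(names)
```

Taking only the first match matters here: the serialized markup of an element includes its children, so collecting every match would return the child tag names as well.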