Python: exclude outer wrapping element when getting content via css/xpath selector

Time:07-27

I tried this code to get the HTML content of element div.entry-content:

response.css('div.entry-content').get()

However, it returns the wrapping element too:

<div class="entry-content">
    <p>**my content**</p>
    <p>more content</p>
</div>

But I want just the contents, so in my case: <p>**my content**</p><p>more content</p>

I also tried an XPath selector, response.xpath('//div[@class="entry-content"]').get(), but got the same result as above.

Based on F.Hoque's answer below I tried:

response.xpath('//article/div[@class="entry-content"]//p/text()').getall() and response.xpath('//article/div[@class="entry-content"]//p').getall()

These, however, return lists of, respectively, the text of each matched p element and the p elements themselves. What I want is the HTML content of the div.entry-content element as a single value, without the wrapping element itself.

I've tried Googling, but can't find anything.

CodePudding user response:

As you said, your main div contains multiple p tags and you want to extract the text node value from those p tags. //p will select all the p tags.

response.xpath('//div[@class="entry-content"]//p').getall()

The following expression joins the list into a single string:

p_tags = ''.join([x.get() for x in response.xpath('//article/div[@class="entry-content"]//p')])

CodePudding user response:

Your content is in the <p> tags, not the <div>.

response.css('div.entry-content p').get()

or

response.xpath('//div[@class="entry-content"]/p').get()