Python/Selenium: Any way to wildcard the end of an xpath? Or search for a specifically formatted pie

I am using Python and Selenium to archive some posts. They are simple text / image posts. Because the site requires a login, I'm using Selenium to access it.

The problem is that the page lists all the posts, but each one is only fully readable after clicking a link labeled "read more", which opens a popup with the full text / images.

So I'm writing a script to scroll the page, click read more, scrape the post, close it, and move on to the next one.

The problem I'm running into is that every "read more" button is an identical element:

<a href="javascript:;" style="font-weight: 400">read more</a>

If I try to loop through them using XPaths, I run into the problem that the paths are structured differently as well, for example:

//*[@id="page"]/div[2]/article[10]/div[2]/ul/li/a

//*[@id="page"]/div[2]/article[14]/div[2]/p[3]/a

I tried writing my loop to just step through the article numbers, but of course the XPaths end differently. Is there a way I can add a wildcard to the back half of my XPaths? Or search just by the article numbers?

CodePudding user response:

A single / selects only direct children. Use // instead to match descendants at any depth, so you can go from <article> straight to the <a> no matter which tags sit in between:

//*[@id="page"]/div[2]/article//a[.="read more"]

Passed to find_elements, this will give you a list of elements you can iterate over. You might be able to remove the [.="read more"] predicate, but then it might catch unrelated <a> tags, depending on the rest of the HTML structure.
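As a rough sketch of how that could look in the scraping loop, the snippet below keeps the article number in the path and uses // only for the part that varies, which also covers the "search just by the article numbers" idea. The names ARTICLES_XPATH, read_more_xpath, open_each_post and handle_post are made up for illustration, driver is assumed to be your already logged-in Selenium WebDriver, and handle_post stands in for whatever scrapes and closes the popup, since that markup isn't shown in the question. It also swaps the exact match [.="read more"] for normalize-space()="read more" so stray whitespace around the link text doesn't break the match; that is a variation, not part of the XPath above.

```python
# Assumed page layout, taken from the XPaths in the question.
ARTICLES_XPATH = '//*[@id="page"]/div[2]/article'


def read_more_xpath(article_number):
    """XPath for the "read more" link anywhere inside the n-th article (1-based)."""
    return f'{ARTICLES_XPATH}[{article_number}]//a[normalize-space()="read more"]'


def open_each_post(driver, handle_post):
    """Click "read more" in each article in turn, then call handle_post(driver, n).

    handle_post is a hypothetical callback that should scrape the popup and close it.
    Returns the number of posts that were opened.
    """
    # "xpath" is the value of Selenium's By.XPATH constant.
    article_count = len(driver.find_elements("xpath", ARTICLES_XPATH))
    opened = 0
    for n in range(1, article_count + 1):
        # Look the link up again for every article, so references collected
        # before a popup opened and closed are never reused.
        links = driver.find_elements("xpath", read_more_xpath(n))
        if not links:
            continue  # this article has no "read more" link
        links[0].click()
        handle_post(driver, n)
        opened += 1
    return opened
```

You would call it as open_each_post(driver, scrape_and_close), where scrape_and_close is your own function. If the page loads more articles as you scroll, the count taken at the start will be too low, so you may need to scroll first or re-count inside the loop.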

You can also try looking for the "read more" elements directly by their text:

//a[.="read more"]
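If you want to see the difference between / and // without opening a browser, here is a small offline check using Python's standard library. The SAMPLE markup is invented to mimic the two layouts from the question (ul/li/a in one article, p/a in the other) plus one unrelated link; it is not the real page. ElementTree only supports a subset of XPath and needs paths relative to the root element, so the expressions start with ./ instead of //*[@id="page"].

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the page: two articles whose "read more" links sit at
# different depths, plus one unrelated <a> tag in the second article.
SAMPLE = """
<div id="page">
  <div>header</div>
  <div>
    <article>
      <div>title</div>
      <div><ul><li><a href="javascript:;">read more</a></li></ul></div>
    </article>
    <article>
      <div>title</div>
      <div>
        <p>intro</p>
        <p><a href="/profile">author</a></p>
        <p><a href="javascript:;">read more</a></p>
      </div>
    </article>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)  # root is the <div id="page"> element

# "/" only follows direct children, so this misses links nested in ul/li or p.
direct = root.findall("./div[2]/article/div[2]/a")

# "//" matches at any depth below <article>, but also picks up the unrelated link.
any_depth = root.findall("./div[2]/article//a")

# Adding the text predicate keeps only the "read more" links.
by_text = root.findall("./div[2]/article//a[.='read more']")

# Searching by text alone, anywhere in the document.
anywhere = root.findall(".//a[.='read more']")

print(len(direct), len(any_depth), len(by_text), len(anywhere))
```

The [.='text'] predicate needs Python 3.7 or newer. In the browser, Selenium evaluates full XPath 1.0, so the expressions from the answer work there unchanged.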