Looking to segment out a series of URLs from a Screaming Frog crawl. The URLs I want to exclude all include "p-#" (page number). For example, I want to capture only the first two URLs in this list. Thank you in advance!
https://regex101.com/r/QIU4R2/1
CodePudding user response:
It should be as simple as:
.*-p-[0-9]+.*
.* matches any run of characters except newlines
-p- matches a literal "-p-"
[0-9]+ matches at least one digit
I'm not sure exactly how this exclusion match in Screaming Frog works, but it seems likely that you don't need to match full URLs. My guess is that a regular expression for just the pagination portion of the URL would be enough:
-p-[0-9]
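To check both patterns outside Screaming Frog, here is a minimal Python sketch. The URLs are made up for illustration (the real list is behind the regex101 link), and how Screaming Frog applies its exclude patterns is an assumption here: `fullmatch` stands in for "the pattern must match the whole URL" and `search` for "the pattern may match anywhere in the URL".

```python
import re

# Hypothetical URLs for illustration; not taken from the original crawl.
urls = [
    "https://example.com/shoes",
    "https://example.com/shoes-sale",
    "https://example.com/shoes-p-2",
    "https://example.com/shoes-p-13",
]

# Full-URL pattern: has to cover the entire string.
full_pattern = re.compile(r".*-p-[0-9]+.*")
# Short pattern: only needs to appear somewhere in the string.
partial_pattern = re.compile(r"-p-[0-9]")

# Keep only the URLs that the exclusion pattern does NOT match.
kept_full = [u for u in urls if not full_pattern.fullmatch(u)]
kept_partial = [u for u in urls if not partial_pattern.search(u)]

print(kept_full)
print(kept_partial)
```

Both lists come out the same for these sample URLs, so which pattern to use depends only on whether the tool requires a full-URL match.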