Retrieving Part of a URL (Regex)

How can I extract the "wp-*" portion of URLs like these with a regex?

facebook.com/wp-content/uploads/xyz 
facebook.com/wp-uploads/uploads/xyz

Answer:

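This depends a bit on the language you use and how flexible the regex needs to be. A very generic regex could look like this:

.*/(wp-[^/]+).*

It skips everything up to the last "/" that is followed by "wp-", captures "wp-" plus the following non-slash characters in group 1, and then consumes the rest of the string. In JavaScript, the code could look like this: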

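const url = 'facebook.com/wp-content/uploads/xyz';
// Replace the whole string with capture group 1 (the wp-* folder).
const folder = url.replace(/.*\/(wp-[^\/]+).*/, '$1');
console.log(folder); // wp-content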
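One caveat with replace: if a URL has no "wp-" segment, the regex does not match and replace returns the input unchanged, so the result silently holds the whole URL. A minimal sketch that uses match instead and returns null in that case; the helper name extractWpFolder is hypothetical.

```javascript
// Hypothetical helper: returns the first "wp-*" path segment, or null if there is none.
function extractWpFolder(url) {
  // "/" followed by "wp-" and one or more non-slash characters, captured in group 1.
  const m = url.match(/\/(wp-[^\/]+)/);
  return m ? m[1] : null;
}

console.log(extractWpFolder('facebook.com/wp-content/uploads/xyz')); // wp-content
console.log(extractWpFolder('facebook.com/wp-uploads/uploads/xyz')); // wp-uploads
console.log(extractWpFolder('facebook.com/photos/xyz'));             // null

// For comparison, the replace-based version returns the input unchanged when nothing matches.
const unchanged = 'facebook.com/photos/xyz'.replace(/.*\/(wp-[^\/]+).*/, '$1');
console.log(unchanged); // facebook.com/photos/xyz
```

Because this pattern has no leading greedy .*, it returns the first "wp-" segment in the URL, whereas the replace version above picks the last one.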

Answer:

This can extract "content" and "uploads" from your example:

/(?<=\/wp-)[a-zA-Z0-9]+/g

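The first part, (?<=\/wp-), is a positive lookbehind: it requires the /wp- character sequence to appear immediately before the match but does not include it in the result, so extraction starts right after /wp-. (The / is escaped as \/ because an unescaped / would end a JavaScript regex literal.) The second part, [a-zA-Z0-9]+, sets what kind of characters we expect to read; the + means one or more of them, so the match stops at the first character outside the class, such as the next /. I've added lowercase letters, uppercase letters, and digits.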
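A short JavaScript sketch of what the lookbehind changes: the "/wp-" prefix still has to be present, but it is not part of the matched text. Lookbehind assertions were added in ES2018, so very old engines may reject this pattern.

```javascript
const url = 'facebook.com/wp-content/uploads/xyz';

// With the lookbehind, "/wp-" must precede the match but is not part of it.
const withLookbehind = url.match(/(?<=\/wp-)[a-zA-Z0-9]+/);
console.log(withLookbehind[0]);    // content
console.log(withLookbehind.index); // 16 (the position right after "/wp-")

// Without the lookbehind, the prefix becomes part of the matched text.
const withoutLookbehind = url.match(/\/wp-[a-zA-Z0-9]+/);
console.log(withoutLookbehind[0]); // /wp-content
```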

If these folder names can contain other characters, e.g. "-" or "_", you can add them to the character class like this (put "-" last so it is read as a literal hyphen rather than part of a range):

/(?<=\/wp-)[a-zA-Z0-9_-]+/g
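To see what the wider class changes, here is a sketch on a made-up URL whose folder name contains "_" and "-" (example.com/wp-my_theme-files is hypothetical, chosen only for illustration).

```javascript
const url = 'example.com/wp-my_theme-files/style.css';

// Letters and digits only: the match stops at the first "_".
const narrow = url.match(/(?<=\/wp-)[a-zA-Z0-9]+/)[0];

// Letters, digits, "_" and "-" ("-" placed last so it is a literal hyphen).
const wide = url.match(/(?<=\/wp-)[a-zA-Z0-9_-]+/)[0];

console.log(narrow); // my
console.log(wide);   // my_theme-files
```

In JavaScript, [a-zA-Z0-9_] is the same set as \w, so [\w-]+ is an equivalent, shorter way to write the wider class.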

The g (global) flag at the end makes the regex return every match in the text instead of stopping at the first one, which matters when the text contains several URLs.
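A sketch of the g flag's effect in JavaScript, using the question's two URLs joined into one multi-line string: with g, String.prototype.match returns an array of all matched strings; without it, only the first match (with its details) comes back.

```javascript
const text = [
  'facebook.com/wp-content/uploads/xyz',
  'facebook.com/wp-uploads/uploads/xyz',
].join('\n');

// With the g flag: an array of every matched string.
const all = text.match(/(?<=\/wp-)[a-zA-Z0-9]+/g);
console.log(all); // [ 'content', 'uploads' ]

// Without g: only the first match.
const first = text.match(/(?<=\/wp-)[a-zA-Z0-9]+/);
console.log(first[0]); // content
```

Note that match with the g flag returns null, not an empty array, when nothing matches, so code that loops over the result may want a fallback such as "?? []".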

Edit:

If you want to match wp-content and wp-uploads instead, move wp- out of the lookbehind, like this:

/(?<=\/)wp-[a-zA-Z0-9_-]+/g

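It matches text that directly follows a / and starts with wp-. Do not add / to the character class in the second part: the match would then continue past the end of the folder name and swallow the rest of the path (e.g. wp-content/uploads/xyz).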
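A sketch of the final pattern on both example URLs, next to a variant that adds / to the character class, to show how the match then runs through the rest of the path.

```javascript
const text = 'facebook.com/wp-content/uploads/xyz\nfacebook.com/wp-uploads/uploads/xyz';

// The lookbehind only requires a preceding "/"; "wp-" is now part of the match.
const folders = text.match(/(?<=\/)wp-[a-zA-Z0-9_-]+/g);
console.log(folders); // [ 'wp-content', 'wp-uploads' ]

// Adding "/" to the class lets the match continue past the folder name.
const overshoot = text.match(/(?<=\/)wp-[a-zA-Z0-9_\/-]+/g);
console.log(overshoot); // [ 'wp-content/uploads/xyz', 'wp-uploads/uploads/xyz' ]
```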