How can I extract the "wp-*" portion of these URLs with a regex?
facebook.com/wp-content/uploads/xyz
facebook.com/wp-uploads/uploads/xyz
Answer:
This depends a bit on the language you use and how flexible the regex should be.
A very generic regex could look like this: .*/(wp-[^/]+).*
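In JavaScript, the code could look like this:
const url = 'facebook.com/wp-content/uploads/xyz';
// Replace the whole string with the captured "wp-*" segment.
const folder = url.replace(/.*\/(wp-[^\/]+).*/, '$1');
console.log(folder); // wp-content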
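One caveat with replace(): if a URL has no wp- segment, it returns the whole string unchanged. The sketch below uses String.prototype.match() with a similar capture group instead and returns null when nothing matches. The helper name extractWpFolder is hypothetical. Also note that the leading greedy .* in the replace() pattern makes it pick the last wp- segment if a URL has several, whereas this version returns the first one.

```javascript
// Hypothetical helper: returns the first "wp-*" path segment, or null if there is none.
function extractWpFolder(url) {
  // Capture "wp-" plus one or more characters up to the next "/".
  const match = url.match(/\/(wp-[^\/]+)/);
  return match ? match[1] : null;
}

console.log(extractWpFolder('facebook.com/wp-content/uploads/xyz')); // wp-content
console.log(extractWpFolder('facebook.com/wp-uploads/uploads/xyz')); // wp-uploads
console.log(extractWpFolder('facebook.com/images/xyz'));             // null
```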
Answer:
This can extract "content" and "uploads" from your example:
/(?<=\/wp-)[a-zA-Z0-9]+/g
The first part, (?<=\/wp-), is a "positive lookbehind": matching only starts at a position immediately preceded by the /wp- character sequence, and that sequence itself is not included in the result. The second part, [a-zA-Z0-9]+, sets what kind of characters we expect to read: lowercase letters, uppercase letters, and digits, repeated one or more times.
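A small sketch showing that the lookbehind is zero-width: it checks for /wp- but does not consume it. This assumes an engine that supports lookbehind (ES2018 and later; some older browsers, notably Safari before 16.4, do not).

```javascript
const url = 'facebook.com/wp-content/uploads/xyz';

// Without the g flag, match() returns details about the first match only.
const m = url.match(/(?<=\/wp-)[a-zA-Z0-9]+/);

// The matched text starts right after "/wp-", because the lookbehind
// only asserts that "/wp-" precedes this position.
console.log(m[0]);    // content
console.log(m.index); // 16, the position just past "/wp-"
```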
If these keywords can also contain other characters, e.g. "-" or "_", you can add them to the character class. Put the "-" last so it is read as a literal hyphen rather than as part of a range:
/(?<=\/wp-)[a-zA-Z0-9_-]+/g
The g (global) flag at the end makes the engine return every match in the text instead of stopping at the first one, which matters when the text contains several URLs.
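To compare the two character classes, this sketch runs both patterns with the g flag over the two example URLs plus one hypothetical folder name, wp-my_custom-dir, invented only to show "_" and "-" in action.

```javascript
// Both example URLs in one string, plus a hypothetical folder name containing "_" and "-".
const text = [
  'facebook.com/wp-content/uploads/xyz',
  'facebook.com/wp-uploads/uploads/xyz',
  'facebook.com/wp-my_custom-dir/xyz',
].join('\n');

// With the g flag, match() returns an array of every matched substring.
const basic = text.match(/(?<=\/wp-)[a-zA-Z0-9]+/g);
const extended = text.match(/(?<=\/wp-)[a-zA-Z0-9_-]+/g);

console.log(basic);    // [ 'content', 'uploads', 'my' ]
console.log(extended); // [ 'content', 'uploads', 'my_custom-dir' ]
```

The basic class stops at the first "_", so it only returns "my" for the third URL.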
Edit:
If you want to read wp-content and wp-uploads instead, you can move wp- out of the lookbehind, like this:
/(?<=\/)wp-[a-zA-Z0-9_-]+/g
This matches text that follows a / and starts with wp-. Do not add / to the character class, because then the match would run past the folder name into the rest of the path (e.g. wp-content/uploads/xyz).
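A quick check of the wp-inclusive pattern against both example URLs, next to a deliberately wrong variant (shown only for contrast) that adds / to the character class.

```javascript
const text = 'facebook.com/wp-content/uploads/xyz\nfacebook.com/wp-uploads/uploads/xyz';

// "wp-" is now part of the match; only the preceding "/" is a lookbehind.
const folders = text.match(/(?<=\/)wp-[a-zA-Z0-9_-]+/g);
console.log(folders); // [ 'wp-content', 'wp-uploads' ]

// Wrong on purpose: with "/" in the class, the match continues across path separators.
const tooGreedy = text.match(/(?<=\/)wp-[a-zA-Z0-9_\/-]+/g);
console.log(tooGreedy); // [ 'wp-content/uploads/xyz', 'wp-uploads/uploads/xyz' ]
```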