How to extract value from a url

Time:09-14

I have a few requests coming in which follow the pattern below

contacts/id/

contacts/x/id/name

contacts/x/y/id/address

contacts/z/address/

I want to extract the value that immediately follows 'contacts'.

For the cases above, that is:

1. id  
2. x 
3. x 
4. z

Here is my regex

(?<=contacts)\/[^\/]+

https://regex101.com/r/ePmv5Y/1

But the match includes the leading '/', e.g. /id, /x, etc.

How do I change the regex so that this slash is not part of the match?

CodePudding user response:

We can use match() here:

var urls = ["contacts/id/", "contacts/x/id/name", "contacts/x/y/id/address", "contacts/z/address/"];
for (var i = 0; i < urls.length; ++i) {
    var output = urls[i].match(/\bcontacts\/(.*?)\//)[1];
    console.log(urls[i] + " => " + output);
}
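
One caveat with the snippet above: match() returns null when nothing matches, and the pattern requires a slash after the value, so an input such as "contacts/id" would throw on [1]. A minimal sketch of a guarded variant follows. The helper name firstSegmentAfterContacts is hypothetical, and the character class [^\/]+ is swapped in for (.*?)\/ so that a trailing slash becomes optional.

```javascript
// Hypothetical helper: returns the path segment right after "contacts/",
// or null when the input contains no such segment.
function firstSegmentAfterContacts(path) {
  // [^\/]+ captures one or more non-slash characters,
  // so the value does not need to be followed by a slash.
  const match = path.match(/\bcontacts\/([^\/]+)/);
  return match ? match[1] : null;
}

const samples = ["contacts/id/", "contacts/x/id/name", "contacts/id", "users/42/"];
for (const sample of samples) {
  console.log(sample + " => " + firstSegmentAfterContacts(sample));
}
```

The last sample prints null instead of throwing, which is usually easier to handle in a request handler.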

CodePudding user response:

"I have a few requests coming in"

If you mean HTTP requests, then these are likely the pathnames of the requested URLs, and they will start with a /. (In a Node.js server, req.url holds the pathname plus any query string.)

To match on a URL pathname, you can use this expression: ^\/contacts\/([^/?]+). Here's a link to another regular expression builder that demonstrates it and includes an explanation for every character: https://regexr.com/6tugf

The [^/?] is a negated set that matches any character that is not a / or a ?, and the + means that it matches one or more of those characters. It's important to include the ?, because otherwise the match could run into the query string portion of the URL. For example, consider the parts of this URL:

https://domain.tld/contacts/x/id/name?filter=recent # URL
                  /contacts/x/id/name?filter=recent # req.url in Node.js
                  /contacts/x/id/name               # pathname
                                     ?filter=recent # query string

And here's a runnable code snippet demonstrating the same expression, using String.prototype.match():

const contactIdRegexp = /^\/contacts\/([^/?]+)/;

const inputs = [
  '/contacts/id/', // id
  '/contacts/x/id/name', // x
  '/contacts/x/y/id/address', // x
  '/contacts/z/address/', // z
  '/contacts/x/id/name?filter=recent', // x
];

for (const str of inputs) {
  const id = str.match(contactIdRegexp)?.[1];
  console.log(id);
}
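
To make the role of the ? in the negated set concrete, the sketch below compares the expression above with a variant that omits the ?. The variable names and the sample input are illustrative assumptions, not part of the original answer.

```javascript
// Same expression as above: stops at "/" or "?".
const withQuestionMark = /^\/contacts\/([^/?]+)/;
// Variant for comparison: stops only at "/".
const withoutQuestionMark = /^\/contacts\/([^/]+)/;

// Assumed req.url value where the query string directly follows the segment.
const sampleUrl = '/contacts/x?filter=recent';

const strict = sampleUrl.match(withQuestionMark)?.[1];
const loose = sampleUrl.match(withoutQuestionMark)?.[1];

console.log(strict); // x
console.log(loose);  // x?filter=recent
```

Without the ? in the set, the capture swallows the query string whenever no further slash follows the segment.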

CodePudding user response:

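If you would prefer to do this without a regex, you can try the following. (The req properties used here appear to assume an Express-style request object.)

// Build a URL object from the incoming request.
const url = new URL(`${req.protocol}://${req.get('host')}${req.originalUrl}`);

// Split the pathname on "/". The leading slash yields an empty first element.
const pathName = url.pathname.split("/");

// For "/contacts/x/id/name" the array is ["", "contacts", "x", "id", "name"],
// so the required value is at index 2.
const val = pathName[2];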
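
For trying the same idea outside a server, here is a minimal self-contained sketch. The function name contactSegment and the placeholder base http://localhost are assumptions, used only so that the URL constructor can parse a relative path. Looking up the position of "contacts" instead of hard-coding index 2 is a small design change, not part of the original answer.

```javascript
// Hypothetical helper: returns the path segment that follows "contacts",
// or undefined when there is none.
function contactSegment(requestUrl) {
  // The base is a placeholder; only the parsed pathname is used.
  const { pathname } = new URL(requestUrl, 'http://localhost');
  // filter(Boolean) drops the empty strings produced by leading
  // and trailing slashes.
  const segments = pathname.split('/').filter(Boolean);
  const index = segments.indexOf('contacts');
  return index === -1 ? undefined : segments[index + 1];
}

console.log(contactSegment('/contacts/x/id/name?filter=recent')); // x
console.log(contactSegment('/contacts/z/address/')); // z
```

Because the URL object separates the pathname from the query string, no special handling of ? is needed here.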

CodePudding user response:

You can add the / inside the lookbehind:

(?<=contacts\/)[^\/]+

See a regex demo.
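
As a quick check of the lookbehind version against the four inputs from the question, here is a minimal sketch. It assumes a JavaScript engine with lookbehind support (for example a recent Node.js), and the variable names are illustrative.

```javascript
// The lookbehind asserts that "contacts/" precedes the match without consuming it,
// so the whole match (index 0) is just the segment.
const afterContacts = /(?<=contacts\/)[^\/]+/;

const paths = [
  'contacts/id/',
  'contacts/x/id/name',
  'contacts/x/y/id/address',
  'contacts/z/address/',
];

const results = paths.map((path) => {
  const match = path.match(afterContacts);
  return match ? match[0] : null;
});

console.log(results.join(', ')); // id, x, x, z
```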
