PHP URL Param redirects - with wildcards/regex

Time:11-09

I recently found this solution for turning a PHP URL variable into a header Location redirect.

It's so much more manageable than htaccess for mass redirects. The next thing I want to work out is how to use regex to achieve what you can do in htaccess, where request/(.*) goes to destination/$1.

My understanding is that you use preg_match, preg_replace or something similar. How can I achieve something like the line below, preferably keeping it this short if possible? (I know this is wrong; it's just for the sake of example.)

preg_match($redirects['request(.*)'] = "$domain/destination/\1");

To break it down: say I want to redirect doma.in/pics to domain.tld/pictures. I have htaccess redirect that to doma.in/index?req=pics, where index is the script file and req is the parameter used.

Then in the script I have a line like $redirects['pics'] = "$domain/pictures";, where the $domain variable is set to http://domain.tld.

That works great, but I want to take this a step further with regex and send anything under pics/*stuff*, i.e. doma.in/index?req=pics/*stuff*, to $domain/pictures/*stuff*.
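For comparison, here is roughly what the regex route the question asks about could look like. It's a minimal sketch, assuming a hypothetical `$patterns` array that maps PCRE patterns to replacement templates and a hypothetical `regexRedirect()` helper. preg_replace() fills in `$1` from the captured group, much like `RewriteRule ^pics/(.*)$ /pictures/$1` does in htaccess.

```php
<?php
// Hypothetical sketch: regex patterns mapped to replacement templates.
// Returns the destination URL, or null when no pattern matches.
function regexRedirect(string $req, array $patterns): ?string
{
    foreach ($patterns as $pattern => $replacement) {
        if (preg_match($pattern, $req) === 1) {
            // $1, $2, ... in $replacement are filled from the capture groups.
            return preg_replace($pattern, $replacement, $req);
        }
    }
    return null;
}

$domain = "http://domain.tld";
$patterns = [
    '#^pics$#'      => $domain . '/pictures',     // exactly "pics"
    '#^pics/(.*)$#' => $domain . '/pictures/$1',  // "pics/anything"
];

// In the real script: $to = regexRedirect($_GET['req'] ?? '', $patterns);
echo regexRedirect('pics/holiday/beach.jpg', $patterns), "\n";
```

Patterns are tried in order, so the more specific ones should go first.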

Here's an example of how things look when using this script for many redirects.

$redirects['request'] = "$domain/dest";
$redirects['request2'] = "$domain/dest2";
$redirects['request3'] = "$domain/dest3";

Even though I've linked the post at the top where I got the script, here's the code:

if(isset($_GET['req']) && isset($redirects[$_GET['req']])) {
    $loc = htmlspecialchars($redirects[$_GET['req']]);
    header("Location: " . $loc);
    exit();
}
header("Location: $domain");

The $redirects lines are defined above this, in files I include.
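To try the exact-match lookup outside a web request, the same logic can be pulled into a function that returns the target instead of sending it. `resolveRedirect()` is a hypothetical name, and htmlspecialchars() is left out for simplicity; the real script passes `$_GET['req']` and calls header() with the result. It also shows the limitation the question is about: `pics/cats` gets no match and falls through to the home page.

```php
<?php
// Hypothetical wrapper around the exact-match lookup: returns the URL the
// script would send in its Location header.
function resolveRedirect(?string $req, array $redirects, string $fallback): string
{
    if ($req !== null && isset($redirects[$req])) {
        return $redirects[$req];
    }
    return $fallback; // same as the final header("Location: $domain")
}

$domain = "http://domain.tld";
$redirects = [];
$redirects['pics'] = "$domain/pictures";
$redirects['request'] = "$domain/dest";

// In the real script:
// header("Location: " . resolveRedirect($_GET['req'] ?? null, $redirects, $domain));
// exit();
echo resolveRedirect('pics', $redirects, $domain), "\n";
```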

User response:

I thought ltrim() was what I wanted, after seeing other answers suggest that if, for example, I specify 0 as the character to remove, 01 becomes 1, 001 becomes 01, and 10 and 100 are left alone. That turned out not to be the case: ltrim() treats its second argument as a list of characters and strips every leading character found in that list, so 001 becomes 1. Oddly, it didn't seem to do this with the slash, which confused me.
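ltrim() treats its second argument as a set of characters rather than a prefix string. A quick way to see this for yourself (the strings are only examples):

```php
<?php
// ltrim()'s second argument is a set of characters, not a prefix:
// it strips from the left for as long as the character is in that set.
$zeros   = ltrim("001", "0");              // "1": every leading zero goes
$holiday = ltrim("pics/holiday", "pics/"); // "holiday": works by luck, 'h' isn't in the set
$spider  = ltrim("pics/spider", "pics/");  // "der": 's', 'p', 'i' of "spider" are in the set too

echo $zeros, "\n", $holiday, "\n", $spider, "\n";
```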

This however does it correctly:

if (strpos($get, $req) === 0) {
    $get = substr($get, strlen($req));
}
return $get;

Thanks to this answer for this one liner.
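The same check can be wrapped in a small helper. This is a sketch with a hypothetical name, `stripPrefix()`. On PHP 8.0+ `str_starts_with()` reads a little more clearly than `strpos(...) === 0`, and both only remove the prefix when the string really begins with it.

```php
<?php
// Remove $prefix from the start of $s, but only if $s begins with it.
// str_starts_with() requires PHP 8.0 or later.
function stripPrefix(string $s, string $prefix): string
{
    return str_starts_with($s, $prefix) ? substr($s, strlen($prefix)) : $s;
}

echo stripPrefix("pics/spider", "pics/"), "\n";   // spider
echo stripPrefix("photos/spider", "pics/"), "\n"; // photos/spider (unchanged)
```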

All this script does is assign $redirects['request'] a value, like any other variable assignment, and $_GET['req'] already gives us whatever the parameter is, so no complicated preg functions or regex are needed.

So with substr(), we can take $_GET['req'] and do the following:

$req = "pics/";
$get = $_GET['req'] ?? ''; // empty string if ?req= is missing
$wild = strpos($get, $req) === 0
    ? substr($get, strlen($req))
    : $get;

$redirects["pics/$wild"] = "$domain/pictures/$wild";

This takes pics/*stuff* and removes pics/, so $wild is just *stuff*. I then use $wild in both the key and the destination to get a wildcard redirect, and ta-da.
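Why this works: the key is rebuilt from the request itself, so for any request starting with pics/ it equals `$_GET['req']` exactly and the existing `isset($redirects[$_GET['req']])` lookup matches. A sketch you can run from the command line, with `$get` standing in for `$_GET['req']` and a hypothetical `lookupPics()` function:

```php
<?php
// $get stands in for $_GET['req']; returns the matched destination or null.
function lookupPics(string $get, string $domain): ?string
{
    $req = "pics/";
    $wild = strpos($get, $req) === 0 ? substr($get, strlen($req)) : $get;

    $redirects = [];
    $redirects["pics/$wild"] = "$domain/pictures/$wild";

    // Same lookup the script does with isset($redirects[$_GET['req']]).
    return $redirects[$get] ?? null;
}

$domain = "http://domain.tld";
echo lookupPics("pics/2021/beach.jpg", $domain), "\n"; // http://domain.tld/pictures/2021/beach.jpg
var_dump(lookupPics("about", $domain));                // NULL: the key became "pics/about"
```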

This works, but let's make it better so we don't have to remember this fairly long piece of code each time.

Create a function like this above the redirects:

function wildcard($req) {
    $get = $_GET['req'] ?? '';
    return strpos($get, $req) === 0
        ? substr($get, strlen($req))
        : $get;
}

Calling wildcard('pics/') sets $req to pics/ inside the function.

We can use this in redirects like:

$req = "pics/";
$wild = wildcard($req);
$redirects[$req.$wild] = "$domain/pictures/$wild";

That's still a bit more than I hoped for, so my next idea was to pull $req into the function as a global, like this:

function wild() {
    $get = $_GET['req'] ?? '';
    global $req;
    return strpos($get, $req) === 0
        ? substr($get, strlen($req))
        : $get;
}

And then do the redirect like:

$req = "pics/";
$redirects[$req.wild()] = "$domain/pictures/".wild();

That becomes a much shorter single line. However, because globals are generally discouraged, I've gone back to passing $req as an argument. Instead of assigning $wild each time, though, I call wild($req) inline:

$req = "pics/";    $redirects[$req.wild($req)] = "$domain/pictures/".wild($req);

It's still shorter, and passing $req isn't much more than leaving the brackets empty.
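Another way to get one short line per rule, without globals and without calling wild() twice, is a helper that writes into the array itself. This is only a sketch: `addWildcard()` is a hypothetical name, and `$get` stands in for `$_GET['req']` so it can run outside a web request.

```php
<?php
// Register one wildcard rule: "$req..." redirects to "$dest...".
// $redirects is passed by reference so the helper can add the entry.
function addWildcard(array &$redirects, string $get, string $req, string $dest): void
{
    $wild = strpos($get, $req) === 0 ? substr($get, strlen($req)) : $get;
    $redirects[$req . $wild] = $dest . $wild;
}

$domain = "http://domain.tld";
$get = "pics/cats/tabby.jpg"; // in the real script: $_GET['req'] ?? ''
$redirects = [];

addWildcard($redirects, $get, "pics/", "$domain/pictures/");
addWildcard($redirects, $get, "vids/", "$domain/videos/");

echo $redirects[$get], "\n"; // http://domain.tld/pictures/cats/tabby.jpg
```

The non-matching vids/ rule still adds an entry, but under the key "vids/pics/cats/tabby.jpg", which never equals the request.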

P.S. With this method you want the parameter to end in a trailing slash, otherwise the results get messy. To be able to send pics to $domain/pictures, the parameter needs that trailing slash, so add one at the end of the htaccess rule that passes requests to the script. If you're using Apache or LiteSpeed, the following sends every request that isn't an existing file or directory to the script as a parameter with a trailing slash:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?req=$1/ [R=301,L]

Make sure this is at the bottom so it doesn't take priority over other rules.
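To see why the trailing slash matters, here is a small comparison. It uses `wildFor()`, a hypothetical variant of wild() that takes the request as an argument instead of reading `$_GET`. With the slash, pics/ only matches pics itself and things under it; without it, a request such as picsart/x also starts with pics and gets sent somewhere odd.

```php
<?php
// Hypothetical variant of wild() that takes the request as an argument.
function wildFor(string $get, string $req): string
{
    return strpos($get, $req) === 0 ? substr($get, strlen($req)) : $get;
}

$domain = "http://domain.tld";

// With the trailing slash: "doma.in/pics" arrives as "pics/" thanks to the RewriteRule.
$get = "pics/";
$req = "pics/";
$withSlash = [$req . wildFor($get, $req) => "$domain/pictures/" . wildFor($get, $req)];
echo rtrim($withSlash[$get], "/"), "\n"; // http://domain.tld/pictures

// Without it: "picsart/x" starts with "pics", so this rule catches it too.
$get2 = "picsart/x";
$req2 = "pics";
$noSlash = [$req2 . wildFor($get2, $req2) => "$domain/pictures" . wildFor($get2, $req2)];
echo $noSlash[$get2], "\n"; // http://domain.tld/picturesart/x
```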

I also added a precautionary rtrim() to the script to remove the trailing slash from the Location header, so that links to files on hosts that don't strip trailing slashes don't end up at a dead link. rtrim() has the same character-list behaviour described at the top, but since the only character passed is "/", it just removes trailing slashes, which is exactly what we want here.

Here is how you can have things now.

function wild($req) {
    $get = $_GET['req'] ?? '';
    return strpos($get, $req) === 0
        ? substr($get, strlen($req))
        : $get;
}

$domain = "http://domain.tld";

// Redirects
$req = "request1/";    $redirects[$req.wild($req)] = "$domain/dest1/".wild($req);
$req = "request2/";    $redirects[$req.wild($req)] = "$domain/dest2/".wild($req);
$req = "request3/";    $redirects[$req.wild($req)] = "$domain/dest3/".wild($req);

// Run Script
if (isset($_GET['req'], $redirects[$_GET['req']])) {
    $loc = htmlspecialchars($redirects[$_GET['req']]);
    header("Location: " . rtrim($loc,"/"));
    exit();
}

// If no match in the redirects, redirect to this location.
header("Location: $domain");
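Each rule line is evaluated against the same request, so rules that don't match still create an entry, just under a key that can never equal the request. A sketch that makes this visible, using a hypothetical `wildFor()` that takes the request as an argument and a hypothetical `buildRedirects()` holding the three example rules:

```php
<?php
// Hypothetical variant of wild() that takes the request as an argument.
function wildFor(string $get, string $req): string
{
    return strpos($get, $req) === 0 ? substr($get, strlen($req)) : $get;
}

// Build the same three rules as the script for one request.
function buildRedirects(string $get, string $domain): array
{
    $rules = ['request1/' => 'dest1/', 'request2/' => 'dest2/', 'request3/' => 'dest3/'];
    $redirects = [];
    foreach ($rules as $req => $dest) {
        $redirects[$req . wildFor($get, $req)] = "$domain/$dest" . wildFor($get, $req);
    }
    return $redirects;
}

$domain = "http://domain.tld";
$get = "request1/photos/a.jpg/"; // /request1/photos/a.jpg after the RewriteRule
$redirects = buildRedirects($get, $domain);

// Only the request1/ rule yields a key equal to $get; the others yield
// harmless keys like "request2/request1/photos/a.jpg/" that never match.
echo rtrim($redirects[$get], "/"), "\n"; // http://domain.tld/dest1/photos/a.jpg
```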

This has one flaw if the destination sends requests for non-existent pages back to the script. With wildcards it is guaranteed that some destinations won't exist, so such a request goes back to the script, which redirects it to the destination again, and you have a redirect loop.

My way of solving this is to add ?referer=doma.in to the end of the Location header, and in domain.tld's htaccess, exclude non-existent requests carrying that query string from being redirected back to the script.

So that looks like:

$loc = htmlspecialchars($redirects[$_GET['req']]).'?referer=doma.in';

And in domain.tld's htaccess, place a RewriteCond above the existing rule to exclude that query string:

# Ignore these referer queries
RewriteCond %{QUERY_STRING} !referer=doma.in [NC]

# Send dead requests to doma.in with uri as query
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ http://doma.in/?referer=domain.tld&req=$1/ [R=301,L]

For good measure, I also added a referer=domain.tld query to domain.tld's redirect back to doma.in.
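The loop and the referer fix can be modelled with a toy loop. This is purely illustrative: no HTTP is involved, `countHops()` is a hypothetical function, and each branch stands in for one of the rules above (the script on doma.in, and domain.tld's RewriteCond/RewriteRule pair for a missing file).

```php
<?php
// Toy model of the two servers. Returns how many redirects are followed
// before the chain stops, or $maxHops if it never stops (a loop).
function countHops(bool $appendReferer, int $maxHops = 10): int
{
    $host = "doma.in";
    $query = "req=pics/missing/"; // a wildcard request whose target doesn't exist

    for ($hop = 0; $hop < $maxHops; $hop++) {
        if ($host === "doma.in") {
            // The script matches the wildcard and redirects to domain.tld.
            $host = "domain.tld";
            $query = $appendReferer ? "referer=doma.in" : "";
        } elseif (stripos($query, "referer=doma.in") === false) {
            // domain.tld: a dead request without the referer goes back to the script.
            $host = "doma.in";
            $query = "req=pics/missing/";
        } else {
            // domain.tld: the RewriteCond skips it, so it isn't sent back.
            return $hop;
        }
    }
    return $maxHops;
}

echo countHops(false), "\n"; // 10: never settles
echo countHops(true), "\n";  // 1: stops at domain.tld
```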

Now, as a bonus, to tidy things up by hiding the referer query on requests, add the following below:

# Send dead requests with referer query to home (or 404 or wherever)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{QUERY_STRING} "referer=" [NC]
RewriteRule (.*) /?req=$1 [R=301,L]

# Remove referer query from requests
RewriteCond %{QUERY_STRING} "referer=" [NC]
RewriteRule (.*) /$1/? [R=301,L]

We need to send dead requests carrying the referer query somewhere before removing the query, otherwise we'd be back at step one. I send them to my homepage with the request URI as a parameter, so I can still see what the requested URL was.

And job done. As an extra bonus, let's stop external/non-wildcard redirects from getting the query. Back in the script, change it to:

$get = $_GET['req'] ?? null;
$loc = $redirects[$get ?? ''] ?? null;
$wildloc = $wildcards[$get ?? ''] ?? null;

// Run Script
if (isset($get) && isset($loc) || isset($wildloc)) {
    if (isset($wildloc)) {
        $loc = rtrim($wildloc, '/') . '?referer=doma.in';
    }
    $loc = rtrim(htmlspecialchars($loc), '/');
    header("Location: " . $loc);
    exit();
}

Here I've reorganised things: $_GET['req'] is assigned to $get, $redirects[$get] to $loc, and $wildcards[$get] to $wildloc. These are used in the isset() checks, along with an extra isset() after ||, i.e. OR, for $wildloc.

Then an inner if statement makes $wildcards redirects set $loc from $wildloc, while $redirects entries keep the $loc assigned above.

This way, we can have tidy redirects.
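The same decision can be written as a function that returns the Location value, which makes each case easy to check. This is a sketch, not the script itself: `resolveLocation()` is a hypothetical name, it uses `??` so a missing key gives null instead of an undefined-key warning, and it leaves out htmlspecialchars(), which is meant for HTML output rather than headers.

```php
<?php
// Returns the URL the script would redirect to for request $get.
function resolveLocation(?string $get, array $redirects, array $wildcards, string $domain): string
{
    $loc = $redirects[$get ?? ''] ?? null;
    $wildloc = $wildcards[$get ?? ''] ?? null;

    if ($get !== null && $wildloc !== null) {
        // Wildcard: trim the slash first, then add the loop-breaking query.
        return rtrim($wildloc, '/') . '?referer=doma.in';
    }
    if ($get !== null && $loc !== null) {
        // Plain redirect: no query string.
        return rtrim($loc, '/');
    }
    // No match: send the visitor home with the original request attached.
    return "$domain/?req=" . ($get ?? '');
}

$domain = "http://domain.tld";
$redirects = ['request3/' => "$domain/dest3/"];
$wildcards = ['request1/photos/' => "$domain/dest1/photos/"];

echo resolveLocation('request1/photos/', $redirects, $wildcards, $domain), "\n";
echo resolveLocation('request3/', $redirects, $wildcards, $domain), "\n";
echo resolveLocation('nope/', $redirects, $wildcards, $domain), "\n";
```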

So now things look like:

// Wildcard function
function wild($req) {
    $get = $_GET['req'] ?? '';
    return strpos($get, $req) === 0
        ? substr($get, strlen($req))
        : $get;
}

$domain = "http://domain.tld";

// Redirects
$req = "request1/";    $wildcards[$req.wild($req)] = "$domain/dest1/".wild($req);  // A wildcard redirect
$req = "request2/";    $wildcards[$req.wild($req)] = "$domain/dest2/".wild($req);  // A wildcard redirect
$redirects['request3/'] = "$domain/dest3/"; // Not a wildcard redirect

$get = $_GET['req'] ?? null;
$loc = $redirects[$get ?? ''] ?? null;
$wildloc = $wildcards[$get ?? ''] ?? null;

// Run Script
if (isset($get) && isset($loc) || isset($wildloc)) {
    if (isset($wildloc)) {
        $loc = rtrim($wildloc, '/') . '?referer=doma.in';
    }
    $loc = rtrim(htmlspecialchars($loc), '/');
    header("Location: " . $loc);
    exit();
}

// If no match in the redirects, redirect to this location.
header("Location: $domain/?req=$get");

This improves things so much and solves the redirect loop.

Edit: I changed this again slightly. Because the query string was being appended first, rtrim() was looking for a trailing slash after the query, where there isn't one, rather than before it, where we wanted. So now rtrim() runs before the query is appended. That means calling it twice, which is slightly annoying, but at least it now does the job correctly.
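The ordering issue is easy to see in isolation (the URL is just an example):

```php
<?php
$wildloc = "http://domain.tld/dest1/photos/";

// rtrim() after appending: the string now ends in "doma.in", so nothing is trimmed.
$wrong = rtrim($wildloc . '?referer=doma.in', '/');

// rtrim() before appending: the slash goes, then the query is added.
$right = rtrim($wildloc, '/') . '?referer=doma.in';

echo $wrong, "\n"; // http://domain.tld/dest1/photos/?referer=doma.in
echo $right, "\n"; // http://domain.tld/dest1/photos?referer=doma.in
```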

User response:

As my other answer is extremely long, here is a TLDR version.

This is the script as things are now:

// Wildcard function
function wild($req) {
    $get = $_GET['req'] ?? '';
    return strpos($get, $req) === 0
        ? substr($get, strlen($req))
        : $get;
}

$domain = "http://domain.tld";

// Redirects
$req = "request1/";    $wildcards[$req.wild($req)] = "$domain/dest1/".wild($req);  // A wildcard redirect
$req = "request2/";    $wildcards[$req.wild($req)] = "$domain/dest2/".wild($req);  // A wildcard redirect
$redirects['request3/'] = "$domain/dest3/"; // Not a wildcard redirect

// Run Script
$get = $_GET['req'] ?? null;
$loc = $redirects[$get ?? ''] ?? null;
$wildloc = $wildcards[$get ?? ''] ?? null;

if (isset($get) && isset($loc) || isset($wildloc)) {
    if (isset($wildloc)) {
        $loc = rtrim($wildloc, '/') . '?referer=doma.in';
    }
    $loc = rtrim(htmlspecialchars($loc), '/');
    header("Location: " . $loc);
    exit();
}

// If no match in the redirects, redirect to this location.
header("Location: $domain/?req=$get");

What we are doing here: the wildcard function removes the value of $req from the start of the request parameter. We call it in a redirect as wild($req).

Next, for the sake of example, $domain is set to http://domain.tld.

Then come the redirects: two wildcard redirects and one non-wildcard redirect. Make sure each $req value ends in a trailing slash, otherwise wildcards will give disastrous results.

Next, $get is assigned $_GET['req'], and $loc and $wildloc are assigned $redirects[$get] and $wildcards[$get] respectively.

Then an if statement checks that there is a ?req= parameter with a matching $redirects['...'] entry, OR a matching $wildcards['...'] entry.
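In PHP, && binds more tightly than ||, so the condition reads as (isset($get) && isset($loc)) || isset($wildloc). A quick check over every combination, with plain booleans standing in for the three isset() results:

```php
<?php
// $g, $l, $w stand in for isset($get), isset($loc), isset($wildloc).
$same = true;
foreach ([false, true] as $g) {
    foreach ([false, true] as $l) {
        foreach ([false, true] as $w) {
            $asWritten = $g && $l || $w;
            $grouped = ($g && $l) || $w;
            $same = $same && ($asWritten === $grouped);
        }
    }
}
var_dump($same); // bool(true)
```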

Inside that, another if statement sets $loc from $wildloc, using rtrim() to remove the trailing slash and appending a ?referer=doma.in query string.

Then rtrim() runs on $loc again: if $loc came from $wildloc, it now ends in the query string, so there's nothing to trim; if it's the original $loc, its trailing slash is removed. We also use htmlspecialchars(), which I'm not totally sure is needed, since the user-supplied request parameter is never printed onto the page, but I've added it just in case.
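One thing to watch with htmlspecialchars() here: it encodes & as &amp;, which is right inside HTML but wrong in a Location header, where the destination would typically receive a parameter named `amp;b` instead of `b`. A small illustration (the URL is an example):

```php
<?php
$url = "http://domain.tld/dest?a=1&b=2";

// Fine for printing inside HTML, wrong for header("Location: ...").
$escaped = htmlspecialchars($url);

echo $escaped, "\n"; // http://domain.tld/dest?a=1&amp;b=2
```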

And then we have the header location, to direct the request to the associated place.

And if there is no match, we redirect home to http://domain.tld/?req=request/.

For htaccess for doma.in, we have at the end:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /?req=$1/ [R=301,L]

This directs http://doma.in/request to http://doma.in/?req=request/ with the trailing slash, and http://doma.in/request/folder to http://doma.in/?req=request/folder/

Wildcards guarantee some dead requests, and domain.tld redirects dead requests back to the script. To prevent a redirect loop, add the lines below to the bottom of domain.tld's htaccess, so that dead requests carrying ?referer=doma.in are sent home instead of back to the script.

# Ignore these referer queries
RewriteCond %{QUERY_STRING} !referer=doma.in [NC]

# Send dead requests to doma.in with uri as query
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ http://doma.in/?referer=domain.tld&req=$1/ [R=301,L]

# Send dead requests with referer query to home
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{QUERY_STRING} "referer=" [NC]
RewriteRule (.*) /?req=$1 [R=301,L]

# Remove referer query from requests
RewriteCond %{QUERY_STRING} "referer=" [NC]
RewriteRule (.*) /$1/? [R=301,L]

And this is all job done! Well Apache/Litespeed wise anyway.
