I'm writing a PHP application where the user can enter a URL and some operations take place afterwards (further details not relevant to this question).
Requirement: If the user enters example.com, it should be converted to http://www.example.com.
The http:// part is straightforward, but I am struggling with the rules that determine whether www. is prepended. Since the URL could be anything that might work in a web browser, it could be localhost or 192.168.0.1, for example. For these, clearly www. shouldn't be prepended.
So the exclusion list from above is: "If the host is localhost or looks like an IPv4 address, don't prepend". But I expect there will be other cases that need to be covered. Could anyone advise, or suggest an alternative way of approaching this?
CodePudding user response:
You can check whether the host in the user input is an IP address and decide whether to prepend "www." accordingly. The user input can be "127.0.0.1", "127.0.0.1:8080", "http://127.0.0.1:8080" or "http://example.com:8080".
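$input = "127.0.0.1:8080";
// Strip an optional scheme. trim($input, "http://") would not work here:
// its second argument is a set of characters, not a prefix.
$hostPort = preg_replace('#^https?://#i', '', trim($input));
// Pad the result so a missing port becomes an empty string.
[$host, $port] = array_pad(explode(':', $hostPort, 2), 2, '');
if ($port !== '') {
    $port = ':' . $port;
}
if (filter_var($host, FILTER_VALIDATE_IP)) {
    header("Location: http://$host$port");
} else {
    // The scheme is needed here too, otherwise the redirect is relative.
    header("Location: http://www.$host$port");
}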
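The host/port splitting above could alternatively be delegated to parse_url. The following is only a minimal sketch: splitHostPort is a hypothetical helper name, and it assumes that prefixing "http://" to scheme-less input is acceptable so that parse_url fills in the "host" and "port" parts.

```php
<?php
// Hypothetical helper: splits user input into [host, port] using parse_url().
function splitHostPort(string $input): array
{
    $input = trim($input);
    // Without a scheme, parse_url() may put the host into 'path',
    // so add one when the input does not start with "scheme://".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $input)) {
        $input = 'http://' . $input;
    }
    $parts = parse_url($input);
    if ($parts === false || !isset($parts['host'])) {
        return ['', null]; // input could not be parsed
    }
    // 'port' is an integer when present.
    return [$parts['host'], $parts['port'] ?? null];
}
```

The returned host can then be passed to filter_var($host, FILTER_VALIDATE_IP) as in the snippet above.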
CodePudding user response:
Here is my current attempt at doing this. It makes two passes because parse_url initially puts links without a scheme, such as google.com or www.google.com, into the "path" part rather than the "host" part.
function preprocessAbsoluteUrl($url, $firstRun = true) {
    $parts = parse_url($url);
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $user = isset($parts['user']) ? $parts['user'] : '';
    $pass = isset($parts['pass']) ? ":{$parts['pass']}" : '';
    $userpass = $user !== '' || $pass !== '' ? "{$user}{$pass}@" : '';
    // Leave the host alone if it starts with "localhost" or "www.",
    // or looks like an IPv4 address; otherwise prepend "www.".
    $host = isset($parts['host'])
        ? (preg_match('/^(?:localhost|www\.|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/i',
                $parts['host'])
            ? $parts['host']
            : "www.{$parts['host']}")
        : '';
    $port = isset($parts['port']) ? ":{$parts['port']}" : '';
    $path = isset($parts['path']) ? rtrim($parts['path'], '/') : '';
    $query = isset($parts['query']) ? "?{$parts['query']}" : '';
    $fragment = isset($parts['fragment']) ? "#{$parts['fragment']}" : '';
    $url = "{$scheme}://{$userpass}{$host}{$port}{$path}{$query}{$fragment}";
    // Second pass: the scheme added above lets parse_url() find the host.
    if ($firstRun) {
        $url = preprocessAbsoluteUrl($url, false);
    }
    return $url;
}
The relevant part is the setting of $host: this currently uses a regular expression to prepend www. only when the host doesn't begin with www. or localhost and doesn't look like an IP address. Open to improvement suggestions!
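One possible refinement of the $host rule is to move it into a small predicate that relies on filter_var instead of a hand-written IP pattern. This is a sketch under assumptions, not a definitive rule set: shouldPrependWww is a hypothetical name, and treating every single-label host (one without a dot, such as localhost or an intranet machine name) as "don't prepend" is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: decides whether "www." should be prepended to a host.
function shouldPrependWww(string $host): bool
{
    $host = strtolower(rtrim($host, '.'));
    // parse_url() returns IPv6 hosts wrapped in brackets, e.g. "[::1]".
    $bare = trim($host, '[]');
    if ($host === '' || filter_var($bare, FILTER_VALIDATE_IP) !== false) {
        return false; // empty host, or an IPv4/IPv6 literal
    }
    if (strpos($host, '.') === false) {
        return false; // single-label names such as "localhost" (assumption)
    }
    if (strpos($host, 'www.') === 0) {
        return false; // prefix already present
    }
    return true;
}
```

In preprocessAbsoluteUrl, the preg_match condition could then be replaced by !shouldPrependWww($parts['host']). The sketch still returns true for hosts such as blog.example.com; telling a subdomain from a registrable domain needs more than string rules, and this example does not attempt it.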