Remove quotation mark string from URL with HTACCESS

Time:05-04

We are seeing a strange thing where bots are sending odd URLs. They are appending an Alexa URL to the URL we have. We are looking to remove that part of the URL so that only the part before the odd addition remains.

So we want to go from

www.example.com/search/Linux/page/6/”http:/www.alexa.com/siteinfo/www.example.com“/page/900

to

www.example.com/search/Linux/page/6/

removing the: ”http:/www.alexa.com/siteinfo/www.example.com“/page/900

Because it contains the quotes, I am unsure what htaccess rule would work to rewrite the URL, but am open to suggestions.

CodePudding user response:

Not sure where the requests are coming from, only see them with our 404 monitor.

If these requests are triggering a 404 (as they should be) then you are essentially already "blocking" such requests - they won't get inadvertently indexed by search engines.

However, if a third-party site is mistakenly linking to you with these erroneous links then you might be losing traffic. In that case you can redirect to remove the erroneous portion of the URL.

Due to it having the quotes, I am unsure what htaccess rule would work to rewrite the URL, but am open to suggestions.

There's nothing particularly special about matching quotes in the URL. However, the quotes used in your question are not the "standard" double quotes. The opening quote is "U+201D: RIGHT DOUBLE QUOTATION MARK" and the closing quote is "U+201C: LEFT DOUBLE QUOTATION MARK". This is not a problem; we can check for all three (the standard double quote and both curly quotes).

For example, using mod_rewrite at the top of the .htaccess file to remove the part of the URL from the first quote character onwards:

RewriteEngine On

# Remove everything from the first double quote onwards
RewriteRule ^([^"”“]+)["”“] /$1 [R=301,L]

The $1 backreference contains the part of the URL-path before the first double quote character.

The original query string (if any) is preserved.

Test first with a 302 (temporary) redirect to avoid potential caching issues.
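The pattern can also be checked offline before it is deployed. The following is a minimal sketch in Python, assuming that Python's re module treats this simple pattern the same way Apache's regex engine does; the function name strip_from_first_quote is hypothetical and only used for illustration. Note that in .htaccess the RewriteRule pattern is matched against the URL-path without its leading slash, so the sample path below omits it.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule above: capture everything before the first
# straight (") or curly (” “) double quote character.
QUOTE_RULE = re.compile(r'^([^"”“]+)["”“]')


def strip_from_first_quote(url_path: str) -> Optional[str]:
    """Return the redirect target, or None if the rule would not match.

    url_path is the URL-path without its leading slash, which is how
    RewriteRule sees it in a .htaccess context.
    """
    match = QUOTE_RULE.match(url_path)
    if match is None:
        return None  # no quote in the path, so no redirect would occur
    return "/" + match.group(1)  # mirrors the /$1 substitution


bad = "search/Linux/page/6/”http:/www.alexa.com/siteinfo/www.example.com“/page/900"
print(strip_from_first_quote(bad))                     # /search/Linux/page/6/
print(strip_from_first_quote("search/Linux/page/6/"))  # None
```

This only exercises the regex. It does not emulate Apache's URL decoding or the redirect itself, so a 302 test on the real server is still worthwhile.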

Alternatively, if your URLs are limited to a known subset of characters, e.g. a-z, A-Z, 0-9, _ (underscore), - (hyphen) and / (slash, the path separator), then check for valid characters instead. For example:

# Remove everything from the first "invalid character"
RewriteRule ^([\w/-]+)[^\w/-] /$1 [R=301,L]
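To see what the "valid characters" approach would and would not redirect, here is a minimal Python sketch. It assumes Python's re module with the re.ASCII flag (so that \w means only a-z, A-Z, 0-9 and underscore) is close enough to Apache's regex engine for this pattern; the function name strip_from_first_invalid is hypothetical. The hyphen is placed last in each character class so that it is taken literally.

```python
import re
from typing import Optional

# Capture the leading run of "valid" characters (word characters, slash,
# hyphen) that is followed by at least one character outside that set.
VALID_RULE = re.compile(r'^([\w/-]+)[^\w/-]', re.ASCII)


def strip_from_first_invalid(url_path: str) -> Optional[str]:
    """Return the redirect target, or None if the whole path is valid.

    url_path is the URL-path without its leading slash.
    """
    match = VALID_RULE.match(url_path)
    if match is None:
        return None  # every character is valid, so no redirect would occur
    return "/" + match.group(1)


bad = "search/Linux/page/6/”http:/www.alexa.com/siteinfo/www.example.com“/page/900"
print(strip_from_first_invalid(bad))                     # /search/Linux/page/6/
print(strip_from_first_invalid("search/Linux/page/6/"))  # None
print(strip_from_first_invalid("docs/file.html"))        # /docs/file
```

The last call shows the trade-off: a dot is not in the valid set, so a URL such as docs/file.html would be truncated. This approach is only suitable when legitimate URLs really are limited to the listed characters.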