HTACCESS 301 : How to redirect all urls to HTTPS except spammy urls with a specific character?

Time:01-17

I posted a question one month ago that received great answers: HTACCESS 403 : How to block url with a specific character?

The problem is that I have since migrated my website from HTTP to HTTPS. I would like to redirect all URLs to HTTPS, except spammy URLs containing a specific character, which I want to block with a 410 code.

Example of what I would like:

http://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html ==> 410 code, without 301 to HTTPS
http://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 ==> 410 code, without 301 to HTTPS
http://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 ==> 410 code, without 301 to HTTPS

Wrong, today, the spammy urls have a 301 code, and then a 410 code

http://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html ==> 301 to https://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html and then ==> 410.
http://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 ==> 301 to https://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 and then ==> 410.
http://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 ==> 301 to https://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 and then ==> 410.

I'm using these rules :

RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^.*$ https://www.%1%{REQUEST_URI} [L,NE,R=301]

RewriteEngine On
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [R=410]

RewriteEngine On
RewriteCond %{THE_REQUEST} /webhook.php [NC]
RewriteRule ^ - [R=410]

RewriteEngine On
RewriteCond %{THE_REQUEST} /football.php [NC]
RewriteRule ^ - [R=410]

Do you have any idea how to apply the 301 redirect to everything except URLs that contain a specific character or string?

CodePudding user response:

Just reverse the order of the rules, so your blocking directives are first (as they should be).

There is also no need to repeat the RewriteEngine directive.

Instead of using THE_REQUEST server variable (which is perhaps matching too much in the context you are using it), you should just use the RewriteRule pattern (or even combine the rules into one).

For example:

RewriteEngine On

# Blocking the following requests
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [R=410]

RewriteRule /webhook\.php$ - [NC,R=410]

RewriteRule /football\.php$ - [NC,R=410]


# Canonical redirect
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]

Note also that I simplified the regex ^.*$ in the last rule to just ^.
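To illustrate the earlier point about THE_REQUEST matching too much, here is a rough sketch. THE_REQUEST holds the whole request line, so an unanchored pattern can match text in the query string as well as the path. The /some-page?ref=... request is a made-up example, and the anchored condition is only one possible way to tighten the match if you did want to keep THE_REQUEST. It is not something the question requires.

```apache
# THE_REQUEST contains the full request line, for example:
#   GET /caterory/article-name/webhook.php?tw3fpage3rjnso530724 HTTP/1.1
#
# The original pattern /webhook.php is unanchored (and its dot is unescaped),
# so it can match anywhere in that line. A hypothetical request such as
#   GET /some-page?ref=/webhook.php HTTP/1.1
# would therefore also receive a 410.

# Anchored variant (assumption, for illustration only):
# method, then a path containing no "?" or whitespace that ends in /webhook.php,
# followed by either "?" (start of query string) or whitespace (end of URL).
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*/webhook\.php[?\s] [NC]
RewriteRule ^ - [G]
```

Matching on the RewriteRule pattern, as in the rules above, avoids the issue more simply because that pattern is only ever tested against the URL-path.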

The 3 blocking rules can be combined into one (although there is no real benefit in doing so). For example:

# Blocking the following requests (combined rule)
RewriteCond %{QUERY_STRING} ^vn/ [OR,NC]
RewriteCond %{REQUEST_URI} /webhook\.php$ [OR,NC]
RewriteCond %{REQUEST_URI} /football\.php$ [NC]
RewriteRule ^ - [G]

# Canonical redirect
:

NB: G (gone) is just shorthand for R=410.

As a general rule, the order of your directives should be:

  1. Blocking directives

  2. External redirects

  3. Internal rewrites
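As a rough sketch, that ordering could look like this in a single .htaccess file. Sections 1 and 2 reuse rules from this answer. Section 3 is a hypothetical front-controller rewrite to /index.php, added only to show where internal rewrites would go. It is not part of the question.

```apache
RewriteEngine On

# 1. Blocking directives: respond with 410 before anything else runs.
#    A non-3xx status stops rule processing, so no 301 is ever issued.
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [G]

# 2. External redirects: canonical HTTPS + www
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]

# 3. Internal rewrites (hypothetical example, not from the question):
#    send requests that are not existing files or directories to /index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /index.php [L]
```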

"Wrong, today, the spammy urls have a 301 code, and then a 410 code"

This doesn't really matter, though, except that it potentially uses a minuscule amount of additional resources. It's still ultimately a 410.
