htaccess/mod_rewrite: Help to understand rewrite rules and rewrite conditions

Time:01-10

I have the following rewrite rules, which I honestly don't quite understand. This great community helped me to write them a long time ago:

RewriteEngine On
#1
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]
#2
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#LI([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/linkedin/%1/%2? [L,NE,R=302]
#3
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#PT([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/pinterest/%1/%2? [L,NE,R=302]
#4
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#XN([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/xing/%1/%2? [L,NE,R=302]
#5
RewriteCond %{QUERY_STRING}#%{HTTP_HOST} ^IG([^&#]+)#(?:.+\.)?([^.]+)\. [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%2/%1 [L,NE,R=302]
#6
RewriteCond %{HTTP_HOST} \.([^.]+)\.[^.]+$
RewriteRule ^IG([^/]+)/?$ https://www.mysite.de/link/instagram/%1/$1 [L,NC,NE,R=302]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ / [L,QSA,R=302]

Besides lots of other things, I don't get why in rule #5 %{QUERY_STRING} and %{HTTP_HOST} are swapped compared to the other rules. What does the # do within the rewrite condition? Does it function as a separator? What does rule #6 do in addition to rule #5?

The main problem for now: The following link:

https://test.info/IGZSGUFSPSWC?fbclid=PAAaY2XV-SntbX2OtypPe92gVjSldUctufiEup5zBCOCC7rB71MO8JQlc-8F0&e=ATPMSEL2pg5PoJpf1puuGgcr8dPHj-CUM60wGlrTHZ4VGbz5KBIno8SeX_UzO-K1HHjnP8ebBEwdDfWgMGB3Pa1mv6YCLAUzSBvdZQ

should be redirected to

https://www.mysite.de/link/instagram/datenschutz-impressum/ZSGUFSPSWC

but it is redirected to

https://www.mysite.de/link/facebook/datenschutz-impressum/clid=PAAaY2XV-SntbX2OtypPe92gVjSldUctufiEup5zBCOCC7rB71MO8JQlc-8F0

The query string begins with fb so the first rule does apply. I want rule #5 to apply.

How can this be done?

I wish you a great new year

CodePudding user response:

#5
RewriteCond %{QUERY_STRING}#%{HTTP_HOST} ^IG([^&#]+)#(?:.+\.)?([^.]+)\. [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%2/%1 [L,NE,R=302]

I don't get why in rule #5 %{QUERY_STRING} and %{HTTP_HOST} is swapped compared to the other rules.

There would not seem to be any good reason for that. Maybe it was written at a different time and it just made sense to do it that way at the time?

Everything is just reversed: the regex and the backreferences in the substitution string (i.e. %2 and %1).

That rule could be rewritten the same way as the preceding rules like this:

#5 (reversed)
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} ^(?:.+\.)?([^.]+)\.[^#]*#IG([^&]+)$ [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%1/%2 [L,NE,R=302]

Note that this rule is subtly different to the preceding rules for some reason, which does look like an error, but may not make any difference (depending on the request). Points of note:

  • This rule makes the subdomain optional (for single level TLDs), whereas in the preceding rules the subdomain is mandatory. For instance, this rule will match test.info, whereas the preceding rules will not, since they are expecting a subdomain like www.test.info. So I doubt that test.info is an accurate "exemplified" hostname in your example?

  • This rule only permits a single URL parameter that starts with IG followed by something, whereas the earlier rules match the two-character code followed by anything (incl. nothing) and any other URL parameters (which are simply discarded).

  • This rule also preserves the query string, whereas the preceding rules discard it. This looks like an oversight/error?

What does the # do within the rewrite condition? Does it function as a separator?

Yes, it's simply a separator between the two parts of the test string (the hostname and the query string). This effectively allows two conditions to be combined into one, so that backreferences can be captured from both. Any character could be used here that does not occur in either the hostname or the query string parts of the URL.
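
To make the separator concrete, here is rule #1 again, annotated with a trace for a made-up request. The hostname sub.example.info and the query string FBabc123&x=1 are assumptions for illustration only, not taken from your setup:

```apache
# Hypothetical request:  https://sub.example.info/page?FBabc123&x=1
#
# TestString (%{HTTP_HOST}#%{QUERY_STRING}) expands to:
#   sub.example.info#FBabc123&x=1
#
# \.([^.]+)\.   -> %1 = "example"  (the label just before the TLD)
# [^.#]+#       -> "info#"         (the TLD, then the separator)
# FB([^&]*)     -> %2 = "abc123"   (everything up to the next "&")
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*) [NC]

# Redirects to https://www.mysite.de/link/facebook/example/abc123
# (the trailing "?" drops the original query string).
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]
```

Without the #, the regex would have no reliable way to tell where the hostname ends and the query string begins.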

What does rule #6 do in addition to rule #5?

Rule #6 checks the URL-path instead of the query string (in rule #5). In fact, it's this rule that should be triggered for your example URL, not rule #5.

In other words, rule #5 would match /anything?IGX, whereas rule #6 matches /IGX.
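
As an illustration, the two rules can be traced side by side for two made-up requests. The hostname sub.example.info and the code abc123 are hypothetical:

```apache
# Hypothetical host: sub.example.info
#
# Request: /anything?IGabc123
#   Rule #5 TestString: IGabc123#sub.example.info
#   %1 = "abc123" (query string), %2 = "example" (hostname)
#   -> https://www.mysite.de/link/instagram/example/abc123
#      (the original query string is kept, as the substitution has no "?")
RewriteCond %{QUERY_STRING}#%{HTTP_HOST} ^IG([^&#]+)#(?:.+\.)?([^.]+)\. [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%2/%1 [L,NE,R=302]

# Request: /IGabc123
#   Rule #6 condition on the host: %1 = "example"
#   Rule #6 pattern on the URL-path "IGabc123": $1 = "abc123"
#   -> https://www.mysite.de/link/instagram/example/abc123
RewriteCond %{HTTP_HOST} \.([^.]+)\.[^.]+$
RewriteRule ^IG([^/]+)/?$ https://www.mysite.de/link/instagram/%1/$1 [L,NC,NE,R=302]
```

Note that in .htaccess the URL-path matched by the RewriteRule pattern has no leading slash, which is why ^IG matches a request for /IGabc123.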

Solution

The query string begins with fb so the first rule does apply. I want rule #5 to apply.

(Except, as mentioned above, test.info would not match, and rule #6 should apply here, not rule #5.)

I would question whether you need the NC (case-insensitive) flag on all the preceding conditions? Do you need to match fb in the query string in the first rule, or just FB as stated in the regex? Removing the NC would resolve the immediate problem you are experiencing.

Otherwise, you need to change the order of the rules so that rule #6 is first and therefore takes priority over rule #1.
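
As a rough sketch of the reordering option, the top of the file might look like the following. It assumes the remaining rules (#2 to #5 and the final fallback) stay as they are, and that the real hostname has a subdomain so that the condition in rule #6 matches:

```apache
RewriteEngine On

# Former rule #6, moved first: a URL-path of the form /IG<code> is now handled
# before any query-string rule (such as #1 matching "fbclid") is evaluated.
RewriteCond %{HTTP_HOST} \.([^.]+)\.[^.]+$
RewriteRule ^IG([^/]+)/?$ https://www.mysite.de/link/instagram/%1/$1 [L,NC,NE,R=302]

# Rule #1, unchanged
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]

# ... rules #2 to #5 and the fallback follow as before ...
```

As written, rule #6 passes the original query string (fbclid=... in your example) through to the target. If that is not wanted, appending ? to the substitution, as rules #1 to #4 already do, should discard it.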


Aside:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ / [L,QSA,R=302]

This 302 redirects anything that would ordinarily trigger a 404 to the root (i.e. the "homepage"). This isn't generally recommended for SEO or for users. However, it is more easily achieved with the following core directive, which will do the same thing:

ErrorDocument 404 https://www.example.com/

Logically, this should be defined at the top of the file, but the order does not really matter.
