I have the following rewrite rules, which I honestly don't quite understand. This great community helped me to write them a long time ago:
RewriteEngine On
#1
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]
#2
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#LI([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/linkedin/%1/%2? [L,NE,R=302]
#3
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#PT([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/pinterest/%1/%2? [L,NE,R=302]
#4
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#XN([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/xing/%1/%2? [L,NE,R=302]
#5
RewriteCond %{QUERY_STRING}#%{HTTP_HOST} ^IG([^&#]+)#(?:.+\.)?([^.]+)\. [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%2/%1 [L,NE,R=302]
#6
RewriteCond %{HTTP_HOST} \.([^.]+)\.[^.]+$
RewriteRule ^IG([^/]+)/?$ https://www.mysite.de/link/instagram/%1/$1 [L,NC,NE,R=302]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ / [L,QSA,R=302]
Besides lots of other things, I don't get why in rule #5 %{QUERY_STRING} and %{HTTP_HOST} are swapped compared to the other rules. What does the # do within the rewrite condition? Does it function as a separator?
What does rule #6 do in addition to rule #5?
The main problem for now: The following link:
https://test.info/IGZSGUFSPSWC?fbclid=PAAaY2XV-SntbX2OtypPe92gVjSldUctufiEup5zBCOCC7rB71MO8JQlc-8F0&e=ATPMSEL2pg5PoJpf1puuGgcr8dPHj-CUM60wGlrTHZ4VGbz5KBIno8SeX_UzO-K1HHjnP8ebBEwdDfWgMGB3Pa1mv6YCLAUzSBvdZQ
should be redirected to
https://www.mysite.de/link/instagram/datenschutz-impressum/ZSGUFSPSWC
but it is redirected to
https://www.mysite.de/link/facebook/datenschutz-impressum/clid=PAAaY2XV-SntbX2OtypPe92gVjSldUctufiEup5zBCOCC7rB71MO8JQlc-8F0
The query string begins with fb, so the first rule applies. I want rule #5 to apply.
How can this be done?
I wish you a great new year
CodePudding user response:
#5
RewriteCond %{QUERY_STRING}#%{HTTP_HOST} ^IG([^&#]+)#(?:.+\.)?([^.]+)\. [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%2/%1 [L,NE,R=302]
I don't get why in rule #5 %{QUERY_STRING} and %{HTTP_HOST} are swapped compared to the other rules.
There would not seem to be any good reason for that. Maybe it was written at a different time and it just made sense to do it that way at the time?
Everything is just reversed: the regex and the backreferences in the substitution string (i.e. %2 and %1).
That rule could be rewritten the same way as the preceding rules like this:
#5 (reversed)
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} ^(?:.+\.)?([^.]+)\.[^#]*#IG([^&]+)$ [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%1/%2 [L,NE,R=302]
Note that this rule is subtly different from the preceding rules for some reason, which does look like an error, but may not make any difference (depending on the request). Points of note:
This rule makes the subdomain optional (for single-level TLDs), whereas in the preceding rules the subdomain is mandatory. For instance, this rule will match test.info, whereas the preceding rules will not, since they expect a subdomain, as in www.test.info. So I doubt that test.info is an accurate "exemplified" hostname in your example?
This rule only permits a single URL parameter that starts with IG followed by something, whereas the earlier rules match the two-character code followed by anything (including nothing) and any other URL parameters (which are simply discarded).
This rule also preserves the query string, whereas the preceding rules discard it. This looks like an oversight/error?
What does the # do within the rewrite condition? Does it function as a separator?
Yes, it's simply a separator between the two server variables in the test string (the hostname and the query string). This effectively combines two conditions into one, so that backreferences can be captured from both in a single match. Any character could be used here, provided it does not occur in either the hostname or the query string.
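As a rough illustration of how the combined test string is evaluated, here is rule #1 annotated for a made-up request. The hostname www.example.info and the query string FB123 are assumptions for the example, not values from the question.

```apache
# Hypothetical request: https://www.example.info/?FB123
#
# %{HTTP_HOST}#%{QUERY_STRING} expands to the single test string:
#   www.example.info#FB123
#
# \.([^.]+)\.   matches ".example."  -> %1 = "example" (from the hostname part)
# [^.#]+#       matches "info#"      -> the TLD plus the separator
# FB([^&]*)     matches "FB123"      -> %2 = "123" (from the query string part)
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*) [NC]
# Redirects to https://www.mysite.de/link/facebook/example/123
# (the trailing "?" discards the original query string).
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]
```

Two separate RewriteCond directives would not work the same way, because %1 and %2 refer only to the last matched condition.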
What does rule #6 do in addition to rule #5?
Rule #6 checks the URL-path instead of the query string (in rule #5). In fact, it's this rule that should be triggered for your example URL, not rule #5.
In other words, rule #5 would match /anything?IGX, whereas rule #6 matches /IGX.
Solution
The query string begins with fb so the first rule does apply. I want rule #5 to apply.
(Except, as mentioned above, test.info would not match, and rule #6 should apply here, not rule #5.)
I would question whether you need the NC (case-insensitive) flag on all the preceding conditions? Do you need to match fb in the query string in the first rule, or just FB as stated in the regex? Removing the NC flag would resolve the immediate problem you are experiencing.
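A minimal sketch of that first option, assuming lowercase codes such as fb never need to match. Only the flag changes; the rest of rule #1 is as in the question:

```apache
#1 (NC flag removed: "FB" must now be uppercase)
# A query string starting with "fbclid=" no longer matches this condition,
# so processing falls through to the later rules.
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*)
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]
```

The hostname part of the regex contains no literal letters, so dropping NC should not affect how the hostname is matched. The same change would apply to rules #2 to #4 if lowercase li, pt and xn should not match either.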
Otherwise, you need to change the order of the rules so that rule #6 is first and therefore takes priority over rule #1.
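The reordering could look something like the sketch below. The rules themselves are unchanged from the question; the hostname in the comment (www.datenschutz-impressum.info) is an assumption inferred from the redirect target, since rule #6 needs a subdomain to match.

```apache
RewriteEngine On

#6 (moved to the top so the URL-path check runs before the query string rules)
# Assumed request: https://www.datenschutz-impressum.info/IGZSGUFSPSWC?fbclid=...
#   %1 = "datenschutz-impressum" (from the hostname)
#   $1 = "ZSGUFSPSWC"            (from the URL-path, after "IG")
RewriteCond %{HTTP_HOST} \.([^.]+)\.[^.]+$
RewriteRule ^IG([^/]+)/?$ https://www.mysite.de/link/instagram/%1/$1 [L,NC,NE,R=302]

#1
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]

# Rules #2 to #5 follow here, unchanged.
```

Note that rule #6 as written has no trailing ? on the substitution, so the original query string (fbclid=... in the example) would be passed through to the redirect target. If that is unwanted, appending ? to the substitution, as rules #1 to #4 do, discards it.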
Aside:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ / [L,QSA,R=302]
This 302 redirects anything that would ordinarily trigger a 404 to the root (i.e. the "homepage"). This isn't generally recommended for SEO or for users. However, it is more easily achieved with the following core directive, which does much the same thing:
ErrorDocument 404 https://www.example.com/
Logically, this should be defined at the top of the file, but the order does not really matter.