I have the following rewrite rule in .htaccess:
RewriteRule ^.*/-y.* /handleurl.php [L]
Its purpose is to display appropriate pages depending on the values in the url, for example:
example.com/books/BookA/-y?act=x
will display the BookA page.
The variable holding the book name is encoded such that ...
example.com/books/Book B/-y?act=x
becomes example.com/books/book+B/-y?act=x
... which is fine (it's decoded in handleurl.php).
However, if the book is called Book A/B, I have ...
example.com/books/Book A/B/-y?act=x
which becomes example.com/books/Book+A%2FB/-y?act=x
It appears that htaccess decodes this before the rewrite rule, so the rewrite rule sees too many elements in the URL delineated by the /.
Is there any way I can get the rewrite rule to ignore the encoded %2F, as intended?
I have seen a previous response to a similar question, but I only need the %2F to be ignored, not other encoded characters.
CodePudding user response:
It appears that htaccess decodes this before the rewrite rule, so the rewrite rule sees too many elements in the URL delineated by the /
This is not the problem. Whether the URL-path /books/Book+A%2FB/-y is decoded or not makes no difference here*1. Both forms would match the (rather generous) regex ^.*/-y.* in the RewriteRule pattern.
(*1 But yes, the URL-path matched by the RewriteRule pattern is URL decoded, i.e. %-decoded.)
The problem is likely to be that Apache (by default) rejects, with a 404, any URL that contains a %-encoded slash, i.e. %2F (or backslash, %5C), in the URL-path portion of the URL. This is a security feature, since allowing them "could potentially allow unsafe paths" (source).
However, this can be overridden with the AllowEncodedSlashes directive. But this directive can only be used in a server or virtualhost context. It cannot be used in .htaccess.
You either need to set AllowEncodedSlashes On to allow encoded slashes, which are then also decoded, as with other characters, or set AllowEncodedSlashes NoDecode to permit encoded slashes without decoding them, which is preferred and probably what you are expecting.
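As a rough sketch of where the directive would go, it might look like this in the virtual host definition. The ServerName and DocumentRoot values are placeholders for illustration, not taken from the question:

```apache
# Sketch only: example.com and /var/www/example are placeholder values.
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example

    # Accept %2F in the URL-path instead of rejecting the request with a 404,
    # and leave it encoded so it is not treated as a path separator.
    AllowEncodedSlashes NoDecode
</VirtualHost>
```

Apache needs to be reloaded or restarted after the change. With NoDecode the book-name segment still contains %2F when the request reaches the script, so decoding it is left to handleurl.php.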
Aside#1:
RewriteRule ^.*/-y.* /handleurl.php [L]
The regex ^.*/-y.* is very generic, possibly too generic. This is the same as simply /-y. What is the .* after -y intended to match? From your example URLs it looks like -y is always at the end of the URL-path, so this could be anchored, e.g. /-y$. And if the URL that you need to match always starts /books/ then maybe this should also be included in the regex?
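To illustrate, a tighter version of the rule might look like the following. This is only a sketch: it assumes the .htaccess file sits in the document root, and that every URL to be handled starts with /books/ and ends with /-y, which is a guess based on the example URLs in the question.

```apache
# In .htaccess the RewriteRule pattern is matched against the URL-path
# without its leading slash, hence "^books/" rather than "^/books/".
# Assumption: -y is always the final path segment.
RewriteRule ^books/.+/-y$ /handleurl.php [L]
```

The query string (?act=x) is not part of what the pattern is matched against, and by default it is passed through to handleurl.php unchanged.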
Aside#2:
...the book name is encoded such that ...
example.com/books/Book B/-y?act=x
becomes example.com/books/book+B/-y?act=x
... which is fine (it's decoded in handleurl.php)
This isn't strictly "URL encoded"; you have converted the space into a + in the URL-path. The + is a valid "URL encoding" for a space when used in the query string only. A + in the URL-path is a literal + (and will be seen by search engines as such). In the URL-path, a space would be URL encoded as %20. (You may have used the wrong PHP encoding function, e.g. urlencode() instead of rawurlencode()?)
Of course, you are free to convert/encode the URL however you wish to create a more readable URL - providing it's valid.
CodePudding user response:
The rewrite rule was never the problem. I have to decide whether I want to allow '/' in the variables that make up the elements of the friendly URL or not (might that make them less friendly to search engines?). In any case they should not be encoded. If yes, I have to deal with them in the receiving program. Maybe I will convert '/' to '|' for the benefit of the URL, then convert them back prior to subsequent display. Thank you Mr White.