I found a python package on GitHub that doesn't work. It attempts to replace a substring within a url with another string.
string = "filename.txt"
rewrite = "c:\\windows\\system32\\drivers\\hosts"
url = "https://www.example.com/path?parameter=filename.txt"
fullrewrite = re.sub(string, rewrite, url)
The string
, rewrite
, and url
parameters are arbitrary and not hard-coded. I just put them there as an example (this is a path traversal testing library I'm trying to play around with).
When I run this code, I get a KeyError from re
, which is expected according to the docs:
If you’re not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn’t recognized by Python’s parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it’s highly recommended that you use raw strings for all but the simplest expressions.
I tried using repr
to convert the string into a raw string:
raw = repr(rewrite)[1:-1] # [1:-1] removes extra quotes.
fullrewrite = re.sub(string, raw, url)
But this creates double backslashes in the resulting url: https://www.example.com/path?parameter=c:\\windows\\system32\\drivers\\hosts
My question is how am I supposed to have it replace the key word so that the resulting string is: https://www.example.com/path?parameter=c:\windows\system32\drivers\hosts
?
CodePudding user response:
This is my understanding, please correct me if i'm wrong.
You don't get double backslashes, but escaped backslashes. In Re and Python, one backslash is a special character. It does not match the backslash character.(or rather, not always) To print one backslash, one would need to escape it with another.(again - most often) Thus, one can say that a double backslash is an internal representation of a backslash.
If one puts 'c:\\' into print()
or save it to a 'txt' file, one will get 'c:\'.
P.S. Since '\q' is not a special sequence in Python, '\q'=='\\q'
returns True
.