I have an HTML document that contains JavaScript. Using re.findall I was able to extract the arguments of a function call, and now I need to convert them to a BeautifulSoup object.
The problem is that BeautifulSoup does not interpret the escaped characters in the string, so the result looks like this:
\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e
I have tried several approaches, such as decode(), but none of them worked.
EDIT: when I pass the string manually, as str = r"\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e", BeautifulSoup is able to decode it, but when the string comes from the regex it stays encoded.
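A minimal reproduction of what re.findall returns, assuming a hypothetical page whose script calls a function named render with the HTML fragment as a double-quoted string argument (the function name, the page snippet and the pattern are illustrative, not taken from the question). The match contains exactly the characters in the page source, so each \x3c comes back as four literal characters (backslash, x, 3, c), the same as in a raw string literal. Python only turns \x3c into "<" inside an ordinary, non-raw string literal in source code.

```python
import re

# Hypothetical page source; the raw string keeps the backslashes exactly as they
# appear in the downloaded HTML.
page = r'''<script>render("\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e");</script>'''

# Hypothetical pattern: capture the double-quoted argument of render(...)
args = re.findall(r'render\("(.*?)"\)', page)
fragment = args[0]

# The match holds literal backslash-x sequences, not the characters they encode
print(fragment)
print("\\x3c" in fragment)  # True: a backslash followed by "x3c"
print("<" in fragment)      # False: no real "<" yet

# Ordinary literals are decoded at parse time; raw literals are not
print("\x3c" == "<")        # True
print(r"\x3c" == "\\x3c")   # True: a raw literal equals the doubled-backslash form
```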
Answer:
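When you paste the string into source code, you need to escape the backslashes so that it contains the same literal \x sequences as the regex result. You can then decode it like this: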
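In JavaScript:
const txt = `\\x3cdiv class\\x3d\\x22table\\x22\\x3e MY DATA \\x3c/div\\x3e`
// Every chunk after the first starts with two hex digits; the first chunk has no escape
const [head, ...chunks] = txt.split('\\x')
const res = head + chunks.map(v => {
  return String.fromCharCode(parseInt(v.slice(0, 2), 16)) + v.slice(2)
}).join('')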
console.log(res)
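In Python:
def map_func(v):
    # The first two characters are the hex code; the rest is plain text
    return chr(int(v[0:2], 16)) + v[2:]

txt = "\\x3cdiv class\\x3d\\x22table\\x22\\x3e MY DATA \\x3c/div\\x3e"
arr = txt.split('\\x')
# arr[0] is the text before the first escape (empty here), so keep it unchanged
print(arr[0] + ''.join(map(map_func, arr[1:])))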
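A shorter alternative, sketched under the assumption that the extracted string only uses \xHH escapes (as the JavaScript string literals in the question do): let re.sub replace each escape with the character it encodes. Because it does not split the string, a \x that is not followed by two hex digits is left as-is instead of raising ValueError in int(). The standard library's "unicode_escape" codec also decodes pure ASCII input, but it reads the underlying bytes as Latin-1, so non-ASCII text (for example accented characters inside MY DATA) comes out garbled; the block below shows both.

```python
import re

# A JavaScript-style \xHH escape: a backslash, "x", then exactly two hex digits
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")

def decode_hex_escapes(s: str) -> str:
    """Replace each \\xHH sequence in s with the character whose code point is HH."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)

# Same characters re.findall returns: literal backslashes, not real escapes
txt = r"\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e"
html = decode_hex_escapes(txt)
print(html)  # <div class="table"> MY DATA </div>

# Standard-library alternative: fine here because txt is pure ASCII
print(txt.encode("ascii").decode("unicode_escape"))

# With non-ASCII text, unicode_escape reads the UTF-8 bytes as Latin-1 and garbles
# them, while the regex version leaves them alone
mixed = "café " + r"\x3cb\x3e"
print(decode_hex_escapes(mixed))                       # café <b>
print(mixed.encode("utf-8").decode("unicode_escape"))  # "café" comes out garbled
```

The decoded html string can then be passed to BeautifulSoup(html, "html.parser") as usual.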