How to decode this string so that BeautifulSoup can process it?

I have an HTML document that contains JavaScript. Using re.findall I was able to extract the arguments of a function call, and I need to convert them to a BeautifulSoup object.

The problem is that BeautifulSoup does not decode the escape sequences in the string, so the result looks like this:

\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e

I have tried several approaches, such as decode, but none of them worked.

EDIT: when I assign the string manually, as in str = r"\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e", BeautifulSoup is able to decode it, but the string extracted by the regex remains encoded.
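
One likely explanation for the difference, assuming the manual test used an ordinary (non-raw) string literal: Python's parser converts \x3c in source code into "<" before BeautifulSoup ever sees it, whereas text captured by re.findall still holds the four literal characters backslash, x, 3, c. A minimal sketch of that difference (the page sample and the render function name are hypothetical stand-ins for the real document):

```python
import re

# Hypothetical stand-in for the downloaded page: the backslashes are real
# characters in the text, exactly as they would be in the HTML file.
page = r'render("\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e")'

# In an ordinary literal, Python itself turns \x3c into "<" at compile time.
typed = "\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e"

# The regex only copies characters, so the escapes stay undecoded.
extracted = re.findall(r'render\("(.*?)"\)', page)[0]

print(typed)               # <div class="table"> MY DATA </div>
print(extracted)           # \x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e
print(typed == extracted)  # False
```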

CodePudding user response:

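The string returned by the regex contains literal backslashes, so the \x3c-style sequences are never interpreted as characters. To reproduce that in a string literal you need to escape the backslashes (\\x3c). You can then decode the sequences along these lines:

In JavaScript: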

    const res = `\\x3cdiv class\\x3d\\x22table\\x22\\x3e MY DATA \\x3c/div\\x3e`
    .split('\\x')
    .slice(1)
    .map(v => {
      return String.fromCharCode(parseInt(v.slice(0, 2), 16)) + v.slice(2)
    }).join('')


    console.log(res)

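In Python:

    def map_func(v):
        # The first two characters are the hex code; the rest is plain text.
        return chr(int(v[0:2], 16)) + v[2:]


    txt = "\\x3cdiv class\\x3d\\x22table\\x22\\x3e MY DATA \\x3c/div\\x3e"
    arr = txt.split('\\x')
    # Skip the piece before the first \x (an empty string for this input).
    arr = arr[1:]
    print(''.join(map(map_func, arr)))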
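
As an alternative to splitting on \x, the same decoding can be sketched with re.sub, which replaces each two-digit hex escape in place. This is only one possible approach, not the answer's original method. The helper name decode_hex_escapes is made up for this example, and it assumes the input contains only \xNN escapes (no \uNNNN or other escape forms).

```python
import re


def decode_hex_escapes(text):
    # Replace every literal backslash-x-NN sequence with the character
    # whose code point is the hex number NN.
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Raw string: the backslashes are literal, as in text captured by a regex.
extracted = r"\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e"
html = decode_hex_escapes(extracted)
print(html)  # <div class="table"> MY DATA </div>
```

The decoded string can then be passed to BeautifulSoup as usual, for example BeautifulSoup(html, "html.parser"). Unlike the split-based version above, this keeps any text that appears before the first escape sequence.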