How to decode this string so that BeautifulSoup can process it?

Time:12-23

I have an HTML document that contains JavaScript. Using re.findall, I was able to extract the arguments of a function, and I need to convert them to a BeautifulSoup object.

The problem is that BeautifulSoup cannot recognize the encoding of the string, so I get a result that looks like this:

\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e

I have tried different approaches, such as decode, but none of them worked.

EDIT: when I manually pass the string as str = r"\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e", BeautifulSoup is able to decode it, but once it is extracted with the regex the string remains encoded.

CodePudding user response:

You need to escape the backslashes when pasting the string into a literal. You can then parse it along these lines:

In JavaScript:

    const res = `\\x3cdiv class\\x3d\\x22table\\x22\\x3e MY DATA \\x3c/div\\x3e`
    .split('\\x')
    .slice(1)
    .map(v => {
      return String.fromCharCode(parseInt(v.slice(0, 2), 16)) + v.slice(2)
    }).join('')


    console.log(res)

In Python:

    def map_func(v):
        # The first two characters are the hex code; the rest is plain text
        return chr(int(v[0:2], 16)) + v[2:]


    txt = "\\x3cdiv class\\x3d\\x22table\\x22\\x3e MY DATA \\x3c/div\\x3e"
    arr = txt.split('\\x')
    arr = arr[1:]
    print(''.join(map(map_func, arr)))
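As an alternative, here is a minimal sketch that does the same decoding with re.sub. It assumes the extracted text contains literal backslash-x sequences followed by exactly two hex digits. The names HEX_ESCAPE and decode_hex_escapes are made up for illustration. Unlike the split-based version, it keeps any text that comes before the first escape sequence.

```python
import re

# Matches a literal backslash, an "x", and exactly two hex digits
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text):
    # Replace each escape sequence with the character whose code it holds
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# A raw string keeps the backslashes as real characters,
# which is what text captured by re.findall would contain
extracted = r"\x3cdiv class\x3d\x22table\x22\x3e MY DATA \x3c/div\x3e"
html = decode_hex_escapes(extracted)
print(html)  # <div class="table"> MY DATA </div>
```

The decoded html string could then be passed to BeautifulSoup, for example BeautifulSoup(html, "html.parser").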