How to get a script value in HTML from 'response.text' using re.search?

I have an HTML file that contains several scripts. The last script contains a value that I would like to get.

I need to get the hash value found here:

extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login", "sureBoxText": false, "strongCode": 0, "joinParams": false, "validationType": 3, "resendDelay": 120, "calledPhoneLen": 4, "calledPhoneExcludeCountries": [1, 49, 200] });

For this I used:

import re

with open("test.html", "r", encoding='utf-8') as f:
    html = f.read()

hash = re.search(r'{ "hash": "(.*?)",', html).group(1)

This works perfectly, but when I try to do the same directly on the response from the request, I get an error.

with requests.get(url, headers=headers, cookies=cookies) as response:
    if response.status_code == 200:
        html = response.content
        hash = re.search(r'{ "hash": "(.*?)",', html).group(1)
        return hash

ERROR

TypeError: cannot use a string pattern on a bytes-like object

Then I ran a simple test: I saved 'response.text' to an HTML file and tried to read it the first way, but the error remained. After that I opened the file in VS Code and clicked to format it, which reformatted the entire HTML file. I ran the test again and it worked. I need a way to apply that formatting to 'response.text' so I can get my value, or, if there is another way I don't know about, I'm willing to learn.

Note: the hash value is present in 'response.text'.

CodePudding user response：

You need to decode the bytes to a string:

re.search(r'{ "hash": "(.*?)",', html.decode('utf-8'))
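To illustrate why the decode step matters, here is a minimal sketch. The `raw` bytes value is a made-up stand-in for `response.content` (an assumption, not real server output), and `error_message` and `hash_value` are names chosen only for this example.

```python
import re

# Hypothetical stand-in for response.content, which is a bytes object.
raw = b'extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login" });'

pattern = r'{ "hash": "(.*?)",'

error_message = None
try:
    # A str pattern applied to bytes raises TypeError.
    re.search(pattern, raw)
except TypeError as exc:
    error_message = str(exc)

# Decoding first yields a str, so the same str pattern can be applied.
match = re.search(pattern, raw.decode("utf-8"))
hash_value = match.group(1) if match else None
```

Here `error_message` ends up holding the same TypeError text reported in the question, while `hash_value` holds the captured hash.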

CodePudding user response：

Try converting html to a string using str(), passing the encoding so that the bytes are decoded:

hash = re.search(r'{ "hash": "(.*?)",', str(html, 'utf-8')).group(1)

Edit: your regular expression isn't correct; change it to:

hash = re.search(r'"hash":"(.*?)",', str(html, 'utf-8')).group(1)
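A possible explanation for why the original pattern matched only after the file was formatted is that the raw response has no spaces around the colon. This is an assumption, since the raw response is not shown in the question. A sketch of a pattern that tolerates either spacing follows; both sample strings are hypothetical.

```python
import re

# \s* allows any amount of whitespace (including none) around the colon.
HASH_PATTERN = re.compile(r'"hash"\s*:\s*"(.*?)"')

# Hypothetical samples: compact (as a server might send it) and formatted.
compact = 'extend(cur, {"hash":"13334a0e457f0793ec","loginHost":"login"});'
spaced = 'extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login" });'

compact_hash = HASH_PATTERN.search(compact).group(1)
spaced_hash = HASH_PATTERN.search(spaced).group(1)
```

With this pattern the same expression should work on the unformatted response and on the formatted file, so no formatting step is needed.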

CodePudding user response：

I believe you are looking for response.text, which the documentation describes as "Content of the response, in unicode." See https://2.python-requests.org/en/master/api/#requests.Response.text

with requests.get(url, headers=headers, cookies=cookies) as response:
    if response.status_code == 200:
        html = response.text
        hash = re.search(r'{ "hash": "(.*?)",', html).group(1)
        return hash
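One thing to watch for in any of these variants: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError. Below is a small sketch of a helper that guards against that and accepts either str or bytes. The name `extract_hash` and the whitespace-tolerant pattern are assumptions made for this example, not part of the original code.

```python
import re
from typing import Optional, Union

# Assumed pattern: tolerates optional whitespace around the colon.
HASH_PATTERN = re.compile(r'"hash"\s*:\s*"([^"]*)"')


def extract_hash(html: Union[str, bytes]) -> Optional[str]:
    """Return the "hash" value embedded in the page, or None if absent."""
    if isinstance(html, bytes):
        # response.content is bytes; decode before applying a str pattern.
        html = html.decode("utf-8", errors="replace")
    match = HASH_PATTERN.search(html)
    # Guard against None instead of calling .group(1) unconditionally.
    return match.group(1) if match else None
```

Inside the status-code check above, this could be called as extract_hash(response.text), with the caller deciding what to do when it returns None.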