I have an HTML file that contains several scripts, and the last script contains a value that I would like to extract.
I need to get the hash value found here:
extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login", "sureBoxText": false, "strongCode": 0, "joinParams": false, "validationType": 3, "resendDelay": 120, "calledPhoneLen": 4, "calledPhoneExcludeCountries": [1, 49, 200] });
For this I used:
import re
with open("test.html", "r", encoding='utf-8') as f:
    html = f.read()
hash = re.search(r'{ "hash": "(.*?)",', html).group(1)
This works perfectly, but when I try to do the same directly on the response from the request, I get an error.
with requests.get(url, headers=headers, cookies=cookies) as response:
    if response.status_code == 200:
        html = response.content
        hash = re.search(r'{ "hash": "(.*?)",', html).group(1)
        return hash
ERROR
TypeError: cannot use a string pattern on a bytes-like object
Then I ran a simple test: I saved 'response.text' to an HTML file and tried to read it the first way, but the error remained. Soon after, I opened the file in VS Code and clicked to format it, which reformatted the entire HTML file; I ran the test again and it worked. I need a way to apply that formatting to 'response.text' so I can get my value, or, if there is another way that I don't know about, I'm willing to learn.
Note: the hash value is present in 'response.text'.
CodePudding user response:
You need to decode the bytes to string:
re.search(r'{ "hash": "(.*?)",', html.decode('utf-8'))
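To make the TypeError concrete, here is a minimal sketch of two ways around it. The `raw` value is a hypothetical stand-in for `response.content` (it is not the real page), and UTF-8 is assumed as the encoding.

```python
import re

# Hypothetical sample standing in for response.content (bytes, not str).
raw = b'extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login" });'

# Option 1: decode the bytes first, then search with a str pattern.
match = re.search(r'{ "hash": "(.*?)",', raw.decode("utf-8"))
hash_from_str = match.group(1)

# Option 2: keep the bytes and search with a bytes pattern (note the rb prefix).
# The captured group is bytes as well, so decode it afterwards.
match_b = re.search(rb'{ "hash": "(.*?)",', raw)
hash_from_bytes = match_b.group(1).decode("utf-8")

print(hash_from_str, hash_from_bytes)
```

Either option avoids mixing a str pattern with a bytes subject, which is what triggers the error.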
CodePudding user response:
Try converting html to a string using str():
hash = re.search(r'{ "hash": "(.*?)",', str(html)).group(1)
Edit: your regular expression isn't correct, change it to:
hash = re.search(r'"hash":"(.*?)",', str(html)).group(1)
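One thing to keep in mind is that calling str() on a bytes object without an encoding returns its printable representation (starting with b'), not decoded text, so html.decode('utf-8') or response.text is usually the cleaner input. Separately, the pattern only needs changing because the served page may be spaced differently from the formatted file. A rough sketch of a whitespace-tolerant pattern follows; both sample strings are made up for illustration, and `HASH_RE` is a hypothetical name.

```python
import re

# Allow optional whitespace around the colon so that both minified and
# formatted sources match; capture everything up to the closing quote.
HASH_RE = re.compile(r'"hash"\s*:\s*"([^"]*)"')

# Hypothetical samples: the same script as served (minified) and after formatting.
minified = 'extend(cur,{"hash":"13334a0e457f0793ec","loginHost":"login"});'
formatted = 'extend(cur, { "hash": "13334a0e457f0793ec", "loginHost": "login" });'

for sample in (minified, formatted):
    print(HASH_RE.search(sample).group(1))
```

With a pattern like this, formatting the HTML beforehand should not be necessary.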
CodePudding user response:
I believe you are looking for response.text, which is "Content of the response, in unicode." See https://2.python-requests.org/en/master/api/#requests.Response.text
with requests.get(url, headers=headers, cookies=cookies) as response:
    if response.status_code == 200:
        html = response.text
        hash = re.search(r'{ "hash": "(.*?)",', html).group(1)
        return hash
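Note that re.search returns None when nothing matches, so calling .group(1) directly would raise an AttributeError on a page without the hash. A small helper along these lines could guard against that; `extract_hash` and `HASH_RE` are hypothetical names, and the whitespace-tolerant pattern is an assumption about how the page may be spaced.

```python
import re
from typing import Optional, Union

# Assumed pattern: tolerate optional whitespace around the colon.
HASH_RE = re.compile(r'"hash"\s*:\s*"([^"]*)"')


def extract_hash(html: Union[str, bytes]) -> Optional[str]:
    """Return the hash value from the page source, or None if it is absent."""
    if isinstance(html, bytes):
        # Accept response.content as well; UTF-8 is assumed here.
        html = html.decode("utf-8", errors="replace")
    match = HASH_RE.search(html)
    return match.group(1) if match else None
```

Inside the request code above, this would be called as extract_hash(response.text), and the caller can then check for None instead of handling an exception.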