I need to extract part of a JSON response.
Part of my code:
r = scraper.get('https://nsa.gob.ye/ha/api/scar-doc/01/09090909/', json=payload, headers=headers, cookies=cookies)
Part of the response, as printed by print(r.text):
<div style="clear: both" aria-label="request info">
<pre ><b>GET</b> /ha/api/scar-doc/01/09090909/</pre>
</div>
<div aria-label="response info">
<pre ><span ><b>HTTP 200 OK</b>
<b>Allow:</b> <span >GET, HEAD, OPTIONS</span>
<b>Content-Type:</b> <span >application/json</span>
<b>Vary:</b> <span >Accept</span>
</span>{
'datos': {
'data': {
'tipo_documento': '01',
'numero_documento': '09090909',
'apellido_paterno': 'SHREK',
'apellido_materno': 'SHREK',
'nombres': 'SHREK',
'edad_anios': 111,
'str_fecha_nacimiento': '00/00/0000'
},
'resultado': 'Enc'
}
}</pre>
</div>
</div>
I need to get the value of 'str_fecha_nacimiento' using BeautifulSoup. Thanks.
CodePudding user response:
The problem is that the JSON appears as plain text inside an incomplete piece of HTML.
So I take the text of the "response info" div, split it into lines,
and keep only the JSON data by discarding the leading header lines.
Here is the code:
sample_data = """
<div style="clear: both" aria-label="request info">
<pre ><b>GET</b> /ha/api/scar-doc/01/09090909/</pre>
</div>
<div aria-label="response info">
<pre ><span ><b>HTTP 200 OK</b>
<b>Allow:</b> <span >GET, HEAD, OPTIONS</span>
<b>Content-Type:</b> <span >application/json</span>
<b>Vary:</b> <span >Accept</span>
</span>{
'datos': {
'data': {
'tipo_documento': '01',
'numero_documento': '09090909',
'apellido_paterno': 'SHREK',
'apellido_materno': 'SHREK',
'nombres': 'SHREK',
'edad_anios': 111,
'str_fecha_nacimiento': '00/00/0000'
},
'resultado': 'Enc'
}
}</pre>
</div>
</div>
"""
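from bs4 import BeautifulSoup

# Get the soup:
soup = BeautifulSoup(sample_data, "html.parser")
# Select the div by its aria-label (the sample has no class attribute), split its text
# on the line break "\n", and drop the first 5 entries (an empty string plus the four
# HTTP header lines) so that only the JSON data remains, joined back into a single string:
response_div = soup.find("div", attrs={"aria-label": "response info"})
js_data = "\n".join(response_div.get_text().split("\n")[5:])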
# Print the JSON data obtained:
print(js_data)
Result:
{
'datos': {
'data': {
'tipo_documento': '01',
'numero_documento': '09090909',
'apellido_paterno': 'SHREK',
'apellido_materno': 'SHREK',
'nombres': 'SHREK',
'edad_anios': 111,
'str_fecha_nacimiento': '00/00/0000'
},
'resultado': 'Enc'
}
}
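Counting header lines is fragile if the server adds or removes a header. A minimal sketch of an alternative, assuming the payload is the only brace-delimited block inside the div labelled "response info"; the helper name extract_payload_text is hypothetical and not part of BeautifulSoup:

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_payload_text(html: str) -> Optional[str]:
    """Return the brace-delimited payload from the "response info" div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the div is identified by its aria-label, as in the sample.
    div = soup.find("div", attrs={"aria-label": "response info"})
    if div is None:
        return None
    text = div.get_text()
    # Slice from the first "{" to the last "}" instead of counting lines.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    return text[start:end + 1]
```

Calling extract_payload_text(r.text) would then return the same kind of string as js_data, or None when the div or the braces are missing.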
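Note that, by applying the approach shown in this answer, you can convert js_data into an actual Python dictionary and read the requested field:
Code:
import ast
# js_data uses single quotes, so it is a Python literal rather than strict JSON;
# ast.literal_eval parses it into a dict (json.dumps would only re-quote the string).
json_data = ast.literal_eval(js_data)
print(json_data["datos"]["data"]["str_fecha_nacimiento"])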
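If it is unclear whether the server returns strict JSON (double quotes) or the single-quoted form shown in the sample, the parsing step can try both. This is only a sketch under that assumption; parse_payload is a hypothetical helper name, and the sample string is a shortened version of the data above:

```python
import ast
import json
from typing import Any


def parse_payload(text: str) -> Any:
    """Parse payload text that may be strict JSON or a Python-style literal."""
    try:
        # Strict JSON: double-quoted strings, true/false/null.
        return json.loads(text)
    except json.JSONDecodeError:
        # Python-style literal: single-quoted strings, True/False/None.
        return ast.literal_eval(text)


sample = "{'datos': {'data': {'str_fecha_nacimiento': '00/00/0000'}, 'resultado': 'Enc'}}"
payload = parse_payload(sample)
print(payload["datos"]["data"]["str_fecha_nacimiento"])
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text that comes from a remote server.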