I am trying to extract a list from a script tag in an HTML file. How do I extract the list called markers from the script tag?
from bs4 import BeautifulSoup
import requests
import re
import json
soup = BeautifulSoup(requests.get('url').content, 'html.parser')
scripts = soup.find_all('script')
txt = scripts[22]
print(txt)
The returned data (the value of txt) is in the following format:
<script>jQuery.extend(Drupal.settings, {"basePath":"\/","pathPrefix":"en\/","setHasJsCookie":0,"ajaxPageState, "markers":[{"latitude":"49.123","longitude":"-123.000","title":"point of interest"}] <script>
CodePudding user response:
Using a regex is probably your best bet: capture the JSON array that follows "markers": and parse it with json.loads.
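import re
import json
# Raw string so the "\/" sequences in the sample are kept literally (no invalid-escape warning)
txt = r'<script>jQuery.extend(Drupal.settings, {"basePath":"\/","pathPrefix":"en\/","setHasJsCookie":0,"ajaxPageState, "markers":[{"latitude":"49.123","longitude":"-123.000","title":"point of interest"}] <script>'
# Lazily capture the JSON array after "markers": up to the whitespace before the trailing <script>
matches = re.findall(r'"markers":(\[.*?\])\s<script>', txt)
lst = json.loads(matches[0])
print(lst)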
Output:
[{'latitude': '49.123', 'longitude': '-123.000', 'title': 'point of interest'}]
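Note that the pattern above depends on the array being followed directly by whitespace and <script>, as it is in the sample. If the real page has other keys after markers, one possible alternative is to locate the key with a regex and let the json module decide where the array ends. The following is only a sketch under that assumption; the helper name extract_markers and its key parameter are made up for illustration and are not part of any library.

```python
import json
import re


def extract_markers(script_text, key="markers"):
    """Return the JSON value stored under `key`, or None if the key is absent.

    Hypothetical helper: it only needs the text after `"key":` to start with
    valid JSON; whatever comes after that value is ignored.
    """
    # Find the key and skip the colon plus any surrounding whitespace
    match = re.search(r'"%s"\s*:\s*' % re.escape(key), script_text)
    if match is None:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and returns it together with the index where the value ended
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value


# Sample text from the question, as a raw string so "\/" is kept literally
txt = r'<script>jQuery.extend(Drupal.settings, {"basePath":"\/","pathPrefix":"en\/","setHasJsCookie":0,"ajaxPageState, "markers":[{"latitude":"49.123","longitude":"-123.000","title":"point of interest"}] <script>'

markers = extract_markers(txt)
print(markers)
```

With the text taken from the page instead of a literal, the same call would be extract_markers(scripts[22].get_text()), assuming index 22 still points at the right script tag.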