How to extract an array with Beautiful Soup

Time:10-09

I am trying to extract a list from a script tag in an HTML file. How do I extract the list called markers from the script tag?

from bs4 import BeautifulSoup
import requests 
import re
import json


soup = BeautifulSoup(requests.get('url').content, 'html.parser')

scripts = soup.find_all('script')
txt = scripts[22]
print(txt)

The returned data (the value of txt) is in the following format:

<script>jQuery.extend(Drupal.settings, {"basePath":"\/","pathPrefix":"en\/","setHasJsCookie":0,"ajaxPageState, "markers":[{"latitude":"49.123","longitude":"-123.000","title":"point of interest"}] <script>

CodePudding user response:

Using a regex is probably your best bet: capture the array that follows "markers": and parse it with json.loads.

import re
import json

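# Raw string, so the \/ sequences are kept literally without an invalid-escape warning.
# With BeautifulSoup, use str(scripts[22]) here, because find_all returns Tag objects, not strings.
txt = r'<script>jQuery.extend(Drupal.settings, {"basePath":"\/","pathPrefix":"en\/","setHasJsCookie":0,"ajaxPageState, "markers":[{"latitude":"49.123","longitude":"-123.000","title":"point of interest"}] <script>'

# Capture the array after "markers": up to the "]" that precedes the trailing tag in this sample.
matches = re.findall(r'"markers":(\[.*?\])\s<script>', txt)
lst = json.loads(matches[0])
print(lst)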

Output:

[{'latitude': '49.123', 'longitude': '-123.000', 'title': 'point of interest'}]
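The regex above depends on what follows the array in this particular sample. As a possible alternative, the sketch below finds the position of "markers": and lets json.JSONDecoder.raw_decode parse a single JSON value from there, which copes with nested brackets and ignores any trailing text. The helper name extract_array and the sample string are made up for illustration and are not from the original page.

```python
import json
import re


def extract_array(text, key="markers"):
    """Return the JSON array stored under `key` in `text`, or None if absent.

    Hypothetical helper: assumes the value after "key": is valid JSON.
    """
    # Find "key": and stop right before the opening bracket of the array.
    match = re.search(r'"%s"\s*:\s*(?=\[)' % re.escape(key), text)
    if match is None:
        return None
    # raw_decode parses one complete JSON value starting at the given index
    # and ignores whatever comes after it.
    value, _end = json.JSONDecoder().raw_decode(text, match.end())
    return value


# Made-up sample; the "]" inside the title would cut a non-greedy regex short.
sample = (
    r'<script>jQuery.extend(Drupal.settings, {"basePath":"\/",'
    r'"markers":[{"latitude":"49.123","longitude":"-123.000",'
    r'"title":"point [A]"}]});</script>'
)

markers = extract_array(sample)
print(markers)
```

With BeautifulSoup, the text of each script tag can be passed in, for example extract_array(str(tag)) for each tag in soup.find_all('script'), keeping the first result that is not None. That also avoids relying on a fixed index such as scripts[22].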