Extract data from script tag [HTML] using BeautifulSoup in Python

Time:05-13

I want to extract data from a variable that is defined inside a script tag:

<script>
var Itemlist = 'null';
var ItemData = '[{\"item_id\":\"107\",\"id\":\"79\",\"line_item_no\":\"1\",\"Amount\":\"99999.00\"}]';
</script>

I want to get the item_id and the Amount values into Python variables.

I tried using a regex. It worked for a while, but it stopped working when the cookie session updated.

Is there any other way to get those values?

I am using this method to get the script from the HTML, but the script's position changes when the cookie session updates:

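from bs4 import BeautifulSoup as bs
soup = bs(response.content, 'html.parser')
# find() returns a single tag, so indexing by position needs find_all()
script = soup.find_all('script')[8]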

So I have to change the index that I put after ('script'). For now it is [8], but whenever the cookie session updates I have to keep changing the number until I find the script I am looking for.

CodePudding user response:

To get the data from the <script> tag, you can use this example:

import re
import json
from bs4 import BeautifulSoup

html_data = """
<script>
var Itemlist = 'null';
var ItemData = '[{\"item_id\":\"107\",\"id\":\"79\",\"line_item_no\":\"1\",\"Amount\":\"99999.00\"}]';
</script>
"""

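soup = BeautifulSoup(html_data, "html.parser")
# select_one() returns the first <script>; on a real page, pick the one that defines ItemData
data = soup.select_one("script").text
# capture the JSON text between the single quotes
data = re.search(r"ItemData = '(.*)';", data).group(1)
# parse the JSON string into a list of dicts
data = json.loads(data)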

print("Item_id =", data[0]["item_id"], "Amount =", data[0]["Amount"])

Prints:

Item_id = 107 Amount = 99999.00
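
Since the position of the script changes between sessions, one option is to look the script up by its content instead of its index. The following is only a minimal sketch: the helper name extract_item_data and the sample HTML are made up for illustration, and it assumes the page keeps the var ItemData = '...'; form shown in the question. It also assumes the served HTML may contain literal \" sequences, which are replaced with plain quotes before parsing. That replacement would corrupt JSON whose string values themselves contain escaped quotes, so check it against the real page.

```python
import json
import re

from bs4 import BeautifulSoup

# Assumed pattern: matches the "ItemData = '...';" assignment from the question.
ITEM_DATA_RE = re.compile(r"ItemData\s*=\s*'(.*?)';")


def extract_item_data(html):
    """Hypothetical helper: return the parsed ItemData list, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Check every <script> instead of relying on a fixed position.
    for script in soup.find_all("script"):
        # .string is None for empty scripts such as <script src="...">
        match = ITEM_DATA_RE.search(script.string or "")
        if match is None:
            continue
        raw = match.group(1)
        # Assumption: literal \" sequences in the page stand for plain quotes.
        return json.loads(raw.replace('\\"', '"'))
    return None


# Made-up sample page with several scripts; the raw string keeps the backslashes.
sample = r"""
<html><head><script src="a.js"></script>
<script>var other = 1;</script></head>
<body><script>
var Itemlist = 'null';
var ItemData = '[{\"item_id\":\"107\",\"id\":\"79\",\"line_item_no\":\"1\",\"Amount\":\"99999.00\"}]';
</script></body></html>
"""

items = extract_item_data(sample)
print(items[0]["item_id"], items[0]["Amount"])  # 107 99999.00
```

With this approach the index no longer matters: the script is found as long as one of the script tags on the page still defines ItemData.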