How can I extract the value of a variable from a script element in Scrapy

I need to extract some data from a website. I found that everything I need is in a <script> element, so I extracted its text with this command:

script = response.css('[id="server-side-container"] script::text').get()

And this is the value of script:

    window.bottomBlockAreaHtml = '';
    ...
    window.searchQuery = '';
    window.searchResult = {
  "stats": {...},
  "products": {...},
  ...
  };
    window.routedFilter = '';
  ...
    window.searchContent = '';

What is the best way to get the value of "products" in my Python code?

CodePudding user response：

In your example, the best strategy is to use a regular expression to extract the object assigned to window.searchResult, convert it to a dictionary with json.loads(), and then read the "products" key from that dictionary.

For example:

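import json
import re

import scrapy


class LoplabbetSpider(scrapy.Spider):

    name = "loplabbet"
    start_urls = ["https://www.loplabbet.se/lopning/"]
    # Capture the object literal assigned to window.searchResult.
    # DOTALL lets "." match newlines, since the object spans several lines.
    pattern = re.compile(r'window\.searchResult = (\{.*?\});', flags=re.DOTALL)

    def parse(self, response):
        # Check every <script> element on the page for the assignment.
        for script in response.css("script").getall():
            matches = self.pattern.findall(script)
            if matches:
                # The captured text is JSON, so it can be parsed directly.
                results = json.loads(matches[0])
                product = results["products"]
                yield product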
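
The regex and json.loads() step from the spider above can be tried offline before running a crawl. The following is a minimal sketch, not the answerer's code: SAMPLE_SCRIPT and its contents are made-up stand-ins for the real script text, and extract_products is a hypothetical helper name.

```python
import json
import re

# Hypothetical sample standing in for the script text from the question.
SAMPLE_SCRIPT = """
    window.searchQuery = '';
    window.searchResult = {
  "stats": {"total": 2},
  "products": {"items": ["Shoe A", "Shoe B"]}
  };
    window.routedFilter = '';
"""

# Same pattern as in the spider: capture the object literal up to "};".
PATTERN = re.compile(r"window\.searchResult = (\{.*?\});", flags=re.DOTALL)


def extract_products(script_text):
    """Return the "products" value, or None if the assignment is not found."""
    match = PATTERN.search(script_text)
    if match is None:
        return None
    return json.loads(match.group(1))["products"]


products = extract_products(SAMPLE_SCRIPT)
print(products)
```

Inside a spider, the same helper could be called on each string returned by response.css("script::text").getall().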

CodePudding user response：

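If you have a string that contains JSON text, such as

{
  "product": "hello",
  "somethingElse": "else"
}

then you can pass that string (not the Scrapy response object itself) to json.loads() and read the key you need:

import json
json_text = '{"product": "hello", "somethingElse": "else"}'
data = json.loads(json_text)
print(data['product'])  # prints: hello

This only works once the string is pure JSON, so the script text from the question first has to be narrowed down to the object assigned to window.searchResult.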
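
Since the script text in the question mixes JavaScript assignments with the JSON object, json.loads() cannot be applied to it directly. One possible way to bridge that gap, sketched here as an alternative to a regex and not taken from either answer, is json.JSONDecoder.raw_decode(), which parses a single JSON value starting at a given index and ignores whatever follows. The helper name decode_assigned_object and the sample text are hypothetical.

```python
import json

MARKER = "window.searchResult ="  # assignment name taken from the question


def decode_assigned_object(script_text, marker=MARKER):
    """Parse the JSON value that follows `marker`; return None if absent."""
    start = script_text.find(marker)
    if start == -1:
        return None
    idx = start + len(marker)
    # raw_decode does not skip leading whitespace, so do that by hand.
    while idx < len(script_text) and script_text[idx].isspace():
        idx += 1
    # Parses one JSON value at idx and stops there, so the trailing ";"
    # and the rest of the script are never looked at.
    value, _end = json.JSONDecoder().raw_decode(script_text, idx)
    return value


# Hypothetical script text; note the "};" inside a string value, which
# would cut a non-greedy regex match short.
script_text = """
window.searchQuery = '';
window.searchResult = {"products": {"note": "a};b"}};
window.routedFilter = '';
"""

result = decode_assigned_object(script_text)
print(result["products"])
```

The trade-off is that this relies on the assigned value being valid JSON; a JavaScript object literal with unquoted keys or single-quoted strings would still raise json.JSONDecodeError.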