How can I extract value of variable from script element in Scrapy

Time:01-07

I need to extract some data from a website. Everything I need is inside a <script> element, so I extracted it with this command:

script = response.css('[id="server-side-container"] script::text').get()

And this is the value of script:

    window.bottomBlockAreaHtml = '';
    ...
    window.searchQuery = '';
    window.searchResult = {
      "stats": {...},
      "products": {...},
      ...
    };
    window.routedFilter = '';
    ...
    window.searchContent = '';

What is the best way to get the value of "products" in my Python code?

Answer:

In your example, the best strategy is to use a regex to extract the value assigned to window.searchResult, convert it to a dictionary with json.loads(), and then read the "products" key from that dictionary. This works as long as the object literal in the script is valid JSON (double-quoted keys, no trailing commas), which the quoted keys in your output suggest.

For example:

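import json
import re

import scrapy


class LoplabbetSpider(scrapy.Spider):

    name = "loplabbet"
    start_urls = ["https://www.loplabbet.se/lopning/"]
    # Capture the object literal assigned to window.searchResult, up to the first "};".
    # re.DOTALL lets "." match newlines, because the object spans several lines.
    pattern = re.compile(r'window\.searchResult = (\{.*?\});', flags=re.DOTALL)

    def parse(self, response):
        # Search the text content of every <script> element on the page
        for script in response.css("script::text").getall():
            match = self.pattern.search(script)
            if match:
                # The captured object literal is valid JSON, so json.loads can parse it
                results = json.loads(match.group(1))
                # Wrap in a dict so Scrapy accepts it as an item even if "products" is a list
                yield {"products": results["products"]}
                break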
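To try the extraction logic without running a crawl, here is a minimal stdlib-only sketch of the same regex-plus-json.loads approach applied to a plain string. The function name extract_products and the sample script contents are hypothetical, and the pattern is slightly relaxed (\s*=\s*) to tolerate different spacing around the equals sign.

```python
import json
import re

# Capture the object literal assigned to window.searchResult, up to the first "};".
# re.DOTALL lets "." span newlines; the spacing around "=" is relaxed with \s*.
SEARCH_RESULT_RE = re.compile(r"window\.searchResult\s*=\s*(\{.*?\});", flags=re.DOTALL)


def extract_products(script_text):
    """Hypothetical helper: return window.searchResult["products"], or None if absent."""
    match = SEARCH_RESULT_RE.search(script_text)
    if match is None:
        return None
    # Assumes the captured text is valid JSON (double-quoted keys, no trailing commas)
    search_result = json.loads(match.group(1))
    return search_result.get("products")


# Hypothetical script body shaped like the one in the question
sample_script = """
    window.bottomBlockAreaHtml = '';
    window.searchQuery = '';
    window.searchResult = {
  "stats": {"total": 2},
  "products": {"items": ["shoe-a", "shoe-b"]}
  };
    window.routedFilter = '';
"""

print(extract_products(sample_script))
```

Running it prints {'items': ['shoe-a', 'shoe-b']}. Inside the spider, the same function could be called on each string returned by response.css("script::text").getall().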

Answer:

If you have a string that contains valid JSON text, such as

{ 
  "product":"hello", 
  "somethingElse":"else" 
}

then you can parse it with json.loads and read the key you need. If that JSON is the whole body of your response, pass response.text (a string), not the Response object itself:

import json

# json.loads needs a str containing valid JSON, not a Scrapy Response object
data = json.loads(response.text)
print(data['product'])

and this should do the trick.
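Keep in mind that json.loads only accepts a complete JSON document, so calling it on the whole script text from the question would raise json.JSONDecodeError: that script is JavaScript containing several assignments. One alternative to the regex, sketched below under the assumption that the assigned value is valid JSON, is json.JSONDecoder().raw_decode, which parses a single JSON value from the start of a string and ignores whatever follows. Unlike a non-greedy \{.*?\}; pattern, it is not cut short by a "};" that appears inside a string value. The helper name extract_assigned_json and the sample data are hypothetical.

```python
import json


def extract_assigned_json(script_text, variable="window.searchResult"):
    """Hypothetical helper: parse the JSON object assigned to `variable`, or return None."""
    marker = script_text.find(variable)
    if marker == -1:
        return None
    # Jump to the opening brace of the object literal that follows "variable ="
    start = script_text.find("{", marker + len(variable))
    if start == -1:
        return None
    # raw_decode parses exactly one JSON value and returns (value, end_index),
    # so the trailing ";" and later window.* assignments are simply left unread.
    value, _end = json.JSONDecoder().raw_decode(script_text[start:])
    return value


# Hypothetical script body; note the "};" inside a string value
sample_script = """
window.searchQuery = '';
window.searchResult = {"stats": {"total": 1}, "products": {"p1": {"name": "Shoe };"}}};
window.routedFilter = '';
"""

print(extract_assigned_json(sample_script)["products"])
```

This prints {'p1': {'name': 'Shoe };'}}, whereas the non-greedy regex would stop at the "};" inside "Shoe };" and hand json.loads an incomplete object.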
