I need to extract some data from a website. Everything I need is inside a <script>
element, so I extracted it with this command:
script = response.css('[id="server-side-container"] script::text').get()
This is the value of script:
window.bottomBlockAreaHtml = '';
...
window.searchQuery = '';
window.searchResult = {
"stats": {...},
"products": {...},
...
};
window.routedFilter = '';
...
window.searchContent = '';
What is the best way to get the value of "products" in my Python code?
Answer:
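In your example, the best strategy is to use a regular expression to extract the value assigned to window.searchResult, convert it to a dictionary with json.loads(), and then read the "products" key from that dictionary. This assumes the object literal is valid JSON (double-quoted keys and strings, no trailing commas), which the double-quoted keys shown in the question suggest.
For example: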
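import json
import re

import scrapy


class LoplabbetSpider(scrapy.Spider):
    name = "loplabbet"
    start_urls = ["https://www.loplabbet.se/lopning/"]
    # Capture the object literal assigned to window.searchResult.
    # re.DOTALL lets "." match newlines, because the object spans several lines.
    pattern = re.compile(r'window\.searchResult = (\{.*?\});', flags=re.DOTALL)

    def parse(self, response):
        # Check the HTML of every <script> element for the assignment.
        for script in response.css("script").getall():
            matches = self.pattern.findall(script)
            if matches:
                # The captured group is the JSON object; parse it into a dict.
                results = json.loads(matches[0])
                product = results["products"]
                yield product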
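Since the question already has the script text in the script variable, the same regex can also be applied to that string directly, outside a spider. This is a minimal sketch, assuming the object literal is valid JSON; the extract_products helper and the sample data are hypothetical, because the real "stats" and "products" contents are elided in the question.

```python
import json
import re

# Same pattern as the spider above: capture the object literal assigned to
# window.searchResult. re.DOTALL lets "." match across newlines.
SEARCH_RESULT_RE = re.compile(r"window\.searchResult = (\{.*?\});", flags=re.DOTALL)


def extract_products(script_text):
    """Return the "products" value from window.searchResult, or None if absent."""
    match = SEARCH_RESULT_RE.search(script_text)
    if match is None:
        return None
    # Assumes the captured literal is valid JSON.
    search_result = json.loads(match.group(1))
    return search_result["products"]


# Hypothetical sample shaped like the script in the question.
script = """
window.bottomBlockAreaHtml = '';
window.searchQuery = '';
window.searchResult = {
"stats": {"total": 2},
"products": {"items": ["shoe-a", "shoe-b"]}
};
window.routedFilter = '';
"""

products = extract_products(script)
print(products)  # {'items': ['shoe-a', 'shoe-b']}
```

re.search() is used instead of findall() because only the first assignment is needed, and returning None makes a missing assignment easy to check.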
Answer:
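If you have a string that contains only JSON text, such as
{
"product":"hello",
"somethingElse":"else"
}
then you can pass that string to json.loads() and read the key from the resulting dictionary. json.loads() needs a str (or bytes), not a Scrapy Response object, so with a Scrapy response pass response.text:
import json
data = json.loads(response.text)
print(data['product'])
With the JSON above, this prints hello.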
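json.loads() only accepts a string that is entirely JSON, so it cannot be applied to the whole script text from the question, which also contains other JavaScript assignments. One alternative, sketched here under the assumption that the object after "window.searchResult = " is valid JSON, is json.JSONDecoder.raw_decode(), which parses a single JSON value starting at a given index and returns it together with the index where it ended, ignoring whatever follows. The MARKER string, parse_search_result, and the sample data are hypothetical.

```python
import json

# Text that precedes the object we want; assumed from the script in the question.
MARKER = "window.searchResult = "


def parse_search_result(script_text):
    """Decode the JSON object that follows MARKER, ignoring the rest of the script."""
    start = script_text.find(MARKER)
    if start == -1:
        return None
    # raw_decode parses one JSON value starting at idx and returns (value, end_index),
    # so the trailing ";" and the other window.* assignments are never parsed.
    value, _end = json.JSONDecoder().raw_decode(script_text, start + len(MARKER))
    return value


# Hypothetical sample; the "};" inside a string would cut off a non-greedy
# regex like the one in the spider, but raw_decode handles it.
script = (
    "window.searchQuery = '';\n"
    'window.searchResult = {"stats": {"note": "ends with };"}, '
    '"products": {"items": ["shoe-a"]}};\n'
    "window.routedFilter = '';\n"
)

data = parse_search_result(script)
print(data["products"])  # {'items': ['shoe-a']}
```

raw_decode() does not skip leading whitespace, so the index must point exactly at the opening brace; if the real page uses different spacing around "=", the marker has to match it.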