Get specific information out of application/ld+json

Time:07-19

I am looking to get just the "ratingValue" and "reviewCount" from the following application/ld+json, but I cannot figure out how to do this after looking through numerous how-tos, so I've essentially given up. Thank you in advance for your help.

Sample of the application/ld+json:

{
   "@context": "http://schema.org",
   "@graph": [
      {
         "@type": "Product",
         "name": "MERV 8 Replacement for Trion Air Bear 20x20x5 (19.63x20.13x4.88) ",
         "description": "MERV 8 Replacement for Trion Air Bear 20x20x5 (19.63x20.13x4.88)  - FilterBuy.com",
         "productID": 30100,
         "sku": "ABR20x20x5M8",
         "mpn": "ABR20x20x5M8",
         "url": "https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20/merv-8/",
         "itemCondition": "new",
         "brand": "FilterBuy",
         "image": "https://filterbuy.com/media/pla_images/20x25x5AB/20x25x5AB-m8-(x1).jpg",
         "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": 4.79926,
            "reviewCount": 538}

My Code:

from bs4 import BeautifulSoup
import bs4
import requests
import json
import re
import numpy as np
import csv

urls = ['https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20',
        'https://filterbuy.com/brand/trion-air-bear-air-filters/16x25x5-air-bear-1400/?selected_merv=11']
for url in urls:
    response = requests.get(url)
    # Parse the page once and reuse the soup for every lookup
    soup = BeautifulSoup(response.text, "lxml")

    mervs = soup.find_all('strong')
    product = soup.find("h1", class_="text-center")
    # Take the second application/ld+json script block on the page
    json_schema = soup.find_all('script', attrs={'type': 'application/ld+json'})[1]
    json_file = json.loads(json_schema.get_text())

    for i, cart in enumerate(soup.find_all('form', class_='cart')):
        for tax in cart.attrs:
            if 'data-price' in tax:
                print(product.text, mervs[i].get_text(), [tax], cart[tax], json_file)

CodePudding user response:

In Python, pretty much anything can be nested inside other things. This is an example of lists and dictionaries nested inside a dictionary. You can get at the value by thinking about what you need to do at each level.

Start by assigning the above dictionary to a variable, like the_dict (in your code, this is what json.loads returns, i.e. json_file). You want to access the "@graph" key, then access the first item in the list it returns, then access "aggregateRating". From there, you can get both the values you want. Your code may look something like this:

the_dict = ...
d = the_dict['@graph'][0]['aggregateRating']
rating_value, review_count = d['ratingValue'], d['reviewCount']
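
If the position of the product inside "@graph" is not guaranteed, a slightly more defensive variant is to look the entry up by its "@type" instead of by index. The following is only a sketch: the sample string is a trimmed copy of the JSON from the question with the closing brackets added as an assumption, and the names sample and find_aggregate_rating are made up for illustration.

```python
import json

# Trimmed copy of the sample from the question. The closing brackets are an
# assumption, added only so that the string parses as valid JSON.
sample = """
{
   "@context": "http://schema.org",
   "@graph": [
      {
         "@type": "Product",
         "sku": "ABR20x20x5M8",
         "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": 4.79926,
            "reviewCount": 538
         }
      }
   ]
}
"""


def find_aggregate_rating(data):
    """Return (ratingValue, reviewCount) for the first Product in "@graph"
    that has an aggregateRating, or None if there is no such entry."""
    for node in data.get("@graph", []):
        if node.get("@type") == "Product" and "aggregateRating" in node:
            rating = node["aggregateRating"]
            return rating.get("ratingValue"), rating.get("reviewCount")
    return None


the_dict = json.loads(sample)
result = find_aggregate_rating(the_dict)
print(result)  # (4.79926, 538)
```

Returning None instead of raising a KeyError makes it easier to skip pages that have no rating yet when looping over several URLs.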
