JSON Decode Error when trying to convert string

Time:09-23

I want to scrape the JSON data inside a script tag (shown below) with BeautifulSoup, but json.loads raises "Expecting value: line 1 column 1 (char 0)", which I took to mean that the variable is empty. What am I missing here?

#PYTHON:
    a = soup.find("script", type="application/ld json")
    a = str(a)
    print (a)
    
    data = dict()
    script_dict = json.loads(a.replace("'",'"'))
    print (script_dict)
    data["author"] = script_dict["author"] 
    data["embed_url"] = script_dict["embedUrl"]
    data["duration"] = ":".join(re.findall(r"\d\d",script_dict["duration"]))
    data["upload_date"] = re.findall(r"\d{4}-\d{2}-\d{2}",script_dict["uploadDate"])[0]
    data["accurate_views"] = int(script_dict["interactionStatistic"][0]["userInteractionCount"].replace(",",""))

Data to be scraped:

  <script type="application/ld json">
            {
                "@context": "http://schema.org/",
                "@type": "DATA",
                "name": "Klaus ;",
                "embedUrl": "http://example.com",
                "duration": "PT00H11M27S",
                
                "uploadDate": "2022-07-30T13:12:05 00:00",
                "description": "SOMETEXT;",
                 "author" : "Klaus",        "interactionStatistic": [
                {
                      "@type": "InteractionCounter",
                      "interactionType": "http://schema.org/WatchAction",
                      "userInteractionCount": "4,924,277"
                },
                {
                      "@type": "InteractionCounter",
                      "interactionType": "http://schema.org/LikeAction",
                      "userInteractionCount": "10,469"
                 }
                ]
            }
        </script>

CodePudding user response:

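Don't convert the tag to a string with str(). That keeps the surrounding <script> markup, so the string starts with "<" and json.loads fails on the very first character (the variable is not empty, it just isn't JSON). Use the tag's .text property instead and pass that to json.loads: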

import json
from bs4 import BeautifulSoup


s = """\
  <script type="application/ld json">
            {
                "@context": "http://schema.org/",
                "@type": "DATA",
                "name": "Klaus ;",
                "embedUrl": "http://example.com",
                "duration": "PT00H11M27S",
                
                "uploadDate": "2022-07-30T13:12:05 00:00",
                "description": "SOMETEXT;",
                 "author" : "Klaus",        "interactionStatistic": [
                {
                      "@type": "InteractionCounter",
                      "interactionType": "http://schema.org/WatchAction",
                      "userInteractionCount": "4,924,277"
                },
                {
                      "@type": "InteractionCounter",
                      "interactionType": "http://schema.org/LikeAction",
                      "userInteractionCount": "10,469"
                 }
                ]
            }
        </script>"""


soup = BeautifulSoup(s, "html.parser")

data = soup.find("script", type="application/ld json")
data = json.loads(data.text)

print(data)

Prints (reformatted here for readability; the actual print output is a single-line dict):

{
    "@context": "http://schema.org/",
    "@type": "DATA",
    "name": "Klaus ;",
    "embedUrl": "http://example.com",
    "duration": "PT00H11M27S",
    "uploadDate": "2022-07-30T13:12:05 00:00",
    "description": "SOMETEXT;",
    "author": "Klaus",
    "interactionStatistic": [
        {
            "@type": "InteractionCounter",
            "interactionType": "http://schema.org/WatchAction",
            "userInteractionCount": "4,924,277",
        },
        {
            "@type": "InteractionCounter",
            "interactionType": "http://schema.org/LikeAction",
            "userInteractionCount": "10,469",
        },
    ],
}
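
To round this off, here is a minimal sketch of why the original call failed and how the fields from the question could then be pulled out of the parsed dict. It uses only the standard library: raw_tag is a hypothetical stand-in for what str(tag) returns, json_text is a stand-in for what the tag's .text returns, and the JSON is trimmed from the question's data. The field logic is copied from the question's code, not a recommended way to parse ISO 8601 durations in general.

```python
import json
import re

# Stand-in for the tag's text content (what .text gives you): pure JSON.
json_text = """
{
    "embedUrl": "http://example.com",
    "duration": "PT00H11M27S",
    "uploadDate": "2022-07-30T13:12:05 00:00",
    "author": "Klaus",
    "interactionStatistic": [
        {"interactionType": "http://schema.org/WatchAction",
         "userInteractionCount": "4,924,277"}
    ]
}
"""

# Stand-in for str(tag): the same JSON wrapped in the <script> markup.
raw_tag = '<script type="application/ld json">' + json_text + "</script>"

# The wrapped string starts with "<", so decoding fails at character 0.
try:
    json.loads(raw_tag)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)
print(error_message)

# The bare JSON text decodes fine; no quote replacement is needed.
script_dict = json.loads(json_text)

# Same field extraction as in the question.
views = script_dict["interactionStatistic"][0]["userInteractionCount"]
data = {
    "author": script_dict["author"],
    "embed_url": script_dict["embedUrl"],
    "duration": ":".join(re.findall(r"\d\d", script_dict["duration"])),
    "upload_date": re.findall(r"\d{4}-\d{2}-\d{2}", script_dict["uploadDate"])[0],
    "accurate_views": int(views.replace(",", "")),
}
print(data)
```

Note that the a.replace("'", '"') step from the question is left out on purpose: the script content is already valid JSON, and swapping quotes would corrupt any value that contains an apostrophe.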