Python API/webscraping exercise with JSON

Time:01-06

I am learning Python and am working on APIs and web scraping. I have an exercise that is giving me difficulty in its first steps. Ultimately, I am supposed to write a function that returns the number of jobs, from a JSON list of jobs, that contain specific key values in a field. But to start with, I'm just trying to pull any sort of JSON data.

the content of the resource should look like this

import requests

api_url = "http://127.0.0.1:5000/data"
response = requests.get(api_url)
jsonResponse = response.json()
print(jsonResponse)

produces

None

I have also tried:

session = requests.Session()
api_url = "http://127.0.0.1:5000/data"
response = session.get(api_url)
jsonResponse = response.json()
print(jsonResponse)

but that also produces

None

I can confirm that the content type is json:

h = requests.head('http://127.0.0.1:5000/data')
header = h.headers
contentType = header.get('content-type')
print(contentType)

produces

application/json

But I'm not sure what to make of the 'Content-Length' and 'Connection' attributes from:

h = requests.head('http://127.0.0.1:5000/data')
header = h.headers
print(header)

which produces:

{'Server': 'Werkzeug/2.2.2 Python/3.9.2', 'Date': 'Thu, 05 Jan 2023 17:25:29 GMT', 'Content-Type': 'application/json', 'Content-Length': '5', 'Connection': 'close'}

I've also tried limiting the results by passing 'Id':'225' as params in the GET request, but that didn't change anything. I'm sure I'm missing something obvious, but I just can't figure it out. What am I doing wrong?

UPDATE: After debugging the "hosting" notebook, I was able to get it to run error-free. While debugging, I noticed the relevant section below, which should help identify what the resource is expecting.

@app.route('/data', methods=['GET'])
def api_id():
    # Check if keys such as Job Title, Key Skills, Role Category and others are provided as part of the URL.
    # Assign the keys to the corresponding variables.
    # If no key is provided, display an error in the browser.
    res = None
    for req in request.args:
        if req == 'Job Title':
            key = 'Job Title'
        elif req == 'Job Experience Required':
            key = 'Job Experience Required'
        elif req == 'Key Skills':
            key = 'Key Skills'
        elif req == 'Role Category':
            key = 'Role Category'
        elif req == 'Location':
            key = 'Location'
        elif req == 'Functional Area':
            key = 'Functional Area'
        elif req == 'Industry':
            key = 'Industry'
        elif req == 'Role':
            key = 'Role'
        elif req == 'id':
            key = 'id'
        else:
            pass

        value = request.args[key]
        if res is None:
            res = get_data(key, value, data)
        else:
            res = get_data(key, value, res)

    # Use the jsonify function from Flask to convert our list of
    # Python dictionaries to the JSON format.
    return jsonify(res)

I noticed that it returns a 'None' response if it does not receive the proper arguments. So I tried:

api_url = "http://127.0.0.1:5000/data"
params = {'id':'225'}
r = requests.post(api_url,data=params)
jsonResponse = r.json()
print(jsonResponse)

This produces a JSONDecodeError:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Input In [20], in <cell line: 4>()
      2 params = {'id':'225'}
      3 r = requests.post(api_url,data=params)
----> 4 jsonResponse = r.json()
      5 print(jsonResponse)

File /usr/lib/python3/dist-packages/requests/models.py:900, in Response.json(self, **kwargs)
    894         except UnicodeDecodeError:
    895             # Wrong UTF codec detected; usually because it's not UTF-8
    896             # but some other 8-bit codec.  This is an RFC violation,
    897             # and the server didn't bother to tell us what codec *was*
    898             # used.
    899             pass
--> 900 return complexjson.loads(self.text, **kwargs)

File /usr/lib/python3.9/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:
    348     cls = JSONDecoder

File /usr/lib/python3.9/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File /usr/lib/python3.9/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I also tried to retrieve the response text, but it seems that's not allowed. I have verified that it is using UTF-8 encoding, but it seems I'm still missing something.

CodePudding user response:

The syntax the exercise was looking for is something like:

import requests

api_url = "http://127.0.0.1:5000/data"
payload = {'Location':'New York'}
r = requests.get(api_url, params=payload)

I think several things were causing issues, many if not all of them related to how this course has us "host" the data ourselves instead of providing a consistent, standard online resource. Thanks, IBM/Coursera!
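To see what params= actually changes, here is a minimal sketch using only the standard library. It builds roughly the URL that requests would send for a payload like the one above. The 'Key Skills' value is made up for illustration, and the real encoding happens inside requests itself, so treat this as an approximation.

```python
from urllib.parse import urlencode

api_url = "http://127.0.0.1:5000/data"
# 'Location' and 'Key Skills' are keys the handler checks for;
# the values here are illustrative assumptions.
payload = {"Location": "New York", "Key Skills": "Python"}

# Encode the dict as a query string and append it to the base URL.
query = urlencode(payload)
full_url = f"{api_url}?{query}"
print(full_url)
# http://127.0.0.1:5000/data?Location=New+York&Key+Skills=Python
```

Because the handler in the question loops over request.args and feeds each result back into get_data, sending more than one key should narrow the results further. With no query string at all, res is never assigned, so the route returns jsonify(None), which would explain why the original call printed None.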

CodePudding user response:

Maybe the response takes too long to deliver its entire content. Try reading the response in chunks.

import requests
url = "http://127.0.0.1:5000/data"

with requests.get(url, stream=True) as response:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            print(chunk) 
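That said, the headers in the question report 'Content-Length': '5', which would be consistent with a complete body of null followed by a newline rather than a truncated one. The body below is an assumption inferred from that header, not something captured from the server; a quick check of how such a body parses:

```python
import json

# Assumed body, inferred from Content-Length: 5 in the question's headers.
body = b"null\n"

print(len(body))         # 5 bytes, matching the reported Content-Length
parsed = json.loads(body)
print(parsed)            # JSON null becomes Python None
```

If that assumption holds, the request is succeeding and the server is simply returning an empty result.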

CodePudding user response:

Hopefully this will do the job:

import requests

def get_json():
    result = requests.get("http://127.0.0.1:5000/data")
    # Decode the raw response bytes as UTF-8 and print them
    print(result.content.decode('utf-8'))
    # Use ASCII if UTF-8 doesn't give you the result
    print(result.content.decode('ascii'))

get_json()
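It can also help to look at the status code and content type before calling .json(). The route in the question only allows GET, so the POST attempt was most likely answered with an error page instead of JSON, which would explain the JSONDecodeError. Below is a minimal sketch of such a check. parse_json_response is a hypothetical helper, and the two stub objects are stand-ins for a requests response (they only mimic its status_code, headers and text attributes), with made-up values.

```python
import json
from types import SimpleNamespace


def parse_json_response(response):
    """Parse a response body as JSON, or raise ValueError with details."""
    content_type = response.headers.get("Content-Type", "")
    if response.status_code != 200 or "json" not in content_type:
        raise ValueError(
            f"HTTP {response.status_code}, Content-Type {content_type!r}, "
            f"body starts with {response.text[:40]!r}"
        )
    return json.loads(response.text)


# Stand-ins for real responses; the values are illustrative assumptions.
ok = SimpleNamespace(
    status_code=200,
    headers={"Content-Type": "application/json"},
    text="null\n",
)
not_allowed = SimpleNamespace(
    status_code=405,
    headers={"Content-Type": "text/html; charset=utf-8"},
    text="<!doctype html><title>405 Method Not Allowed</title>",
)

print(parse_json_response(ok))  # None

try:
    parse_json_response(not_allowed)
except ValueError as err:
    print("Not JSON:", err)
```

With a real request you would pass the object returned by requests.get(...) straight to the helper.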

   