Long-time user of Python requests here. I'm trying to make a simple call to this endpoint:
https://www.overstock.com/api/product.json?prod_id=10897789
My current code:
import requests
headers = { 'User-Agent': 'Mozilla/5.0', 'Accept': 'application/json' }
url = 'https://www.overstock.com/api/product.json?prod_id=10897789'
r = requests.get( url, headers=headers )
result = r.json()
print( result )
Expected outcome (shortened):
{'categoryId': 244, 'subCategoryId': 31446, 'altSubCategoryId': 0, 'taxonomy': {'store': {'id': 1, 'name': 'Rugs', 'apiUrl': 'https://www.overstock.com/api/search.json?taxonomy=sto1', 'htmlUrl': 'https://www.overstock.com/Home-Garden/1/store.html'}, 'department': {'id': 3, 'name': 'Casual Rugs'...
Unfortunately, when I run that same script on Linux, I do not get the same result. So far I am stumped as to why this is happening...
Here is the ugly Linux error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/.local/share/virtualenvs/online-project-7j1lNF7P/lib/python3.6/site-packages/requests/models.py", line 900, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
What could possibly be the issue? Here's what else I tried...
1. Linux is NOT running python 3.6 but instead running 2.7x to execute requests.
2. Adding 'Accept': 'application/json' to headers will surely solve this.
3. Decode the data variable first: data = response.decode() (link to SO post). Fail: "AttributeError: 'Response' object has no attribute 'decode'"
4. Use requests.Response.json (link to SO post). Fail: Gives same error as above.
5. Upgrading to python 3.9.9 may solve it. Nope! This still fails for me.
6. Perhaps it's your firewall. Nope, checked ufw and it's Status: inactive
#5 Error (on a new Linux machine, upgraded python to 3.9.9):
$ python3 test.py
Traceback (most recent call last):
File "/home/user/test.py", line 13, in <module>
print(r.json())
File "/usr/lib/python3/dist-packages/requests/models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
@balmy - Here is the output I'm getting after confirming requests version 2.26.0 AND python 3.9...
$ python3 test3.py
Traceback (most recent call last):
File "/home/user/test_scripts/test3.py", line 13, in <module>
print(r.json())
File "/home/eric/.local/lib/python3.9/site-packages/requests/models.py", line 910, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
@JCaesar - here is the response text (shortened to the part I think is relevant; it looks like bot detection may be in play):
<div id="bd">
<div >
There was an error processing your request.
</div>
<span ></span>
</div>
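A body like the one above explains the traceback: json.loads skips the leading newline, hits "<" at character 1, and reports "Expecting value: line 2 column 1 (char 1)". As a minimal sketch (the helper name diagnose_response and its return shape are my own, not part of requests), checking the status code and Content-Type before parsing makes this kind of failure easier to read:

```python
import json


def diagnose_response(status_code, content_type, text):
    """Return (data, problem): parsed JSON, or a short note on why parsing failed.

    Hypothetical helper for illustration; it only inspects values that a
    requests.Response already exposes (status_code, headers, text).
    """
    if status_code != 200:
        return None, f"HTTP {status_code}"
    if "json" not in content_type.lower():
        # The server answered, but not with JSON (for example an HTML error page).
        return None, (
            f"unexpected Content-Type {content_type!r}; "
            f"body starts with {text[:80]!r}"
        )
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON ({exc}); body starts with {text[:80]!r}"
```

With the script above it could be called as diagnose_response(r.status_code, r.headers.get('Content-Type', ''), r.text), printing the second value whenever the first is None.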
@Philippe - here is the result in response to your comment 'Can you change the print statement to print(r.text) and run python3 test3.py | jq .'...
$ sudo python3 test3.py | jq .
Traceback (most recent call last):
File "/usr/lib/command-not-found", line 28, in <module>
from CommandNotFound import CommandNotFound
File "/usr/lib/python3/dist-packages/CommandNotFound/CommandNotFound.py", line 19, in <module>
from CommandNotFound.db.db import SqliteDatabase
File "/usr/lib/python3/dist-packages/CommandNotFound/db/db.py", line 5, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Traceback (most recent call last):
File "/home/eric/test_scripts/test3.py", line 13, in <module>
print(r.text)
BrokenPipeError: [Errno 32] Broken pipe
@Philippe - answer to your next comment
$ sudo python3 test3.py | jq .
parse error: Invalid numeric literal at line 2, column 10
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
Please let me know if you have a solution. Thank you!
CodePudding user response:
Running requests 2.26.0 on macOS 12.0.1 and Python 3.9.9 I discovered that the website requires Accept-Encoding in the headers. This works as expected for me:
import requests

# Browser-like headers; Accept-Encoding is the one the site appears to require
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15',
    'Accept': 'application/json',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip, deflate, br'
}

with requests.Session() as session:
    # The walrus operator (:=) needs Python 3.8+; raise_for_status() raises on 4xx/5xx
    (r := session.get('https://www.overstock.com/api/product.json?prod_id=10897789', headers=headers)).raise_for_status()
    print(r.json())
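One side note on the 'br' entry, which is my own caveat rather than something established in this thread: requests leaves decompression to urllib3, and urllib3 only decodes Brotli when the brotli or brotlicffi package is installed. Advertising 'br' without one of them could leave you with a compressed body that fails to parse. A small sketch, with a hypothetical helper name, that only offers 'br' when a decoder is importable:

```python
import importlib.util


def accept_encoding_value():
    """Build an Accept-Encoding value, adding 'br' only if a Brotli decoder exists.

    Hypothetical helper; it checks for the two packages urllib3 can use.
    """
    encodings = ["gzip", "deflate"]
    has_brotli = any(
        importlib.util.find_spec(name) is not None
        for name in ("brotli", "brotlicffi")
    )
    if has_brotli:
        encodings.append("br")
    return ", ".join(encodings)
```

The result can be dropped into the headers dict in place of the hard-coded 'gzip, deflate, br' string.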
CodePudding user response:
It was all due to being IP blocked.
Here is ultimately the script that saved the day...
import requests

url = "https://www.overstock.com/api/product.json?prod_id=10897789"
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15',
    'Accept': 'application/json',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip, deflate, br'
}

# Replace ip:port with the address of your own proxy
http_proxy = "http://ip:port"
https_proxy = "http://ip:port"
proxyDict = {
    "http": http_proxy,
    "https": https_proxy
}

r = requests.get(url, headers=headers, proxies=proxyDict)
result = r.json()
print(result)
Thank you all for the group effort!
After seeing this worked for @JCaesar, @diggusbickus, @balmy, and @Philippe I realized that the only remaining stone unturned was the ip address. By adding rotating residential proxy IPs, I made the request and got the data immediately.
Thanks to @JCaesar for revealing 'Accept-Encoding', for without that it would not work at all. Thank you to @diggusbickus for your comment about walrus notation (:=), for without that I would have kept assuming the Python version was the problem and kept upgrading it.
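For anyone wanting to try the rotating-proxy idea, here is a rough sketch of one way to do it, not the exact setup I used. The function names, the attempts count, and the 10-second timeout are all assumptions; the get argument is expected to be requests.get or session.get.

```python
from itertools import cycle


def proxy_dict(proxy_url):
    """Build the mapping that requests expects for its proxies= argument."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_with_rotation(get, url, proxy_urls, headers=None, attempts=3):
    """Try the URL through successive proxies until one returns parseable JSON.

    Hypothetical helper. `get` is a callable with the same signature as
    requests.get; `proxy_urls` is a list like ["http://ip:port", ...].
    """
    pool = cycle(proxy_urls)
    last_error = None
    for _ in range(attempts):
        proxy_url = next(pool)
        try:
            response = get(url, headers=headers,
                           proxies=proxy_dict(proxy_url), timeout=10)
            response.raise_for_status()
            return response.json()
        except (OSError, ValueError) as exc:
            # requests' exceptions derive from OSError (IOError);
            # JSON decoding errors derive from ValueError.
            last_error = exc
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

Usage would look like fetch_with_rotation(requests.get, url, ["http://ip:port", "http://ip2:port2"], headers=headers), with real proxy addresses in place of the placeholders.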