Hi i'm trying to build a manga downloader app, for this reason I'm scraping several sites, however I have a problem once I get the image URL. I can see the image using my browser (chrome), I can also download it, however I can't do the same using any popular scripting library.
Here is what I've tried:
String imgSrc = "https://cdn.mangaeden.com/mangasimg/aa/aa75d306397d1d11d07d66746dae78a36dc78672ae9e97a08cb7abb4.jpg"
Connection.Response resultImageResponse = Jsoup.connect(imgSrc)
.userAgent(
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
.referrer("none").execute();
// output here
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(new java.io.File(String.valueOf(imgPath))));
out.write(resultImageResponse.body()); // resultImageResponse.body() is where the image's contents are.
out.close();
I've also tried this:
URL imgUrl = new URL(imgSrc);
Files.copy(imgUrl.openStream(), imgPath);
Lastly, since I was sure the link works I've tried to download the image using python, but also in this case I get a 403 error
import requests
base_url = "https://cdn.mangaeden.com/mangasimg/d0/d08f07d762acda8a1f004677ab2414b9766a616e20bd92de4e2e44f1.jpg"
res = requests.get(url)
googling I found this Unable to get image url in Mangaeden API Angular 6 which seems really close to my problem, however I don't understand if I'm setting wrong the referrer or it doesn't work at all...
Do you have any tips? Thank you!
CodePudding user response:
How to fix?
Add some "headers" to your request to show that you might be a "browser", this will give you a 200
as response and you can save the file.
Note This will also work for postman, just overwrite the hidden user agent and you will get the image as response
Example (python)
import requests
headers ={
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
url = "https://cdn.mangaeden.com/mangasimg/d0/d08f07d762acda8a1f004677ab2414b9766a616e20bd92de4e2e44f1.jpg"
res = requests.get(url,headers=headers)
with open("image.jpg", 'wb') as f:
f.write(res.content)