I'm trying to download images from Tumblr using Java and Selenium. I extracted each image's URL from its src attribute and tried to download the image from that URL, but the saved files are not what I expected: they are in an unsupported format and smaller in size than expected. How can I correct this? Please help.
This is my code:
public static void main(String[] args) throws InterruptedException, AWTException, IOException {
    WebDriver driver = new ChromeDriver();
    driver.manage().window().maximize();
    driver.get("https://artist-childe-hassam.tumblr.com/");
    Thread.sleep(5000);
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_END);
    robot.keyRelease(KeyEvent.VK_END);
    List<WebElement> list = driver.findElements(By.xpath("//img[@alt]"));
    int count;
    count = 1;
    for (WebElement element : list) {
        String srcs = element.getAttribute("src");
        String attribute = element.getAttribute("alt");
        System.out.println("title: " + attribute);
        System.out.println(" ");
        System.out.println("link " + srcs);
        URL url = new URL(srcs);
        InputStream in = new BufferedInputStream(url.openStream());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n = 0;
        while (-1 != (n = in.read(buf)))
        {
            out.write(buf, 0, n);
        }
        out.close();
        in.close();
        byte[] response = out.toByteArray();
        FileOutputStream fos = new FileOutputStream("path" + count + ".jpg");
        count++;
        fos.write(response);
        fos.close();
    }
}
}
CodePudding user response:
I'm not familiar with using BufferedInputStream to download an image from a URL. Instead, I usually use curl to download from a URL. I have modified your code to do that, and it's working fine for me.
public static void main(String[] args) throws InterruptedException, AWTException, IOException {
    WebDriverManager.chromedriver().setup();
    WebDriver driver = new ChromeDriver();
    driver.manage().window().maximize();
    driver.get("https://artist-childe-hassam.tumblr.com/");
    Thread.sleep(5000);
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_END);
    robot.keyRelease(KeyEvent.VK_END);
    List<WebElement> list = driver.findElements(By.xpath("//img[@alt]"));
    int count;
    count = 1;
    for (WebElement element : list) {
        String srcs = element.getAttribute("src");
        String attribute = element.getAttribute("alt");
        System.out.println("title: " + attribute);
        System.out.println(" ");
        System.out.println("link " + srcs);
        downloadFromUrl(srcs, "Path" + count + ".jpg", Duration.ofSeconds(20));
        count++;
    }
}
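public static boolean downloadFromUrl(String url, String fileNameWithPath, Duration timeoutDuration) {
    try {
        if (timeoutDuration == null) {
            timeoutDuration = Duration.ofMinutes(5);
        }
        // Note: Runtime.exec(String) splits the command on whitespace,
        // so this breaks if the URL or the path contains spaces.
        String curlStr = "curl " + url + " --output " + fileNameWithPath;
        Process process = Runtime.getRuntime().exec(curlStr);
        long totalSeconds = 0;
        System.out.println("Downloading file to " + fileNameWithPath + " ...");
        // Poll once per second until curl exits or the timeout is exceeded.
        while (process.isAlive()) {
            Thread.sleep(1000);
            totalSeconds++;
            if (totalSeconds > timeoutDuration.getSeconds()) {
                process.destroy(); // do not leave curl running after giving up
                throw new Exception("Unable to download file even after "
                        + timeoutDuration.getSeconds() + " seconds of wait");
            }
        }
        System.out.println(fileNameWithPath + " got downloaded in seconds - " + totalSeconds);
        // curl exits with 0 on success; anything else means the download failed.
        return process.exitValue() == 0;
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return false;
}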
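If you would rather not depend on curl being installed, the same download can be sketched in plain Java. This is only a minimal sketch under a few assumptions: the class name ImageDownloader and its methods are made up for illustration, the User-Agent header is a guess at what a server might want to see (I have not verified that Tumblr needs it), and readAllBytes requires Java 9 or later. Because the question mentions files in an "unsupported format", the sketch also looks at the first bytes of the response and picks the file extension from them instead of always writing ".jpg".

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; none of these names come from the original posts.
final class ImageDownloader {

    // Guesses the image type from the leading "magic" bytes of the content.
    public static String detectExtension(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "jpg";
        if (startsWith(head, 0x89, 'P', 'N', 'G')) return "png";
        if (startsWith(head, 'G', 'I', 'F', '8')) return "gif";
        // WebP layout: "RIFF", a 4-byte size, then "WEBP".
        if (startsWith(head, 'R', 'I', 'F', 'F') && head.length >= 12
                && head[8] == 'W' && head[9] == 'E' && head[10] == 'B' && head[11] == 'P') {
            return "webp";
        }
        return "bin"; // unknown content, e.g. an HTML error page
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Downloads the URL and saves it next to targetWithoutExtension,
    // appending an extension that matches the actual content.
    public static Path download(URL url, Path targetWithoutExtension) throws IOException {
        URLConnection conn = url.openConnection();
        // Assumption: a browser-like User-Agent; the value is arbitrary.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(20_000);
        byte[] bytes;
        try (InputStream in = conn.getInputStream()) {
            bytes = in.readAllBytes(); // Java 9+
        }
        String fileName = targetWithoutExtension.getFileName() + "." + detectExtension(bytes);
        return Files.write(targetWithoutExtension.resolveSibling(fileName), bytes);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: ImageDownloader <url> <target-without-extension>");
            return;
        }
        System.out.println("Saved " + download(new URL(args[0]), Paths.get(args[1])));
    }
}
```

In the loop from the code above, the call would then look something like ImageDownloader.download(new URL(srcs), Paths.get("Path" + count)). If a file comes out as ".bin", the server most likely returned something other than an image, and opening it in a text editor should show what it is.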