I want to download a file, for example sitemap.xml.gz.
I want to do it using only Playwright 1.22.
I tried it with the Chromium browser, but it fails.
It doesn't work with WebKit either: WebKit renders the whole file content in the page and then times out.
It only works with Firefox.
What is wrong with the other browsers? Could it be a bug in Playwright?
Has anyone been able to download a file directly with Playwright?
import com.microsoft.playwright.*;

public class PwDownload {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            final BrowserType chromium = playwright.chromium();
            final Browser browser = chromium.launch(new BrowserType.LaunchOptions().setHeadless(false));
            Page page = browser.newPage();
            // Navigate straight to the file URL and wait for the download event.
            Download download = page.waitForDownload(() -> {
                page.navigate("https://www.fnac.es/sitemap-top-post.xml.gz");
            });
            System.out.println(download.path());
            browser.close();
        }
    }
}
Error trace with chromium:
navigating to "https://www.fnac.es/sitemap-top-post.xml.gz", waiting until "load"
============================================================
at FrameSession._navigate (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/chromium/crPage.js:636:35)
at runNextTicks (node:internal/process/task_queues:61:5)
at processImmediate (node:internal/timers:437:9)
at async /private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/frames.js:648:30
at async ProgressController.run (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/progress.js:101:22)
at async FrameDispatcher.goto (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/dispatchers/frameDispatcher.js:86:59)
at async DispatcherConnection.dispatch (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/dispatchers/dispatcher.js:352:22)
}
at com.microsoft.playwright.impl.Connection.dispatch(Connection.java:183)
at com.microsoft.playwright.impl.Connection.processOneMessage(Connection.java:163)
at com.microsoft.playwright.impl.ChannelOwner.runUntil(ChannelOwner.java:101)
... 19 more
CodePudding user response:
Works for Chromium and Firefox. Change the outputDirectory variable before running.
import com.microsoft.playwright.*;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws Exception {
        try (Playwright playwright = Playwright.create()) {
            String outputDirectory = "d:\\";
            String url = "https://www.johnlewis.com/sitemap/products/products-00.xml.gz";
            // Last segment of the URL, used as the name of the saved file.
            String filename = FilenameUtils.getName(url);
            BrowserType browserType = playwright.firefox();
            Browser browser = browserType.launch(new BrowserType.LaunchOptions().setHeadless(false));
            BrowserContext newContext = browser.newContext(new Browser.NewContextOptions().setAcceptDownloads(true));
            Page page = newContext.newPage();
            // Change location.href from inside the page instead of calling page.navigate().
            Download download = page.waitForDownload(() -> {
                page.evaluate("(y) => {location.href = y;}", url);
            });
            // Playwright stores the download in a temporary location.
            Path downloadedFilePath = download.path();
            System.out.println("Downloaded to " + downloadedFilePath);
            // Copy the temporary file to the output directory.
            Path destinationFilePath = Paths.get(outputDirectory, filename);
            FileUtils.copyFile(downloadedFilePath.toFile(), destinationFilePath.toFile());
            System.out.println("Saved to " + destinationFilePath);
        }
    }
}
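If you would rather not depend on Apache Commons IO for the file name and the copy step, here is a minimal JDK-only sketch of the same two operations. The class and method names (SaveDownload, fileNameFromUrl, copyToDirectory) are made up for illustration and are not part of Playwright; the downloaded argument is assumed to be the path returned by download.path().

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of Playwright or Commons IO.
final class SaveDownload {

    // Returns the last path segment of the URL, ignoring any query string.
    static String fileNameFromUrl(String url) {
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Copies the temporary download into outputDirectory under fileName,
    // creating the directory if needed and overwriting an existing file.
    static Path copyToDirectory(Path downloaded, Path outputDirectory, String fileName)
            throws IOException {
        Files.createDirectories(outputDirectory);
        Path destination = outputDirectory.resolve(fileName);
        return Files.copy(downloaded, destination, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In the code above this would be called as SaveDownload.copyToDirectory(download.path(), Paths.get(outputDirectory), SaveDownload.fileNameFromUrl(url)).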
As for WebKit, I guess there is some built-in browser behavior that you cannot override. You can even open WebKit from Playwright's Java code, enter the link manually, and try to download the file: it doesn't let you do this (not even in a separate window, or when using JavaScript with the HTML download attribute).
CodePudding user response:
If you already have a link to your file, why don't you just download it over a normal connection? See "Download file from HTTPS server using Java".
Playwright downloads correspond to downloads that are triggered from a page or by a user. Just visiting a link isn't going to trigger a download event.
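A minimal sketch of such a direct download with the JDK alone, with no browser involved. The class name DirectDownload and the target file name are assumptions made for illustration, and the sketch assumes the server serves the file to a plain Java client without cookies or special headers, which may not hold for every site.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class for illustration only.
final class DirectDownload {

    // Streams the resource at url into target and returns the number of bytes written.
    static long download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // URL taken from the question; the local file name is an arbitrary choice.
        Path target = Paths.get("sitemap-top-post.xml.gz");
        long bytes = download("https://www.fnac.es/sitemap-top-post.xml.gz", target);
        System.out.println("Saved " + bytes + " bytes to " + target.toAbsolutePath());
    }
}
```

The file is written exactly as the server sends it, so a .gz file stays compressed on disk.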