Home > Net >  Is there an archive file format that supports being split in multiple parts and can be unpacked nati
Is there an archive file format that supports being split in multiple parts and can be unpacked nati

Time:03-29

Some archive file formats, e.g. ZIP (see Section 8 in https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT), support being split into multiple parts of a limited size. ZIP files can be opened natively on recent versions of Microsoft Windows, but it seems that Windows cannot open split ZIP files natively, only with special tools like 7-Zip. I would like to use this "split archive" functionality in a web app that I'm writing in which the created archives should be opened by a large audience of "average" computer users, so my question is: Is there an archive file format (like ZIP) that supports being split in multiple parts and can be unpacked without installing additional software on recent versions of Microsoft Windows? And ideally on other widely used operating systems as well.

Background: My final goal is to export a directory structure that is split over multiple web servers to a single local directory tree. My current idea is to have each web server produce one part of the split archive, provide all of them as some sort of Javascript multi-file download and then have one archive (in multiple parts) on the user's computer that just needs to be unpacked. An alternative idea for this final goal was to use Javascript's File System Access API (https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API), but it is not supported on Firefox, which is a showstopper.

CodePudding user response:

CAB archives meet this purpose a bit (see this library's page for example, it says that through it, archives can even be extracted directly from a HTTP(S)/FTP server). Since the library relies on .NET, it could even be used on Linux through Mono/Wine, which is a crucial part if your servers aren't running Windows... Because archive must be created on server, right?.

Your major problem is more that a split archive can't be created in parallel on multiple servers, at least because of LZx's dictionnary. Each server should create the whole set of archives and send only the ones it should send, and you don't have ANY guarantee that all these archives' sets would be identical on each server.

Best way is probably to create the whole archive on ONE server, then distribute each part (or the whole splitted archive...) on your various servers, through a replication-like interface.

Otherwise, you can also make individual archives that contains only a subset of the directory tree (you'll have to partition the files across servers), but it won't meet your requirements since it would be a collection of individual archives, and not a big splitted archive.

Some precisions may be required:

  • Do you absolutely need a system without any client besides the browser? Or can you use other protocols, as long as they natively exist on Windows (like FTP / SSH client that are now provided by default)?
  • What is the real purpose behind this request? Distribute load across all servers? Or to avoid too big single file downloads (i.e. a 30 GB archive) in case of transfer failure? Or both?
  • In case of a file size problem, why don't rely on resuming download?
  • Related