Get byte array from .zip file in memory, without writing anything to disk

I'm trying to take an existing .zip file (I get it from an API as a byte array), take some files from it, and make a new .zip file, then return that new .zip file as a byte array. The catch is that all of this should be done in memory, without writing any actual files to disk.

So I have this setup. The first memory stream and ZipArchive are used for the original zip file. The second stream and archive represent the second zip, which will contain only the filtered items from the first zip.

        using var downloadedRepoStream = new MemoryStream(binaryRepo);
        using var downloadedRepoArchive = new ZipArchive(downloadedRepoStream, ZipArchiveMode.Read, true);

        var entriesToExtract = downloadedRepoArchive.Entries; //add some filtering

        using var subRepoStream = new MemoryStream();
        using var subRepoArchive = new ZipArchive(subRepoStream, ZipArchiveMode.Create, true);

Then I go through each entry and copy contents over to the second zip.

        foreach (var entry in entriesToExtract)
        {
            var entryName = entry.FullName.Split('/', 2)[1];
            using var entryStream = entry.Open();

            var subRepoEntry = subRepoArchive.CreateEntry(entry.FullName);
            using var subRepoEntryStream = subRepoEntry.Open();

            await entryStream.CopyToAsync(subRepoEntryStream);
        }

Finally, I need to return that second ZipArchive as a byte array. I assumed that I could just call MemoryStream.ToArray(), but the MemoryStream used by the second ZipArchive remains empty, despite the ZipArchive having files. So how do I get a byte[] from a ZipArchive?

CodePudding user response：

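I was able to get the following to work with minor tweaks. You appear to be missing the step where you set the position you wish to start reading from: subRepoStream.Seek(0, SeekOrigin.Begin); This matters when you read from the stream afterwards (for example with Read or CopyTo); MemoryStream.ToArray() itself returns the whole buffer regardless of Position.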

byte[] binaryRepo = ...
using MemoryStream downloadedRepoStream = new MemoryStream(binaryRepo);
using ZipArchive downloadedRepoArchive = new ZipArchive(downloadedRepoStream, ZipArchiveMode.Read, false);

ReadOnlyCollection<ZipArchiveEntry> entriesToExtract = downloadedRepoArchive.Entries; //add some filtering

using MemoryStream subRepoStream = new MemoryStream();
using ZipArchive subRepoArchive = new ZipArchive(subRepoStream, ZipArchiveMode.Create, false);

foreach (ZipArchiveEntry entry in entriesToExtract)
{
    using Stream entryStream = entry.Open();
    
    ZipArchiveEntry subRepoEntry = subRepoArchive.CreateEntry(entry.FullName);
    using Stream subRepoEntryStream = subRepoEntry.Open();
    
    await entryStream.CopyToAsync(subRepoEntryStream);
}

// Set the position to the beginning before reading from the stream (Read/CopyTo);
// ToArray() itself returns the whole buffer regardless of Position.
subRepoStream.Seek(0, SeekOrigin.Begin);

byte[] data = subRepoStream.ToArray();
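
As a quick check of how Position and ToArray() interact, here is a minimal sketch. The entry name "hello.txt" is made up for illustration, and the archive is deliberately disposed before the stream is inspected so that the zip is complete.

```csharp
using System;
using System.IO;
using System.IO.Compression;

using var stream = new MemoryStream();

// leaveOpen: true keeps the MemoryStream usable after the archive is disposed.
using (var archive = new ZipArchive(stream, ZipArchiveMode.Create, leaveOpen: true))
{
    var entry = archive.CreateEntry("hello.txt"); // hypothetical entry name
    using var writer = new StreamWriter(entry.Open());
    writer.Write("hello");
} // disposing the archive finishes writing the zip

// The stream is positioned at its end here, yet ToArray() copies everything.
byte[] atEnd = stream.ToArray();

stream.Seek(0, SeekOrigin.Begin);
byte[] atStart = stream.ToArray();

// Both calls return the full buffer, so the lengths match.
Console.WriteLine(atEnd.Length == atStart.Length);
```

The Seek only becomes necessary if the stream itself is handed to something that reads from the current position.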

CodePudding user response：

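The problem is that you are not closing the new ZipArchive before looking at the stream. The archive finishes writing the zip (including its central directory) only when it is disposed, and a using declaration does not dispose it until the end of the enclosing scope. You can fix this simply by using the old-style using block: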

using var downloadedRepoStream = new MemoryStream(binaryRepo);
using var downloadedRepoArchive = new ZipArchive(downloadedRepoStream, ZipArchiveMode.Read, true);

var entriesToExtract = downloadedRepoArchive.Entries; //add some filtering

using var subRepoStream = new MemoryStream();

using (var subRepoArchive = new ZipArchive(subRepoStream, ZipArchiveMode.Create, true))
{
    foreach (var entry in entriesToExtract)
    {
        var entryName = entry.FullName.Split('/', 2)[1];
        using var entryStream = entry.Open();

        var subRepoEntry = subRepoArchive.CreateEntry(entry.FullName);
        using var subRepoEntryStream = subRepoEntry.Open();

        await entryStream.CopyToAsync(subRepoEntryStream);
    }
}

subRepoStream.Position = 0;
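
Putting the pieces together, the whole operation could be wrapped in one method that returns the byte[]. This is only a sketch: ZipFilter, FilterZip and the keep predicate are hypothetical names, not part of the original code, and it uses the synchronous CopyTo for brevity.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class ZipFilter
{
    // Hypothetical helper: copies the entries accepted by `keep` into a new in-memory zip.
    public static byte[] FilterZip(byte[] sourceZip, Func<ZipArchiveEntry, bool> keep)
    {
        using var sourceStream = new MemoryStream(sourceZip);
        using var sourceArchive = new ZipArchive(sourceStream, ZipArchiveMode.Read);

        using var targetStream = new MemoryStream();

        // Block-scoped using: the archive is disposed at the closing brace,
        // before the stream contents are read.
        using (var targetArchive = new ZipArchive(targetStream, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var entry in sourceArchive.Entries.Where(keep))
            {
                using var input = entry.Open();
                using var output = targetArchive.CreateEntry(entry.FullName).Open();
                input.CopyTo(output);
            }
        }

        // ToArray() returns the full buffer regardless of Position.
        return targetStream.ToArray();
    }
}
```

It might be called as ZipFilter.FilterZip(binaryRepo, e => e.FullName.EndsWith(".cs")), where the filter is just an example.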

Note that async doesn't make a huge amount of sense when everything is backed by a MemoryStream, as the work is all synchronous anyway.

It's a shame there is no rename facility on ZipArchiveEntry.
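
Since the entries are being copied into a new archive anyway, one workaround is to pick the new name at the moment the target entry is created. The sketch below assumes the goal of the unused entryName variable was to drop the top-level folder; ZipRename, TryStripTopFolder and CopyWithNewName are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;

static class ZipRename
{
    // Hypothetical helper: drops the first path segment,
    // e.g. "repo-main/src/a.cs" -> "src/a.cs".
    // Returns false when there is no '/' or nothing follows the first '/'.
    public static bool TryStripTopFolder(string fullName, out string newName)
    {
        var parts = fullName.Split('/', 2);
        bool ok = parts.Length == 2 && parts[1].Length > 0;
        newName = ok ? parts[1] : "";
        return ok;
    }

    // Copies the source entry's contents into the target archive under newName.
    public static void CopyWithNewName(ZipArchiveEntry source, ZipArchive target, string newName)
    {
        using var input = source.Open();
        using var output = target.CreateEntry(newName).Open();
        input.CopyTo(output);
    }
}
```

Inside the foreach loop, an entry would be copied only when TryStripTopFolder succeeds, which also avoids the IndexOutOfRangeException that Split('/', 2)[1] throws for names without a slash.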