Decompressing an S3 File in a Stream using C#


I'm trying to read a .zip file from S3 into a stream in C# and write the entries back to the originating folder in S3. I've looked at the myriad of SO questions, watched videos, etc., trying to get this right, and I seem to be missing something. I'm now farther along than I was originally, but I'm still getting stuck. (I really wish Amazon would just implement a decompress method, because this seems to come up a lot, but no such luck yet.) Here is my code currently:

private async Task<string> DecompressFile(string bucketName, string keystring)
{
    AmazonS3Client client = new AmazonS3Client();
    Stream fileStream = new MemoryStream();
    string sourceDir = keystring.Split('/')[0];
    GetObjectRequest request = new GetObjectRequest{ BucketName = bucketName, Key = keystring };
    try
    {
        using (var response = await client.GetObjectAsync(request))
        using (var arch = new ZipArchive(response.ResponseStream))
        {
            foreach (ZipArchiveEntry entry in arch.Entries)
            {
                fileStream = entry.Open();
                string newFile = sourceDir + "/" + entry.FullName;
                using (Amazon.S3.Transfer.TransferUtility tranute = new Amazon.S3.Transfer.TransferUtility(client))
                {
                    var upld = new Amazon.S3.Transfer.TransferUtilityUploadRequest();
                    upld.InputStream = fileStream;
                    upld.Key = newFile;
                    upld.BucketName = bucketName;
                    await tranute.UploadAsync(upld);
                }
            }
        }
        return $"Decompression complete for {keystring}...";
    } 
    catch (Exception e)
    {
        ctxt.Logger.LogInformation($"Error decompressing file {keystring} from bucket {bucketName}. Please check the file and try again.");
        ctxt.Logger.LogInformation(e.Message);
        ctxt.Logger.LogInformation(e.StackTrace);
        throw;
    }
}

The error I keep hitting now is on the write, at await tranute.UploadAsync(upld). The error I'm getting is:

$exception  {"This operation is not supported."}    System.NotSupportedException

Here are the exception details:

System.NotSupportedException
  HResult=0x80131515
  Message=This operation is not supported.
  Source=System.IO.Compression
  StackTrace:
   at System.IO.Compression.DeflateStream.get_Length()
   at Amazon.S3.Transfer.TransferUtilityUploadRequest.get_ContentLength()
   at Amazon.S3.Transfer.TransferUtility.IsMultipartUpload(TransferUtilityUploadRequest request)
   at Amazon.S3.Transfer.TransferUtility.GetUploadCommand(TransferUtilityUploadRequest request, SemaphoreSlim asyncThrottler)
   at Amazon.S3.Transfer.TransferUtility.UploadAsync(TransferUtilityUploadRequest request, CancellationToken cancellationToken)
   at File_Ingestion.Function.<DecompressFile>d__13.MoveNext() in File-Ingestion\Function.cs:line 136

  This exception was originally thrown at this call stack:
    [External Code]
    File_Ingestion.Function.DecompressFile(string, string) in Function.cs

Any help would be greatly appreciated. Thanks!

CodePudding user response:

I think the problem is that AWS needs to know the length of the file before it can be uploaded, but the stream returned by ZipArchiveEntry.Open doesn't know its length upfront.

You can see in the stack trace that the exception is thrown when TransferUtilityUploadRequest.ContentLength reads DeflateStream.Length (which always throws), where the DeflateStream is ultimately the stream returned from ZipArchiveEntry.Open.

(It's slightly odd that DeflateStream doesn't report its own decompressed length. The zip entry header does declare an uncompressed size, but that's only a declaration which might be wrong, so presumably the stream avoids reporting a value it can't guarantee.)
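You can see this for yourself with a small experiment. In the sketch below, zipStream is a placeholder for any readable zip stream you have on hand (it's not part of the question's code): the stream returned by ZipArchiveEntry.Open isn't seekable and throws on Length, while the entry itself does expose the declared size:

using System;
using System.IO;
using System.IO.Compression;

using var arch = new ZipArchive(zipStream);   // zipStream: any readable zip stream (placeholder)
ZipArchiveEntry entry = arch.Entries[0];
using Stream s = entry.Open();

Console.WriteLine(s.CanSeek);     // False - it's a forward-only DeflateStream
Console.WriteLine(entry.Length);  // the uncompressed size declared in the zip header
long len = s.Length;              // throws System.NotSupportedException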

I think what you need to do is buffer the extracted file in memory before passing it to AWS. That way we can find out the uncompressed length of the stream, which will be correctly reported by MemoryStream.Length:

using var fileStream = entry.Open();
// Copy the entry's stream into an in-memory MemoryStream, which does know its length
using var ms = new MemoryStream();
await fileStream.CopyToAsync(ms);
ms.Position = 0;

string newFile = sourceDir + "/" + entry.FullName;
using (Amazon.S3.Transfer.TransferUtility tranute = new Amazon.S3.Transfer.TransferUtility(client))
{
    var upld = new Amazon.S3.Transfer.TransferUtilityUploadRequest();
    upld.InputStream = ms;
    upld.Key = newFile;
    upld.BucketName = bucketName;
    await tranute.UploadAsync(upld);
}
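One thing to watch: buffering the whole entry into a MemoryStream means each uncompressed file has to fit in memory. If your entries can be large, a rough alternative (a sketch only, reusing the same entry, newFile, bucketName, and client from above) is to spool each entry to a temporary file instead, since TransferUtilityUploadRequest also accepts a FilePath and a file on disk knows its own length:

// Sketch: spool the entry to a temp file so memory use stays flat for big entries.
string tmp = Path.GetTempFileName();
try
{
    using (var entryStream = entry.Open())
    using (var fs = File.Create(tmp))
    {
        await entryStream.CopyToAsync(fs);
    }

    using var tranute = new Amazon.S3.Transfer.TransferUtility(client);
    var upld = new Amazon.S3.Transfer.TransferUtilityUploadRequest
    {
        FilePath = tmp,   // the TransferUtility gets the length from the file itself
        Key = newFile,
        BucketName = bucketName
    };
    await tranute.UploadAsync(upld);
}
finally
{
    File.Delete(tmp);
}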