Skip First Row (CSV Header Row) of HttpResponseMessage Content.ReadAsStream

Below is a simplified example of a larger piece of code. Basically, I'm calling one or more API endpoints and downloading CSV files that get written to an Azure Blob Container. If there are multiple files, each new CSV file is appended to the same blob.

The issue is that when I append to the target blob, I end up with multiple header rows scattered throughout the file, depending on how many CSVs I consumed. All the CSVs have the same header row, and I know the first row will always end with a line feed. Is there a way to read the stream, skip the content up to and including the first line feed, and then copy the rest of the stream to the blob?

It seemed simple in my head, but I'm having trouble finding my way there code-wise. I don't want to wait for the whole file to download and then delete the header row in memory, since some of these files can be several gigabytes.

I am using .NET 6, if that helps.

using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);

        Stream sourceStream = response.Content.ReadAsStream();
        sourceStream.CopyTo(blobStream);
    }
}

CodePudding user response：

.CopyTo copies from the current position in the stream, so all you need to do is read and discard bytes until you have consumed the first line feed.

using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);

        Stream sourceStream = response.Content.ReadAsStream();

        if (i != 0)
        {
            // Discard the header row. ReadByte returns -1 at end of stream,
            // so stop there too instead of looping forever on an empty response.
            int b;
            do { b = sourceStream.ReadByte(); } while (b != -1 && b != '\n');
        }
        sourceStream.CopyTo(blobStream);
    }
}
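
To try the skip logic without a network call, here is a minimal sketch that pulls it into a helper and runs it against in-memory streams. The names HeaderSkipper and SkipPastFirstLineFeed are made up for this illustration, and the two MemoryStream inputs merely stand in for the HTTP response streams from the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderSkipper
{
    // Reads and discards bytes up to and including the first '\n'.
    // Returns false if the stream ended before a line feed was found.
    public static bool SkipPastFirstLineFeed(Stream source)
    {
        int b;
        while ((b = source.ReadByte()) != -1)
        {
            if (b == '\n')
                return true;
        }
        return false;
    }

    public static void Main()
    {
        // Hypothetical sample data: two "downloads" with the same header row.
        string[] files = { "id,name\n1,a\n", "id,name\n2,b\n" };
        using var target = new MemoryStream();

        for (int i = 0; i < files.Length; i++)
        {
            using var source = new MemoryStream(Encoding.UTF8.GetBytes(files[i]));
            if (i != 0)
                SkipPastFirstLineFeed(source); // keep the header of the first file only
            source.CopyTo(target);
        }

        // Prints the header once, followed by the rows "1,a" and "2,b".
        Console.Write(Encoding.UTF8.GetString(target.ToArray()));
    }
}
```

Because the helper only ever calls ReadByte, it should work on forward-only streams as well, which is what matters for a response read with ResponseHeadersRead.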

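If all the files always have a header row of the same size, you can define a constant for its length and skip exactly that many bytes. Note that the HTTP response stream is not seekable, so calling Seek on it throws NotSupportedException; read and discard the bytes instead, like this:

using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);

        Stream sourceStream = response.Content.ReadAsStream();
        if (i != 0)
        {
            // Read may return fewer bytes than requested, so loop until
            // HeaderSizeInBytes bytes are consumed or the stream ends.
            byte[] skipped = new byte[HeaderSizeInBytes];
            int total = 0, read;
            while (total < skipped.Length && (read = sourceStream.Read(skipped, total, skipped.Length - total)) > 0)
                total += read;
        }
        sourceStream.CopyTo(blobStream);
    }
}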

This will be slightly quicker but does have the downside that the files can't change format easily in the future.

P.S. You probably want to Dispose sourceStream. Either directly or by wrapping its creation in a using statement.

CodePudding user response：

If we can assume that the stream contains UTF-8 encoded text, then you can do the following:

Create a StreamReader against sourceStream

var reader = new StreamReader(sourceStream);

Read the first line (assuming the line ends with \n)

var header = reader.ReadLine();

Convert the first line plus a \n to a byte array (Environment.NewLine would be \r\n on Windows and give the wrong length)

var headerInBytes = Encoding.UTF8.GetBytes(header + "\n");

Set the position after the first line

sourceStream.Position = headerInBytes.Length;

Copy the source stream from the desired position

sourceStream.CopyTo(blobStream);

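This proposed solution is just an example. It only works when sourceStream is seekable: a response stream read with HttpCompletionOption.ResponseHeadersRead is not, and setting Position on it throws NotSupportedException. A UTF-8 byte order mark at the start of the file would also make the computed length too short. Depending on the actual stream content, you might need to adjust it further and make it more robust.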
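
The reason the position has to be set explicitly is that StreamReader reads ahead into its own buffer, so after ReadLine the underlying stream is usually well past the header. The sketch below shows this on a seekable MemoryStream; ReadAheadDemo and CopyAfterHeader are hypothetical names, and the sample data is invented for the illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Copies everything after the first line of a seekable stream to target.
    // Returns the stream position observed right after ReadLine.
    public static long CopyAfterHeader(Stream sourceStream, Stream target)
    {
        // leaveOpen: true so disposing the reader does not close sourceStream.
        using var reader = new StreamReader(sourceStream, Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true, bufferSize: 1024, leaveOpen: true);
        string header = reader.ReadLine() ?? "";

        // The reader has filled its buffer, so this is not the end of the header.
        long positionAfterReadLine = sourceStream.Position;

        // Rewind to the first byte after the header (assumes "\n" endings, no BOM).
        sourceStream.Position = Encoding.UTF8.GetByteCount(header + "\n");
        sourceStream.CopyTo(target);
        return positionAfterReadLine;
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("id,name\n1,a\n2,b\n");
        using var source = new MemoryStream(data);
        using var target = new MemoryStream();

        long observed = CopyAfterHeader(source, target);

        // For this 16-byte sample the reader consumed the whole stream.
        Console.WriteLine($"Position after ReadLine: {observed} of {data.Length}");
        Console.Write(Encoding.UTF8.GetString(target.ToArray()));
    }
}
```

On a forward-only stream the rewind is not possible, so the bytes the reader buffered would be lost to CopyTo; in that case skipping bytes directly on the stream is the safer route.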