What is the difference between HttpClient.GetStreamAsync() and HttpClient.GetAsync()

Time:01-03

I want to learn a bit of web scraping, and I was expecting these two code blocks to return the same HTML (as a string).

var result = await client.GetAsync("https://www.footlocker.at/de/product/nike-dunk-high-damenschuhe/315347028102.html");
var stream = await result.Content.ReadAsStringAsync();
Console.WriteLine(stream);
var result2 = await client.GetStreamAsync("https://www.footlocker.at/de/product/nike-dunk-high-damenschuhe/315347028102.html");
var stream2 = await new StreamReader(result2).ReadToEndAsync();
Console.WriteLine(stream2);

In this example, GetStreamAsync() returns the whole HTML of the website, while GetAsync() also returns HTML, but it's just an error page. Why is that? Am I misunderstanding the difference between these methods, or just doing something completely wrong?

I looked for answers on Stack Overflow and read through the documentation, but unfortunately didn't find a specific answer for this example.

Answer:

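As explained by @RichardDeeming, HttpClient.GetStreamAsync calls HttpClient.GetAsync with HttpCompletionOption.ResponseHeadersRead, so it completes as soon as the headers arrive and the body is read from the stream afterwards. Plain HttpClient.GetAsync defaults to HttpCompletionOption.ResponseContentRead and waits for the whole body.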

The code can be rewritten as follows:

async Task Main()
{   
    HttpClient httpclient = new HttpClient();
    var imageUrl = "https://tenlives.com.au/wp-content/uploads/2020/09/Found-Kitten-0-8-Weeks-Busy-scaled.jpg";
    var downloadTasks = Enumerable.Range(0, 15)
        .Select(async u =>
        {
            try
            {
                //Option 1) - this will fail
                var response = await httpclient.GetAsync(imageUrl, HttpCompletionOption.ResponseHeadersRead);
                //End Option 1)

                //Option 2) - this will succeed
                //var response = await httpclient.GetAsync(imageUrl, HttpCompletionOption.ResponseContentRead);
                //End Option 2)

                response.EnsureSuccessStatusCode();
                var stream = await response.Content.ReadAsStreamAsync();
                return stream;
            }
            catch (Exception)
            {
                Console.WriteLine("Error downloading image");
                throw;
            }
        }).ToList();

    try
    {
        await Task.WhenAll(downloadTasks);
    }
    catch (Exception e)
    {
        Console.WriteLine("================ Failed to download one or more images " + e.Message);
    }
    Console.WriteLine($"Successful downloads: {downloadTasks.Count(t => t.Status == TaskStatus.RanToCompletion)}");
}

With HttpClient.GetAsync (or SendAsync(HttpCompletionOption.ResponseContentRead)), the content received from the network is read completely and copied into a local in-memory buffer before the call returns.

I'm not sure exactly which buffer is involved, but I think that with ResponseHeadersRead a buffer somewhere (network card, OS, or HttpClient) fills up with unread content and blocks new responses.

You can fix the code by releasing this buffer properly, for example by disposing the associated streams:

var downloadTasks = Enumerable.Range(0, 15)
    .Select(async u =>
    {
        try
        {
            var stream = await httpclient.GetStreamAsync(imageUrl);
            stream.Dispose(); // Free the buffer; the returned stream is closed and can no longer be read
            return stream;
        }
        catch (Exception)
        {
            Console.WriteLine("Error downloading image");
            throw;
        }
    }).ToList();
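
The snippet above hands back a stream that is already disposed, so the caller cannot read the image from it. Here is a minimal sketch of one way around that, assuming you actually want the bytes: read the body into memory, then let `using` dispose the response. The `DownloadAsync` helper, the local `HttpListener` endpoint on port 5056 and the dummy payload are my own assumptions, there only so the sketch runs without internet access; it also assumes .NET 5 or later for top-level statements.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical local endpoint standing in for the image URL (assumption, not from the question).
const string url = "http://localhost:5056/";
var payload = new byte[64 * 1024];

using var listener = new HttpListener();
listener.Prefixes.Add(url);
listener.Start();

// Answer 15 requests with the same dummy payload.
var server = Task.Run(async () =>
{
    for (var i = 0; i < 15; i++)
    {
        var context = await listener.GetContextAsync();
        context.Response.ContentLength64 = payload.Length;
        await context.Response.OutputStream.WriteAsync(payload, 0, payload.Length);
        context.Response.Close();
    }
});

using var httpclient = new HttpClient();

var downloadTasks = Enumerable.Range(0, 15)
    .Select(_ => DownloadAsync(httpclient, url))
    .ToList();

var results = await Task.WhenAll(downloadTasks);
await server;
Console.WriteLine($"Successful downloads: {results.Count(b => b.Length == payload.Length)}");

// Hypothetical helper: headers first, then copy the body and dispose everything.
static async Task<byte[]> DownloadAsync(HttpClient client, string requestUrl)
{
    using var response = await client.GetAsync(requestUrl, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();
    using var stream = await response.Content.ReadAsStreamAsync();
    using var memory = new MemoryStream();
    await stream.CopyToAsync(memory);
    return memory.ToArray(); // the bytes stay usable after the response is disposed
}
```

Returning a byte array instead of the stream means nothing is left open once the task completes, at the cost of holding each download in memory.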

The examples above should help you see the difference.
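
Coming back to the original expectation: for a successful response, both calls should give you the same body. The sketch below checks that against a local test server rather than the real site. The `HttpListener` endpoint on port 5055 and the tiny HTML page are assumptions of mine so that it runs offline, and it assumes .NET 5 or later for top-level statements.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical local endpoint and page (assumptions, not the site from the question).
const string url = "http://localhost:5055/";
const string page = "<html><body>hello</body></html>";

using var listener = new HttpListener();
listener.Prefixes.Add(url);
listener.Start();

// Serve the same page for two requests.
var server = Task.Run(async () =>
{
    for (var i = 0; i < 2; i++)
    {
        var context = await listener.GetContextAsync();
        var bytes = Encoding.UTF8.GetBytes(page);
        context.Response.ContentLength64 = bytes.Length;
        await context.Response.OutputStream.WriteAsync(bytes, 0, bytes.Length);
        context.Response.Close();
    }
});

using var client = new HttpClient();

// GetAsync: by default the body is fully buffered before the task completes.
using var response = await client.GetAsync(url);
var viaGetAsync = await response.Content.ReadAsStringAsync();

// GetStreamAsync: completes once the headers arrive; the body is read from the stream.
using var stream = await client.GetStreamAsync(url);
using var reader = new StreamReader(stream);
var viaGetStreamAsync = await reader.ReadToEndAsync();

await server;
Console.WriteLine(viaGetAsync == viaGetStreamAsync);
```

One difference worth knowing when the server answers with an error: GetStreamAsync throws an HttpRequestException on a non-success status code, whereas GetAsync returns the response so you can check StatusCode and read the error body yourself.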
