Asynchronously download and compile list of JsonDocument

Time:01-10

I'm fairly new (returning after a couple of decades) to C# and to the async/await model of programming. I'm looking for a little guidance, since I received an understandable warning, CS1998: the asynchronous method lacks await operators and will run synchronously.

I think the code below is straightforward: the server API returns data in pages of 25 items. I'm using a continuation to add each page of 25 to a List of JsonDocuments. Calling code will handle the parsing as needed. I'm not sure how I could reasonably leverage async/await any further here, but I'm looking for any suggestions or guidance.

internal static async Task<List<JsonDocument>> Get_All_Data(HttpClient client, string endpoint)
{
    Console.WriteLine("Downloading all data from {0}{1}", client.BaseAddress, endpoint);
    var all_pages = new List<JsonDocument>();

    // Get first page to determine total number of pages
    HttpResponseMessage response = client.GetAsync(endpoint).Result;
    Console.WriteLine("Initial download complete - parsing headers to determine total pages");
    //int items_per_page; 
    if (int.TryParse(Get_Header_Value("X-Per-Page", response.Headers), out int items_per_page) == false)
        //     throw new Exception("Response missing X-Per-Page in header");
        items_per_page = 25;


    if (int.TryParse(Get_Header_Value("X-Total-Count", response.Headers), out int total_items) == false)
        //throw new Exception("Response missing X-Total-Count in header");
        total_items = 1;

    // Division returns the number of complete pages; add 1 for a partial page IF total_items is not an exact multiple of items_per_page
    var total_pages = total_items / items_per_page;
    if ((total_items % items_per_page) != 0) total_pages++;

    Console.WriteLine("{0} pages to be downloaded", total_pages);

    var http_tasks = new Task[total_pages];
    for (int i = 1; i <= total_pages; i++)
    {
        Console.WriteLine("Downloading page {0}", i);
        var paged_endpoint = endpoint + "?page=" + i;
        response = client.GetAsync(paged_endpoint).Result;
        http_tasks[i - 1] = response.Content.ReadAsStringAsync().ContinueWith((_content) => { all_pages.Add(JsonDocument.Parse(_content.Result)); }); ;
        //http_tasks[i].ContinueWith((_content) => { all_pages.Add(JsonDocument.Parse_List(_content.Result)); });
    }
    System.Threading.Tasks.Task.WaitAll(http_tasks);  // wait for all of the downloads and parsing to complete

    return all_pages;
}

Thanks for your help

Answer:

My suggestion is to await all asynchronous operations, and use the Parallel.ForEachAsync method to parallelize the downloading of the JSON documents, while maintaining control of the degree of parallelism:

static async Task<JsonDocument[]> GetAllData(HttpClient client, string endpoint)
{
    HttpResponseMessage response = await client.GetAsync(endpoint);
    response.EnsureSuccessStatusCode();

    if (!Int32.TryParse(GetHeaderValue(response, "X-Total-Count"),
        out int totalItems) || totalItems < 0)
            totalItems = 1;

    if (!Int32.TryParse(GetHeaderValue(response, "X-Per-Page"),
        out int itemsPerPage) || itemsPerPage < 1)
            itemsPerPage = 25;

    int totalPages = ((totalItems - 1) / itemsPerPage) + 1;
    
    JsonDocument[] results = new JsonDocument[totalPages];
    ParallelOptions options = new() { MaxDegreeOfParallelism = 5 };
    
    await Parallel.ForEachAsync(Enumerable.Range(1, totalPages), options,
        async (page, ct) =>
    {
        string pageEndpoint = endpoint + "?page=" + page;
        HttpResponseMessage pageResponse = await client
            .GetAsync(pageEndpoint, ct);
        pageResponse.EnsureSuccessStatusCode();
        string pageContent = await pageResponse.Content.ReadAsStringAsync(ct);
        JsonDocument result = JsonDocument.Parse(pageContent);
        results[page - 1] = result;
    });
    return results;
}

static string GetHeaderValue(HttpResponseMessage response, string name)
    => response.Headers.TryGetValues(name, out var values) ?
        values.FirstOrDefault() : null;
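
The page count in GetAllData is a ceiling division written with integer arithmetic. As a minimal sketch, the same expression can be pulled into a helper (PageCount is a hypothetical name, not part of the code above) to check it against a few values:

```csharp
using System;

// Hypothetical helper: same formula as in GetAllData, isolated for checking.
// Integer division truncates, so ((n - 1) / size) + 1 rounds n / size up.
static int PageCount(int totalItems, int itemsPerPage)
    => ((totalItems - 1) / itemsPerPage) + 1;

Console.WriteLine(PageCount(100, 25)); // exact multiple: 4
Console.WriteLine(PageCount(101, 25)); // one extra item: 5
Console.WriteLine(PageCount(1, 25));   // single item: 1
Console.WriteLine(PageCount(0, 25));   // zero items still gives 1, because -1 / 25 truncates to 0
```

Note the last case: with this formula an empty result set still triggers one page request, which matches the fallback of totalItems = 1 in the method.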

The MaxDegreeOfParallelism is configured to the value 5 for demonstration purposes. You can find the optimal degree of parallelism by experimenting with your API. Setting the value too low might result in mediocre performance. Setting the value too high might overburden the target server, and potentially trigger an anti-DoS-attack mechanism.
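
One way to experiment with the setting offline is to count how many delegates run at the same time. The following is a rough sketch that assumes .NET 6 or later and uses Task.Delay as a stand-in for the page download; the counters running and peak are illustrative names only:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int running = 0, peak = 0;
object gate = new();
ParallelOptions options = new() { MaxDegreeOfParallelism = 3 };

await Parallel.ForEachAsync(Enumerable.Range(1, 12), options, async (page, ct) =>
{
    // Record how many "downloads" are in flight right now.
    lock (gate) { running++; peak = Math.Max(peak, running); }
    await Task.Delay(50, ct); // stand-in for client.GetAsync(...)
    lock (gate) { running--; }
});

// The peak should never exceed MaxDegreeOfParallelism.
Console.WriteLine($"Peak concurrency: {peak}");
```

Replacing the delay with a real request and timing the whole loop for a few different values is probably the simplest way to find a reasonable setting for a given API.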

If you are not familiar with Enumerable.Range, it is a LINQ method that returns a sequence of consecutive integers: Enumerable.Range(start, count) begins at start and contains count elements.
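
For instance, a small sketch with four pages:

```csharp
using System;
using System.Linq;

// Enumerable.Range(start, count): 'count' consecutive integers beginning at 'start'.
int[] pages = Enumerable.Range(1, 4).ToArray();
Console.WriteLine(string.Join(", ", pages)); // prints "1, 2, 3, 4"
```

The second argument is a count, not an upper bound, so Enumerable.Range(1, totalPages) yields exactly the page numbers 1 through totalPages.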

The GetAllData method is asynchronous and is supposed to be awaited. If you call it without await and instead block on the returned task (with .Result or .Wait()), and your application is a UI application like WinForms or WPF, you are at risk of experiencing a deadlock. Don't panic: it happens consistently, so you'll observe it during testing. One way to prevent it is to append .ConfigureAwait(false) to all awaited operations inside the GetAllData method.
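
As a sketch of where the calls go, the snippet below appends .ConfigureAwait(false) to every await inside the library-style methods. FetchPageAsync and GetTotalLengthAsync are hypothetical stand-ins (the first simulates a download with Task.Delay so the example runs without a server); in GetAllData the same suffix would go on GetAsync, ReadAsStringAsync, and Parallel.ForEachAsync:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a network call so the sketch runs offline.
static async Task<string> FetchPageAsync(int page)
{
    // ConfigureAwait(false): do not resume on the captured (UI) context.
    await Task.Delay(10).ConfigureAwait(false);
    return $"{{\"page\":{page}}}";
}

static async Task<int> GetTotalLengthAsync(int totalPages)
{
    int total = 0;
    for (int page = 1; page <= totalPages; page++)
    {
        // Every await inside the method gets the same suffix.
        string json = await FetchPageAsync(page).ConfigureAwait(false);
        total += json.Length;
    }
    return total;
}

// The caller awaits normally; it is the one that may need the UI context.
int length = await GetTotalLengthAsync(3);
Console.WriteLine(length);
```

In a console application there is no UI context to deadlock on, so this only illustrates the placement; the preferred fix remains awaiting GetAllData all the way up the call chain.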
