I am having the issue of fetching all forms currently hosted in hubspot.
I tried with a simple for loop where I made one request at the time, and fetched one form at a time, which worked, but was very slow.
I then thought it might work better if I created a seperate task for each request, and then made the task create each request, and store them in one common list.
Problem is that I expect the list to have 2000 items, but I never seem to get that, it seem pretty inconsistent the number of items I get?
But how come?
This is how I have setup my for fetching scheme.
private static async Task<IEnumerable<HubspotModel>> GetForms(
string hubspotPath, int pageSize)
{
int totalResults;
int offset = 0;
List<HubspotModel> output = new();
List<Task> tasks = new();
using var client = new HttpClient();
{
System.Net.Http.Headers.HttpResponseHeaders requestHeader = client
.GetAsync($"https://api.hubapi.com{hubspotPath}?"
$"hapikey={HubspotConfiguration.ApiKey}&limit={1}&offset={0}")
.Result.Headers;
totalResults = int.Parse(requestHeader.GetValues("x-total-count").First());
do
{
tasks.Add(Task.Run(() =>
{
int scopedOffset = offset;
IEnumerable<HubspotModel> forms = GetFormsFromHubspot(hubspotPath,
pageSize, offset, client);
output.AddRange(forms);
}).ContinueWith(requestReponse =>
{
if (requestReponse.IsFaulted)
{
Console.WriteLine("it failed");
}
}));
offset = pageSize;
}
while (totalResults > offset);
await Task.WhenAll(tasks);
}
return output;
}
private static IEnumerable<HubspotModel> GetFormsFromHubspot(string hubspotPath,
int pageSize, int offset, HttpClient client)
{
HttpResponseMessage request = client
.GetAsync($"https://api.hubapi.com{hubspotPath}?"
$"hapikey={HubspotConfiguration.ApiKey}&limit={pageSize}&offset={offset}")
.Result;
request.EnsureSuccessStatusCode();
string content = request.Content.ReadAsStringAsync().Result;
IEnumerable<Dictionary<string, object>> jsonResponse = JsonSerializer
.Deserialize<IEnumerable<Dictionary<string, object>>>(content,
new JsonSerializerOptions() { });
var guid = Guid.Parse(jsonResponse.First()["guid"].ToString());
var forms = jsonResponse.Select(x => new HubspotModel()
{
id = Guid.Parse(x["guid"].ToString()),
FormName = x["name"].ToString(),
Form = x
});
return forms;
}
CodePudding user response:
First of all, I'd suggest to make GetFormsFromHotspot
an async
as well and use await client.GetAsync( ...)
and await request.Content.ReadAsStringAsync()
instead of client.GetAsync(...).Result
and ReadAsStringAsync().Result
respectively, because using .Result
will block the current thread and thus, you will throw away the advantages of async Tasks.
But the main cause of the problem should be the following
GetFormsFromHubspot(hubspotPath, pageSize, offset, client);
Here you are calling the GetFormsFromHubspot
with an offset
parameter from an outer scope (and that value keeps changing), thus it will not use the value it had when you created that task but it uses the value it actually has, when that particular part of the code is really executed. So the value that is used as an offset is quite random. You already tried to create a
int scopedOffset = offset;
but you don't use it. And also you create it at the wrong position. Create that scopedOffset
outside of the task, but inside the loop's body. So it will be created at the creationtime of the task. And because it's inside the loop's body, a new value will be created for each task.
The following should do the trick (after you refactor GetFormsFromHubspot
to be async.
do {
int scopedOffset = offset
tasks.Add(Task.Run(async () => {
IEnumerable<HubspotModel> forms = await GetFormsFromHubspot(hubspotPath, pageSize, scopedOffset, client);
output.AddRange(forms);
})
.ContinueWith(...);
);
offset = pageSize;
} while (totalResults > offset);
CodePudding user response:
The main problem with your code is that the List<T>
is not thread-safe. When multiple threads are Add
ing to a list concurrently without synchronization, its behavior becomes undefined (throws exceptions, becomes corrupted etc). There are many ways to solve this problem:
- Synchronize the access to the list with the
lock
statement:lock (output) output.AddRange(forms);
. - Use a concurrent collection instead of the
List<T>
, for example aConcurrentQueue<T>
. - Avoid collecting manually the output altogether. Instead of storing your tasks in a
List<Task>
, you can store them in aList<Task<HubspotModel[]>>
, meaning that each task will be a genericTask<TResult>
, with theTResult
being an array ofHubspotModel
instances. Finally you will get all the output at once, when youawait
theTask.WhenAll
.
Below is an implementation of the third idea. Notice that I have avoided creating a HttpClient
instance, because the recommendation is to instantiated this class only once, and reuse it throughout the life of the application.
private static async Task<HubspotModel[]> GetFormsAsync(HttpClient client,
string hubspotPath, int pageSize)
{
string url = $"https://api.hubapi.com{hubspotPath}?hapikey="
$"{HubspotConfiguration.ApiKey}&limit={1}&offset={0}";
HttpResponseMessage response = await client.GetAsync(url)
.ConfigureAwait(false);
response.EnsureSuccessStatusCode();
int totalCount = Int32.Parse(response.Headers
.GetValues("x-total-count").First());
List<int> offsets = new();
for (int offset = 0; offset < totalCount; offset = pageSize)
offsets.Add(offset);
Task<HubspotModel[]>[] tasks = offsets.Select(offset => Task.Run(async () =>
{
HubspotModel[] forms = await GetFormsAsync(client,
hubspotPath, pageSize, offset).ConfigureAwait(false);
return forms;
})).ToArray();
HubspotModel[][] results = await Task.WhenAll(tasks).ConfigureAwait(false);
return results.SelectMany(x => x).ToArray();
}
private async static Task<HubspotModel[]> GetFormsAsync(HttpClient client,
string hubspotPath, int pageSize, int offset)
{
string url = $"https://api.hubapi.com{hubspotPath}?hapikey="
$"{HubspotConfiguration.ApiKey}&limit={pageSize}&offset={offset}";
HttpResponseMessage response = await client.GetAsync(url)
.ConfigureAwait(false);
response.EnsureSuccessStatusCode();
string content = await response.Content.ReadAsStringAsync()
.ConfigureAwait(false);
IEnumerable<Dictionary<string, object>> jsonResponse = JsonSerializer
.Deserialize<IEnumerable<Dictionary<string, object>>>(content,
new JsonSerializerOptions() { });
Guid guid = Guid.Parse(jsonResponse.First()["guid"].ToString());
HubspotModel[] forms = jsonResponse.Select(x => new HubspotModel()
{
Id = Guid.Parse(x["guid"].ToString()),
FormName = x["name"].ToString(),
Form = x
}).ToArray();
return forms;
}
One more improvement that you could consider doing is to switch from the Task.WhenAll
to the new (.NET 6) API Parallel.ForEachAsync
. The advantage is that you'll get control over the degree of parallelism, and so you'll be able to reduce the parallelization in case the remote server can't keep up with the pressure. Unfortunatelly the Parallel.ForEachAsync
method does not return the results like the Task.WhenAll
, so you'll be back to your original problem. You can find a solution about this here: ForEachAsync with Result.