I am requesting data from some kind of Products API, but the thing is that I can only get it 20 records at a time. The endpoint looks like this:
https://www.someDummyAPI.com/Api/Products?offset=0&count=20
Note: I can't change the count, it will always be 20.
I.e. the data from this endpoint will contain 20 records, and after that I have to increase the offset by 20 to get the next 20 records, and so on (in total it's about 1,500 records, so I have to make roughly 75 requests).
After getting all the data I insert it into the SQL database using a stored procedure (that is a different process).
So my question is: how can I speed up the fetching process? I thought about running tasks in parallel, but I need to get the results back from each response.
For now the process looks like this:
protected async void FSL_Sync_btn_Click(object sender, EventArgs e)
{
    int offset = 0;
    int total = 0;
    bool isFirst = true;
    DataTable resTbl = CreateDt();

    while (offset < total || offset == 0)
    {
        try
        {
            var data = await GetFSLData(offset.ToString(), "Products");
            JObject Jresult = JObject.Parse(data);
            if (isFirst)
            {
                Int32.TryParse(Jresult.SelectToken("total").ToString(), out total);
                isFirst = false;
            }
            // Function to chain up data in the DataTable
            resTbl = WriteInDataTable(resTbl, Jresult);
            offset += 20;
        }
        catch (Exception ex)
        {
            // Note: the exception is currently swallowed here.
            var msg = ex.Message;
        }
    }
}
So the process flow I am taking is:
- Get data from the API (let's say the first 20 records).
- Add it to the existing DataTable using the WriteInDataTable function.
- Insert the data into the SQL database from this resTbl DataTable (completely different process, not shown here).
I haven't used parallel tasks yet (I don't even know if they are the right solution for this), so I would appreciate any help.
CodePudding user response:
You could use Task.WhenAll to run your requests in parallel.
public async Task<IEnumerable<string>> GetDataInParallel()
{
    var tasks = new List<Task<string>>();
    var offset = 0;
    while (offset < total) // assumes total is already known, as in your original loop
    {
        var dataTask = GetFSLData(offset.ToString(), "Products"); // the request starts here; it is only awaited later
        tasks.Add(dataTask);
        offset += 20;
    }
    var datas = await Task.WhenAll(tasks); // wait for all requests to complete
    return datas;
}
This will start all of the requests at once. The tasks share threads from the thread pool rather than each getting a dedicated thread, but firing everything simultaneously can still strain the API and your connection; it will nevertheless be significantly faster than issuing the requests one after another. You might consider batching them and launching, say, 100 requests at a time; a sketch of that idea follows.
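For illustration, here is a minimal sketch of that batching idea, reusing the GetFSLData method from the question and assuming total has already been fetched once (as in the answers below); the batch size of 100 is an arbitrary choice:
var allResults = new List<string>();
var batch = new List<Task<string>>();
for (int offset = 0; offset < total; offset += 20)
{
    batch.Add(GetFSLData(offset.ToString(), "Products")); // the request starts here

    if (batch.Count == 100)
    {
        // Wait for the current 100 requests before starting the next batch.
        allResults.AddRange(await Task.WhenAll(batch));
        batch.Clear();
    }
}
if (batch.Count > 0)
{
    allResults.AddRange(await Task.WhenAll(batch)); // wait for the last partial batch
}
Each batch finishes completely before the next one starts, so at most 100 requests are in flight at any time.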
CodePudding user response:
Fetch your first page and set total before the loop:
var data = await GetFSLData(offset.ToString(),"Products");
JObject Jresult = JObject.Parse(data);
Int32.TryParse(Jresult.SelectToken("total").ToString(),out total);
In the next step you can then parallelize your tasks:
DataTable resTbl = CreateDt();
var downloadTasks = new List<Task<string>>();
while (offset < total)
{
    downloadTasks.Add(GetFSLData(offset.ToString(), "Products")); // each call starts a download
    offset += 20;
}
Then you can use Task.WhenAll to wait for all the data:
var httpResults = await Task.WhenAll(downloadTasks);
foreach (var jObjectResult in httpResults.Select(JObject.Parse))
{
    resTbl = WriteInDataTable(resTbl, jObjectResult);
}
Just some things to be aware of: you will be hitting that API with a lot of simultaneous requests, and it might not be a good idea. You could use TransformBlock and ActionBlock from the TPL Dataflow library if you run into this problem (a sketch follows below). You can find more information on that here:
https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library
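For reference, here is a minimal sketch of that TPL Dataflow approach, assuming the GetFSLData and WriteInDataTable methods from the question, a total that is already known, and an arbitrarily chosen degree of parallelism of 10 (the blocks live in the System.Threading.Tasks.Dataflow package):
// Fetch block: downloads and parses one page per offset, at most 10 downloads at a time.
var fetchBlock = new TransformBlock<int, JObject>(
    async offset =>
    {
        string data = await GetFSLData(offset.ToString(), "Products");
        return JObject.Parse(data);
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
// Write block: appends each parsed page to the DataTable.
// The default MaxDegreeOfParallelism of 1 means only one thread touches the DataTable.
var writeBlock = new ActionBlock<JObject>(jResult =>
{
    resTbl = WriteInDataTable(resTbl, jResult);
});
fetchBlock.LinkTo(writeBlock, new DataflowLinkOptions { PropagateCompletion = true });
// Feed all offsets into the pipeline.
for (int offset = 0; offset < total; offset += 20)
{
    fetchBlock.Post(offset);
}
fetchBlock.Complete();
await writeBlock.Completion; // completes once every page has been downloaded and written
Because PropagateCompletion is set on the link, awaiting writeBlock.Completion is enough to know that every posted offset has been downloaded and written.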
CodePudding user response:
It's quite hard to know what you're really using and getting, due to the high level of abstraction in your code (which is IMHO good, but makes it quite hard to spot errors on a page like SO).
So here is just a sketch of how you can parallelize all the requests to your API to improve the fetch time and then write the results into the database in one go. Maybe there are quotas on the API and you have to run these things in chunks, but that can easily be adapted through LINQ (a chunked variant is sketched after the code below).
var httpClient = new HttpClient();

var requests = Enumerable.Range(0, 1500)
    .Where(i => i % 20 == 0)
    // Create all needed requests
    .Select(offset => $"https://www.someDummyAPI.com/Api/Products?offset={offset}&count=20")
    .Select(url => new HttpRequestMessage(HttpMethod.Get, url))
    // Create tasks that send these requests
    .Select(request => httpClient.SendAsync(request));

// Run all of these requests in parallel.
var responses = await Task.WhenAll(requests);

// Create all tasks that read the content of the responses
var allContentStreams = responses
    .Select(response => response.Content.ReadAsStringAsync());

// Retrieve all content bodies as strings
var allRawContents = await Task.WhenAll(allContentStreams);

// Deserialize the strings into some usable object
var allData = allRawContents
    .Select(JsonConvert.DeserializeObject<MyDataDTO>);

// Add all objects to the database context.
foreach (var data in allData)
{
    WriteIntoDatabase(data);
}

// Let the context persist the data into the database.
SaveDatabase();
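As a hypothetical illustration of the chunking mentioned above, the same pipeline can be split into small groups of requests with LINQ (Enumerable.Chunk requires .NET 6; the chunk size of 10 is an arbitrary assumption):
var urls = Enumerable.Range(0, 1500)
    .Where(i => i % 20 == 0)
    .Select(offset => $"https://www.someDummyAPI.com/Api/Products?offset={offset}&count=20");
var allData = new List<MyDataDTO>();
// Only the URLs of the current chunk are downloaded concurrently.
foreach (var chunk in urls.Chunk(10))
{
    var bodies = await Task.WhenAll(chunk.Select(url => httpClient.GetStringAsync(url)));
    allData.AddRange(bodies.Select(JsonConvert.DeserializeObject<MyDataDTO>));
}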
CodePudding user response:
If you have upgraded to the .NET 6 platform, you could consider using the Parallel.ForEachAsync method to parallelize the GetFSLData invocations. This method requires an IEnumerable<T> sequence as source. You can create this sequence using LINQ (the Enumerable.Range method). To avoid any problems associated with the thread-safety of the DataTable class, you can store the JObject results in an intermediate ConcurrentQueue<JObject> collection, and defer the creation of the DataTable until all the data have been fetched and are locally available. You may also need to store the offset associated with each JObject, so that the results can be inserted in their original order. Putting everything together:
protected async void FSL_Sync_btn_Click(object sender, EventArgs e)
{
    int total = Int32.MaxValue;
    IEnumerable<int> offsets = Enumerable
        .Range(0, Int32.MaxValue)
        .Select(n => checked(n * 20))
        .TakeWhile(offset => offset < Volatile.Read(ref total));

    var options = new ParallelOptions() { MaxDegreeOfParallelism = 10 };
    var results = new ConcurrentQueue<(int Offset, JObject JResult)>();

    await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>
    {
        string data = await GetFSLData(offset.ToString(), "Products");
        JObject Jresult = JObject.Parse(data);
        if (offset == 0)
        {
            Volatile.Write(ref total,
                Int32.Parse(Jresult.SelectToken("total").ToString()));
        }
        results.Enqueue((offset, Jresult));
    });

    DataTable resTbl = CreateDt();
    foreach (var (offset, Jresult) in results.OrderBy(r => r.Offset))
    {
        resTbl = WriteInDataTable(resTbl, Jresult);
    }
}
The Volatile.Read/Volatile.Write calls are required because the total variable might be accessed by multiple threads in parallel.
In order to get optimal performance, you may need to adjust the MaxDegreeOfParallelism configuration according to the capabilities of the remote server and your internet connection.
Note: This solution is not efficient memory-wise, because it requires that all data are stored in memory in two different formats at the same time.