I'm trying to send many HTTP requests asynchronously and then collect the responses. To do this, I'm mainly using HtmlAgilityPack and SemaphoreSlim.
Unfortunately, I'm facing a performance issue that I'm struggling to solve.
When I add 1000 tasks to the list with SemaphoreSlim initCount = 15, they all finish in about 40 s.
When I add 2000 tasks, the time scales linearly: it takes about 80 s to finish all of them.
BUT when I start 2 console apps at the same time and put 1000 tasks in each, again with initCount = 15, I get a different result: each app takes around 60 s to finish its 1000 tasks. That means I have finished 2000 tasks in 60 s.
How can I get this performance from a single app? How do I scale the performance up?
Please see code below:
Main:
using appParser.Services;
using HtmlAgilityPack;
using System.Diagnostics;

internal class Program
{
    static int j;
    static int k;
    static long sum = 0;
    static HtmlWeb web = new HtmlWeb();

    static async Task Main(string[] args)
    {
        web.UserAgent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Mobile Safari/537.36";
        List<Task> tasks = new List<Task>();
        List<HtmlDocument> htmldocs = new();
        var stopwatch = new Stopwatch();

        Console.WriteLine("Put request number");
        k = Int32.Parse(Console.ReadLine());
        Console.WriteLine("Put initCount");
        j = Int32.Parse(Console.ReadLine());
        var PerformanceTest = new PerfrormanceTestSemaphore(j);
        Console.WriteLine("-------------------------------------------------------------");
        Console.WriteLine($"InitCount: {j}");

        // Create all tasks up front; the semaphore inside LoadDocs limits how many run at once.
        stopwatch.Start();
        for (int i = 0; i < k; i++)
        {
            tasks.Add(PerformanceTest.LoadDocs("https://m.olx.pl/elektronika/gry-konsole/q-xbox/?search[order]=created_at:desc", web));
        }
        stopwatch.Stop();
        Console.WriteLine($"Tasks added in {stopwatch.ElapsedMilliseconds} ms");

        // Collect results as tasks complete, timing the gap between completions.
        stopwatch.Restart();
        while (tasks.Count > 0)
        {
            Task finished = await Task.WhenAny(tasks);
            htmldocs.Add(((Task<HtmlDocument>)finished).Result);
            tasks.Remove(finished);
            stopwatch.Stop();
            Console.WriteLine($"{tasks.Count} left, last finished in {stopwatch.ElapsedMilliseconds} ms");
            sum += stopwatch.ElapsedMilliseconds; // accumulate the total elapsed time
            stopwatch.Restart();
        }

        Console.WriteLine("-------------------------------------------------------------");
        Console.WriteLine($"Tasks finished in {sum} ms");
        Console.WriteLine("-------------------------------------------------------------");
        await File.WriteAllTextAsync("html.txt", htmldocs[k - 1].Text.ToString());
        Console.ReadLine();
    }
}
PerfrormanceTestSemaphore Class:
using HtmlAgilityPack;

namespace appParser.Services
{
    internal class PerfrormanceTestSemaphore
    {
        public int SemaphoreNum { get; set; }
        private SemaphoreSlim _mutex;
        CancellationTokenSource cts = new();

        public PerfrormanceTestSemaphore(int semaphoreNum)
        {
            SemaphoreNum = semaphoreNum;
            _mutex = new SemaphoreSlim(semaphoreNum);
        }

        public async Task<HtmlDocument> LoadDocs(string url, HtmlWeb web)
        {
            // Wait for a free slot; at most semaphoreNum downloads run concurrently.
            await _mutex.WaitAsync(cts.Token);
            try
            {
                // Note: loadUrl is built here, but the request below uses the original url.
                string loadUrl = url + "&view=list/full_page=True";
                return await web.LoadFromWebAsync(url);
            }
            finally
            {
                _mutex.Release();
            }
        }
    }
}
CodePudding user response:
SemaphoreSlim doesn't work across processes, which explains why your two-application test is faster overall: each process gets its own semaphore, so the two apps together allow up to 30 requests in flight instead of 15. In other words, your two tests don't test the same thing.
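To make the per-process limit concrete, here is a minimal sketch that swaps the real download for Task.Delay. Everything in it is an assumption for illustration: the names ThrottleDemo and RunAsync, the 100 ms delay standing in for a request, and the task counts are hypothetical and do not come from the question's code. It only shows that, for a fixed amount of work, doubling the semaphore's initial count inside one process should roughly halve the elapsed time, which is the effect two separate apps produce.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

internal static class ThrottleDemo
{
    // Runs `total` simulated requests with at most `limit` in flight at once
    // and returns how long the whole batch took.
    internal static async Task<TimeSpan> RunAsync(int total, int limit)
    {
        using var gate = new SemaphoreSlim(limit);
        var stopwatch = Stopwatch.StartNew();

        var tasks = Enumerable.Range(0, total).Select(async _ =>
        {
            await gate.WaitAsync();
            try
            {
                // Hypothetical stand-in for one HTTP request (assumed 100 ms).
                await Task.Delay(100);
            }
            finally
            {
                gate.Release();
            }
        });

        // WhenAll enumerates the sequence once, which starts every task.
        await Task.WhenAll(tasks);
        return stopwatch.Elapsed;
    }

    private static async Task Main()
    {
        TimeSpan with15 = await RunAsync(300, 15);
        TimeSpan with30 = await RunAsync(300, 30);
        Console.WriteLine($"limit 15: {with15.TotalMilliseconds:F0} ms");
        Console.WriteLine($"limit 30: {with30.TotalMilliseconds:F0} ms");
    }
}
```

With real requests the gain may be smaller, since the server or the HTTP stack can impose its own connection limits; the sketch only isolates the effect of the semaphore.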
Edit: That said, if this is all you're doing, you don't need your semaphore at all. Simply create your tasks to download the link and use Task.WhenAll to collect the results. Let the framework figure out limits and work units.
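As a rough sketch of the Task.WhenAll approach: when the list holds Task<T> rather than plain Task, WhenAll returns the results as an array in the same order as the tasks, so no cast or .Result is needed. The names WhenAllDemo, FakeDownloadAsync and DownloadAllAsync are hypothetical, and the 50 ms Task.Delay is an assumed stand-in for a real page download.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

internal static class WhenAllDemo
{
    // Hypothetical stand-in for downloading one page.
    internal static async Task<string> FakeDownloadAsync(int id)
    {
        await Task.Delay(50);
        return $"page {id}";
    }

    // Starts every download, then awaits them all together.
    internal static async Task<string[]> DownloadAllAsync(int count)
    {
        var tasks = new List<Task<string>>();
        for (int i = 0; i < count; i++)
        {
            tasks.Add(FakeDownloadAsync(i));
        }

        // Results come back in the same order as the tasks in the list.
        return await Task.WhenAll(tasks);
    }

    private static async Task Main()
    {
        string[] pages = await DownloadAllAsync(5);
        Console.WriteLine(string.Join(", ", pages));
    }
}
```

In the question's code the same idea would mean declaring the list as List<Task<HtmlDocument>> and awaiting Task.WhenAll on it directly.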
CodePudding user response:
Blindy,
Thank you for the answer. Unfortunately, I have already tested the approach you describe, and performance was worse than with SemaphoreSlim, which is why I wanted to keep working with it. Do you have any idea why this code might run slower than the SemaphoreSlim version? Maybe I could do something more efficiently?
Main:
using appParser.Services;
using HtmlAgilityPack;
using System.Diagnostics;

namespace MyApp // Note: actual namespace depends on the project name.
{
    internal class Program
    {
        static int j;
        static int k;
        static long sum = 0;
        static HtmlWeb web = new HtmlWeb();
        static PerfromanceTestNew PerfromanceTestNew = new();

        static async Task Main(string[] args)
        {
            web.UserAgent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Mobile Safari/537.36";
            List<Task> tasks = new List<Task>();
            List<HtmlDocument> htmldocs = new();
            var stopwatch = new Stopwatch();

            Console.WriteLine("Put http requests number");
            k = Int32.Parse(Console.ReadLine());

            // Start every download immediately; nothing limits concurrency here.
            stopwatch.Start();
            for (int i = 0; i < k; i++)
            {
                tasks.Add(PerfromanceTestNew.LoadDocs("https://m.olx.pl/elektronika/gry-konsole/q-xbox/?search[order]=created_at:desc", web));
            }
            stopwatch.Stop();
            Console.WriteLine($"Tasks added in {stopwatch.ElapsedMilliseconds}");

            // Wait for all downloads, then collect the documents.
            stopwatch.Restart();
            await Task.WhenAll(tasks);
            stopwatch.Stop();
            foreach (Task task in tasks)
            {
                htmldocs.Add(((Task<HtmlDocument>)task).Result);
            }
            Console.WriteLine($"Tasks finished in {stopwatch.ElapsedMilliseconds}");
        }
    }
}
PerfromanceTestNew Class:
using HtmlAgilityPack;

namespace appParser.Services
{
    internal class PerfromanceTestNew
    {
        public async Task<HtmlDocument> LoadDocs(string url, HtmlWeb web)
        {
            // Note: loadUrl is built here, but the request below uses the original url.
            string loadUrl = url + "&view=list/full_page=True";
            return await web.LoadFromWebAsync(url);
        }
    }
}