Alternative to ParallelForEach That can allow me to kill parallel processes immediately on Applicati

Time:10-25

I am writing a simple console application that loads files from a database into a HashSet. These files are then processed in a Parallel.ForEach loop. The console application launches a new Process object for each file it needs to process, so it opens new console windows with the application running. I am doing it this way because of logging issues I have if I run the parsing from within the application: logs from different threads write into each other.

The issue is that when I close the application, the Parallel.ForEach loop still tries to process one more file before exiting. I want all tasks to stop immediately when I kill the application. Here are the code excerpts:

My cancel is borrowed from: Capture console exit C#

Essentially, the program performs some cleanup duties when it receives a cancel command such as CTRL-C or closing the window with the X button.

The code I am trying to cancel is here:

using System;
using System.Collections.Concurrent;

class Program
{
    private static bool _isFileLoadingDone;
    static ConcurrentDictionary<int, Tuple<Tdx2KlarfParserProcInfo, string>> _currentProcessesConcurrentDict = new ConcurrentDictionary<int, Tuple<Tdx2KlarfParserProcInfo, string>>();

    static void Main(string[] args)
    {
        try
        {
            if (args.Length == 0)
            {
                // Some boilerplate to react to close window event, CTRL-C, kill, etc
                LaunchFolderMode();
            }
        }
        catch (Exception ex)
        {
            // The excerpt does not show the original catch block; this one only reports the error
            Console.Error.WriteLine(ex);
        }
    }
}

Which calls:

private static void LaunchFolderMode()
{
    //Some function launched from Task
    ParseFilesUntilEmpty();
}

And this calls:

private static void ParseFilesUntilEmpty()
{
    while (!_isFileLoadingDone)
    {
        ParseFiles();
    }
    
    ParseFiles();

}

Which calls:

private static void ParseFiles()
{
    filesToProcess = new HashSet<string>() { "file1", "file2", "file3", "file4" }; // I actually get the files from a db; this is just for example
    //_fileStack = new ConcurrentStack<string>(filesToProcess);
    int parallelCount = 2;
    Parallel.ForEach(filesToProcess, new ParallelOptions { MaxDegreeOfParallelism = parallelCount },
        tdxFile =>
        {
            ConfigureAndStartProcess(tdxFile);
        });
}

Which finally calls:

public static void ConfigureAndStartProcess(object fileName)
{
    string fileFullPath = fileName.ToString();
    Process proc = new Process();
    string fileFullPathArg1 = fileFullPath;
    string appName = @".\TDXXMLParser.exe";
    if (fileFullPathArg1.Contains(".gz"))
    {
        StartExe(appName, proc, fileFullPathArg1);  //I set up the arguments and launch the exes. And add the processes to _currentProcessesConcurrentDict
        proc.WaitForExit();
        _currentProcessesConcurrentDict.TryRemove(proc.Id, out Tuple<Tdx2KlarfParserProcInfo, string> procFileTypePair);
        proc.Dispose();
    }

}

The concurrent dictionary to monitor processes uses the following class in the tuple:

public class Tdx2KlarfParserProcInfo
{
    public int ProcId { get; set; }
    public List<long> MemoryAtIntervalList { get; set; } = new List<long>();
}

To keep these code excerpts short, I omitted the 'StartExe()' function. All it does is set up the arguments and start the process. Is there a better parallel processing method I can use that will allow me to kill whatever files I am currently processing without immediately trying to start a new process, which is what Parallel.ForEach does?

I have tried stopping it with the ParallelLoopState.Stop method, but it still tries to process one more file.

CodePudding user response:

You could invoke ParseFiles on the ThreadPool, so that the main thread does nothing other than wait for the CTRL-C event.

Just replace this:

ParseFiles();

...with this:

ThreadPool.QueueUserWorkItem(_ => ParseFiles());

Parallel.ForEach invokes the delegate on the current thread and on the ThreadPool (by default), so since the current thread will be a ThreadPool thread, all the work will happen on the ThreadPool. The ThreadPool threads are background threads, so they will be aborted as soon as all the foreground threads have terminated. And in your case, only the main thread is a foreground thread.
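The background-thread behavior described above can be checked with a small sketch. `BackgroundDemo` and `RanOnBackgroundThread` are hypothetical names used only for illustration; the sketch queues a delegate with `ThreadPool.QueueUserWorkItem` and records `Thread.IsBackground` from inside it.

```csharp
using System;
using System.Threading;

static class BackgroundDemo
{
    // Hypothetical helper: queues 'work' on the ThreadPool, waits for it to finish,
    // and reports whether it ran on a background thread.
    public static bool RanOnBackgroundThread(Action work)
    {
        bool isBackground = false;
        using (var done = new ManualResetEventSlim(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                isBackground = Thread.CurrentThread.IsBackground;
                try { work(); }
                finally { done.Set(); }   // signal even if 'work' throws
            });
            done.Wait();                  // the demo waits; a real Main would wait for CTRL-C instead
        }
        return isBackground;
    }
}
```

Keep in mind that ending background threads typically does not terminate child processes that were already started; those would still need to be killed explicitly, for example from the cleanup handler.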

CodePudding user response:

Unless I'm mistaken, your code seems to do no work on its own; it just launches executables and waits for them to end. And yet you're starving your thread pool with code that's just sitting there waiting for the external processes to end. Now, again if I understand correctly, this part works. It's very wasteful and utterly non-scalable, but it works.

The only thing you seem to be missing is closing the processes early when your own process ends. This is rather trivial: CancellationToken. You simply create a CancellationTokenSource in your main function and pass its token down to every worker object, and when your program is meant to end you cancel it. That only leaves you to respond to it, and that's as easy as replacing your proc.WaitForExit(); with something like:

// this is how we coded in .Net 1.0, released in Feb. 2002. 
while(!proc.HasExited && !ct.IsCancellationRequested)
    Thread.Sleep(1000);
if(ct.IsCancellationRequested)
    proc.Kill();
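A minimal sketch of how that polling loop could be combined with the existing Parallel.ForEach, assuming the token is also handed to `ParallelOptions.CancellationToken` so the loop should stop handing out new files once cancellation is requested. The names `CancellableParsing`, `WaitOrKill`, and the `startProcess` delegate are hypothetical; `startProcess` stands in for the `StartExe()` function that the question does not show.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class CancellableParsing
{
    // Hypothetical helper: waits for the process and kills it if cancellation is requested.
    // Returns true if the process exited on its own, false if it was killed.
    public static bool WaitOrKill(Process proc, CancellationToken ct, int pollMs = 200)
    {
        while (!proc.WaitForExit(pollMs))
        {
            if (!ct.IsCancellationRequested) continue;
            try { proc.Kill(); }
            catch (InvalidOperationException) { /* exited between the check and the kill */ }
            catch (Win32Exception) { /* already terminating */ }
            proc.WaitForExit();
            return false;
        }
        return true;
    }

    // startProcess is an assumption: it replaces the question's StartExe(), which is not shown.
    public static void ParseFiles(IEnumerable<string> files, Func<string, Process> startProcess,
                                  int parallelCount, CancellationToken ct)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = parallelCount,
            CancellationToken = ct   // lets the loop itself observe cancellation
        };
        try
        {
            Parallel.ForEach(files, options, file =>
            {
                if (ct.IsCancellationRequested) return;   // extra guard against a late start
                using (Process proc = startProcess(file))
                {
                    WaitOrKill(proc, ct);
                }
            });
        }
        catch (OperationCanceledException)
        {
            // Expected once the token is cancelled; the remaining files are skipped.
        }
    }
}
```

One way to drive it would be to create the `CancellationTokenSource` in `Main` and cancel it from the exit handler, for example `Console.CancelKeyPress += (s, e) => { e.Cancel = true; cts.Cancel(); };`. How this fits the borrowed cleanup code is a guess, since that code is not part of the excerpts.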

Now, if you also want to fix your first problem, start writing async code. Process.WaitForExitAsync(CancellationToken), available since .NET 5, returns a task that you can await and that observes the cancellation token, so the work is done for you. Stop using Parallel.ForEach; this isn't the 90s, and you have Task.WhenAll to collect the tasks. And at the end of all this, you'll see that your code will boil down to perhaps 10 good lines of code, instead of the mess you made for yourself.
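A rough sketch of what that async version might look like, assuming .NET 5 or later. `AsyncParser`, `RunAllAsync`, and `RunOneAsync` are hypothetical names, the `SemaphoreSlim` throttle is one possible replacement for `MaxDegreeOfParallelism`, and passing the file path as the only argument is an assumption because the real arguments are set up in the omitted `StartExe()`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class AsyncParser
{
    // Starts one child process per file, with at most maxParallel running at a time.
    public static async Task RunAllAsync(string exePath, IEnumerable<string> files,
                                         int maxParallel, CancellationToken ct)
    {
        using var gate = new SemaphoreSlim(maxParallel);
        var tasks = files.Select(async file =>
        {
            await gate.WaitAsync(ct);            // throws if cancelled before a slot is free
            try { await RunOneAsync(exePath, file, ct); }
            finally { gate.Release(); }
        }).ToList();
        await Task.WhenAll(tasks);               // completes only after every task has finished
    }

    static async Task RunOneAsync(string exePath, string file, CancellationToken ct)
    {
        var psi = new ProcessStartInfo(exePath);
        psi.ArgumentList.Add(file);              // assumption: the file path is the only argument
        using var proc = Process.Start(psi)
            ?? throw new InvalidOperationException("Process did not start.");
        try
        {
            await proc.WaitForExitAsync(ct);
        }
        catch (OperationCanceledException)
        {
            // Cancelling the wait does not stop the child, so kill it explicitly.
            if (!proc.HasExited) proc.Kill(entireProcessTree: true);
            throw;
        }
    }
}
```

Called from an async `Main` as something like `await AsyncParser.RunAllAsync(@".\TDXXMLParser.exe", filesToProcess, 2, cts.Token);`, this would surface an `OperationCanceledException` once the token is cancelled, and no further files would be started.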