I am writing a program that reads a large amount of dynamic data, and I would like to use multi-threading to speed this process up. I understand that it is up to the operating system how created threads are scheduled (thread priority, etc.), which I believe is the reason for my unexpected results. However, I still do not know a solution.
Code:
int multiplier = rowCount / 20;
Debug.WriteLine("Row count is " + rowCount + "...");
Debug.WriteLine("Using 20 threads to complete job...");
Debug.WriteLine("Using " + multiplier + " as multiplier...");
for (int i = 1; i <= 20; i++) {
    new Thread(() => {
        int startRow = ((multiplier * i) - multiplier) + 1;
        int endRow = multiplier * i;
        if (i == 20) {
            endRow = rowCount;
        }
        Debug.WriteLine(" [THREAD " + i + "] start row: " + startRow + ", end row : " + endRow);
        for (int row = startRow; row <= endRow; row++) {
            for (int column = 1; column <= columnCount; column++) {
                //...data is read here
            }
        }
    }).Start();
}
Actual result (it appears that my issue is the child threads not "reading" the 'i' variable "correctly", which makes sense considering how threads work; I just do not know how to fix it):
Row count is 2209...
Using 20 threads to complete job...
Using 110 as multiplier...
[THREAD 4] start row: 331, end row : 440
[THREAD 4] start row: 331, end row : 440
[THREAD 5] start row: 441, end row : 550
[THREAD 5] start row: 441, end row : 550
[THREAD 6] start row: 551, end row : 660
[THREAD 7] start row: 661, end row : 770
[THREAD 9] start row: 881, end row : 990
[THREAD 9] start row: 881, end row : 990
[THREAD 11] start row: 1101, end row : 1210
[THREAD 11] start row: 1101, end row : 1210
[THREAD 12] start row: 1211, end row : 1320
[THREAD 14] start row: 1431, end row : 1540
[THREAD 15] start row: 1541, end row : 1650
[THREAD 15] start row: 1541, end row : 1650
[THREAD 16] start row: 1651, end row : 1760
[THREAD 17] start row: 1761, end row : 1870
[THREAD 19] start row: 1981, end row : 2090
[THREAD 20] start row: 2091, end row : 2209
[THREAD 20] start row: 2091, end row : 2209
[THREAD 21] start row: 2201, end row : 2310
Expected result (this is the output without threads, i.e. with the new Thread(() => { ... }).Start() wrapper commented out so the body runs directly inside the loop):
Row count is 2209...
Using 20 threads to complete job...
Using 110 as multiplier...
[THREAD 1] start row: 1, end row : 110
[THREAD 2] start row: 111, end row : 220
[THREAD 3] start row: 221, end row : 330
[THREAD 4] start row: 331, end row : 440
[THREAD 5] start row: 441, end row : 550
[THREAD 6] start row: 551, end row : 660
[THREAD 7] start row: 661, end row : 770
[THREAD 8] start row: 771, end row : 880
[THREAD 9] start row: 881, end row : 990
[THREAD 10] start row: 991, end row : 1100
[THREAD 11] start row: 1101, end row : 1210
[THREAD 12] start row: 1211, end row : 1320
[THREAD 13] start row: 1321, end row : 1430
[THREAD 14] start row: 1431, end row : 1540
[THREAD 15] start row: 1541, end row : 1650
[THREAD 16] start row: 1651, end row : 1760
[THREAD 17] start row: 1761, end row : 1870
[THREAD 18] start row: 1871, end row : 1980
[THREAD 19] start row: 1981, end row : 2090
[THREAD 20] start row: 2091, end row : 2209
Answer:
Try computing the values that depend on i outside of the thread, so that the thread never reads the shared variable i. The threads run independently of the surrounding loop, which keeps incrementing i while they are waiting to be scheduled.
int multiplier = rowCount / 20;
Debug.WriteLine("Row count is " + rowCount + "...");
Debug.WriteLine("Using 20 threads to complete job...");
Debug.WriteLine("Using " + multiplier + " as multiplier...");
for (int i = 1; i <= 20; i++) {
    // Computed on the loop's own thread while i still holds this iteration's value.
    // startRow, endRow and nThread are fresh variables for every iteration.
    int startRow = ((multiplier * i) - multiplier) + 1;
    int endRow = multiplier * i;
    if (i == 20) {
        endRow = rowCount;
    }
    int nThread = i; // per-iteration copy of i, safe to capture
    new Thread(() => {
        // Only the per-iteration copies are captured here, never i itself.
        Debug.WriteLine(" [THREAD " + nThread + "] start row: " + startRow + ", end row : " + endRow);
        for (int row = startRow; row <= endRow; row++) {
            for (int column = 1; column <= columnCount; column++) {
                //...data is read here
            }
        }
    }).Start();
}
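This is visible in the first log you shared: the first two lines both report THREAD 4, because by the time those threads actually ran, i had already been incremented to 4. The [THREAD 21] line comes from a thread that ran after the loop had finished, when i had reached its final value of 21.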
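A self-contained sketch of the same fix, assuming the 2209-row count from the log and replacing the elided data reading with simply recording the range each thread received. It also calls Thread.Join, which the original snippet does not do, so the main thread waits for every worker before using the results:

```csharp
using System;
using System.Threading;

// Assumed input: 2209 rows, matching the log in the question.
int rowCount = 2209;
const int threadCount = 20;
int multiplier = rowCount / threadCount;

// ranges[k] holds the (start, end) rows that thread k + 1 actually saw.
var ranges = new (int Start, int End)[threadCount];
var threads = new Thread[threadCount];

for (int i = 1; i <= threadCount; i++)
{
    // Computed on the loop's thread; each iteration gets its own copies.
    int startRow = (multiplier * (i - 1)) + 1;
    int endRow = (i == threadCount) ? rowCount : multiplier * i;
    int nThread = i;

    threads[i - 1] = new Thread(() =>
    {
        // Stand-in for the real work: record the range this thread was given.
        ranges[nThread - 1] = (startRow, endRow);
    });
    threads[i - 1].Start();
}

// Wait for all workers before reading their results.
foreach (var t in threads)
{
    t.Join();
}

for (int k = 0; k < threadCount; k++)
{
    Console.WriteLine($" [THREAD {k + 1}] start row: {ranges[k].Start}, end row : {ranges[k].End}");
}
```

Each thread writes to its own array slot, so no locking is needed, and Join guarantees the writes are visible to the main thread afterwards.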
Answer:
When your lambda references a variable defined outside its own scope, the compiler creates a so-called closure: the lambda effectively holds a reference to the variable itself, not a copy of the value the variable had when the lambda was created.
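A minimal sketch of this, with no loop or thread involved (the names x, readX and seen are only illustrative):

```csharp
using System;

int x = 1;
// The lambda captures the variable x itself, not its current value 1.
Func<int> readX = () => x;

x = 2;              // changing x after the lambda was created...
int seen = readX(); // ...is visible through the closure

Console.WriteLine(seen); // prints 2
```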
In your example, every thread refers to the same variable int i, not to the value it had at the moment that thread was created. This behaviour can be demonstrated even without threads, using a slightly modified example from Eric Lippert's blog:
using System;
using System.Collections.Generic;

var funcs = new List<Func<int>>();
for (int i = 0; i < 10; i++)
{
    funcs.Add(() => i); // every lambda captures the same variable i
}
foreach (var f in funcs)
{
    Console.WriteLine(f());
}
Perhaps surprisingly, this code does not print the numbers 0 to 9; it prints 10 ten times, because the loop has already finished (leaving i at 10) by the time any of the lambdas runs.
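Applied to that example, the remedy from the first answer (copying the loop variable into a local declared inside the loop body) gives each closure its own variable. The local name copy is just for illustration:

```csharp
using System;
using System.Collections.Generic;

var funcs = new List<Func<int>>();
for (int i = 0; i < 10; i++)
{
    // A new 'copy' variable exists for each iteration, so each lambda captures its own.
    int copy = i;
    funcs.Add(() => copy);
}

var results = new List<int>();
foreach (var f in funcs)
{
    results.Add(f());
}

Console.WriteLine(string.Join(" ", results)); // prints 0 1 2 3 4 5 6 7 8 9
```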
Before C# 5, the foreach loop behaved exactly the same way (as the original example in that blog post demonstrates), but this surprised so many people that C# 5 changed it: the foreach variable is now logically inside the loop, so each closure captures a fresh copy of the variable on every iteration. That is not the case for the for loop, nor for any other variable declared outside the lambda's scope, i.e. the variables the lambda closes over.
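A small sketch contrasting the two loops, assuming a C# 5 or later compiler. Both capture the loop variable directly; only the foreach version gets a fresh variable per iteration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var fromFor = new List<Func<int>>();
for (int i = 0; i < 3; i++)
{
    fromFor.Add(() => i); // all three lambdas share the single variable i
}

var fromForeach = new List<Func<int>>();
foreach (int item in Enumerable.Range(0, 3))
{
    fromForeach.Add(() => item); // since C# 5, each iteration has its own 'item'
}

string forResult = string.Join(" ", fromFor.Select(f => f()));
string foreachResult = string.Join(" ", fromForeach.Select(f => f()));

Console.WriteLine($"for: {forResult}");         // prints "for: 3 3 3"
Console.WriteLine($"foreach: {foreachResult}"); // prints "foreach: 0 1 2"
```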