Beware of loops and tasks
Loops and anonymous methods don't go hand in hand. Any variable that changes for each iteration will cause problems if it is used within an anonymous method. I explained the mechanism behind variable capturing in the post Mysterious Case Of Wrong Value, along with the problem it causes with loops and its solution.
Commonly, anonymous methods are used as callbacks or event handlers, where there are no loops in the context where variable capture happens. While there is a possibility to have an unexpected—wrong—value in such scenarios, it is rather easy to debug them and understand the order in which some code is called, and why the captured value is wrong in a given place.
With the rise of multi-threading usage, and especially the Parallel Programming Library and tasks, writing anonymous methods inside a loop becomes more common, and it is much easier to write code where a captured variable will have the wrong value. The biggest problem with such code is that it will be harder to debug and understand the cause if you are not already familiar with it.
The mechanism behind variable capture with and without loops is the same, but the loop itself is what makes the difference, as explained in the linked post.
The typical code pattern that causes the issue is running multiple tasks to perform some work:
procedure ProcessItems(Items: array of TItem);
var
i: Integer;
begin
for i := 0 to High(Items) do
begin
TTask.Run(
procedure
begin
Items[i].Process;
end);
end;
end;
The problem with the above code is that it uses and therefore captures the loop
index within the anonymous method. When a particular task runs, that index value
will be the current value at the moment when the anonymous method runs, not the
value held at the time the task itself is created by the call to TTask.Run
. So
in reality when you need to process 10 items, you will maybe process items at
indexes 5 and 9, not items on all indexes from 0 to 9.
Slightly modified code that uses a temporary Item
variable will not resolve the
problem:
procedure ProcessItems(Items: array of TItem);
var
i: Integer;
Item: TItem;
begin
for i := 0 to High(Items) do
begin
Item := Items[i];
TTask.Run(
procedure
begin
Item.Process;
end);
end;
end;
Now the code does not capture the loop index variable that obviously changes with
each iteration. But, the captured Item
variable also changes with each
iteration. Because there is only a single Item
variable (location) captured, the
actual content of Item
when the task runs will again be the current value stored
in Item
variable at the time when task code is executed, not the one when the
task itself was created. So instead of processing 10 different items, that code
will process only a few of them.
The same would happen if you used any other kind of loop. For instance:
procedure ProcessItems(Items: array of TItem);
var
i: Integer;
Item: TItem;
begin
for Item in Items do
begin
TTask.Run(
procedure
begin
Item.Process;
end);
end;
end;
Because the problem is the same as the one I wrote about in the previous post, the solution will also be the same. Adding a separate method that will contain the anonymous method declaration, and passing all captured variables that change during the iteration as parameters to that method (you will also need to pass other captured variables, if they are otherwise inaccessible from the extracted code):
procedure ProcessItem(Item: TItem);
begin
TTask.Run(
procedure
begin
Item.Process;
end);
end;
procedure ProcessItems(Items: array of TItem);
var
i: Integer;
begin
for i := 0 to High(Items) do
begin
ProcessItem(Items[i]);
end;
end;
Fortunately, this kind of problem can be rather easily spotted in code. The only rule you need to remember is: Don't write anonymous methods directly within loops.
If you have a loop containing an anonymous method, you must extract that anonymous method and put it in separate procedure or method.
Thanks for this insight, Dalija. Always fun to read your postings.
ReplyDeleteThanks!
DeleteDalija you explain stuff real good. Thank you for sharing.
ReplyDeleteThank you for reading.
DeleteWouldn't inline declared variables (in the for loop) fix the issue?
ReplyDeleteNo, inline variables would not fix the problem. Inline variable inside the loop is still just single variable - memory location, not multiple ones.
DeleteWell, I just stumbled across this issue, and your post just saved me a considerable amount of head-scratching, so thanks for that.
ReplyDelete