The purpose of weak references - Part I
The terminology "weak and strong references" is commonly used in the context of automatic reference counting. In the context of manual memory management, we talk about ownership, and owning and non-owning references.
While the terminology is different, those two different kinds of references represent the same concept, regardless of the underlying memory management model. Strong references are the equivalent of owning references, and weak references are the equivalent of non-owning references.
Understanding that relationship helps explain the purpose of weak references in automatic reference counting. Instead of thinking about them merely as a tool for breaking reference cycles, we should think about them as non-owning references: additional references to an object instance that don't participate in its memory management, and that can become invalid once the object is released through other code and its owning references.
Because non-owning (weak) references can become invalid, they are inherently dangerous to use. If you access such a reference after the object is released, you will be accessing invalid data, which can corrupt memory or crash the application.
If they are dangerous, then why are we using them at all? What is their purpose? It would be much better if we could avoid writing such hazardous code.
Just like any other tool in the toolbox, non-owning or weak references can be dangerous if misused or not used properly. Just like a nil value can be dangerous, they can be dangerous, too. But just like it is impossible to write a non-trivial application that never uses nil as a valid reference value, it is similarly impossible to write such an application without using non-owning or weak references.
Yes, we should always avoid writing more dangerous code than is necessary, but we should also use it when it fits the purpose. Avoiding dangerous code at all costs can lead to unnecessarily complex or convoluted code that is often slower than simpler, yet seemingly more dangerous, code. If the code is clean, simple, and easy to follow and understand, then it will actually be safer than more complicated code that avoids dangerous constructs for no good reason.
Manual memory management
Under manual memory management, the developer has the responsibility of manually releasing all constructed object instances, or delegating that process to another entity (commonly another object instance).
To do that, we need to keep a reference to the constructed object instance. The reference to the object instance that we use to free the object is called the owning reference. Any other reference we might have to that object will be a non-owning reference, and non-owning references must not be used to free the object.
In some scenarios, we can have more than one owning reference to a single object instance, but in such cases, we also need some mechanism that will notify us when one of those references is used to free the object, to avoid releasing the object multiple times.
We can also transfer ownership from the original reference to some other entity that will be responsible for releasing the object from that point on. In such cases, the original reference will no longer be the owning reference after the ownership transfer is complete.
The following examples show the basic principles of owning and non-owning references and ownership transfer.
Owning reference:
    procedure Foo;
    var
      Obj: TObject; // owning reference
    begin
      Obj := TObject.Create;
      try
        ...
      finally
        Obj.Free;
      end;
    end;
Owning and non-owning references:
    procedure Foo;
    var
      Obj: TObject; // owning reference
      ObjRef: TObject; // non-owning reference, must not be used to free object
    begin
      Obj := TObject.Create;
      try
        ObjRef := Obj;
        ...
      finally
        Obj.Free;
      end;
    end;
Ownership transfer:
    procedure Foo(List: TObjectList);
    var
      Obj: TObject; // temporary owning reference
    begin
      Obj := TObject.Create;
      try
        List.Add(Obj); // ownership is transferred to List
      except
        Obj.Free; // in case of exception, ownership transfer is not completed
        raise;    // and we need to release the object in that case
      end;
    end;
Shared ownership:
As previously mentioned, manual memory management requires having only one owning reference, unless there is an additional mechanism involved in preventing a double free under particular circumstances. TComponent and its descendants have such a mechanism built in.
One common coding pattern is the following:
    procedure TMainForm.OnButtonClick(Sender: TObject);
    var
      Frm: TMyForm;
    begin
      Frm := TMyForm.Create(Application);
      try
        ...
      finally
        // even if we don't call Free here, the constructed form will still eventually
        // be released through its owner - the Application object instance
        Frm.Free;
      end;
    end;
In that example, a TMyForm instance is constructed with an owner, the application object. Application takes ownership of the constructed object, and it will automatically release all owned object instances during its destruction process. This is how all TComponent-based classes behave. If they are constructed with an owner, that owner component will be responsible for releasing them. If they own other components, those will be released when that particular object instance is destroyed.
But the above example releases the TMyForm instance through its non-owning reference, Frm. Generally, such code would result in a double free and crash, but in the case of TComponent descendants, that does not happen because each component with an owner will also remove itself from the owner's component list during its destruction process, preventing a double free.
In this case, we basically have two owning references to an object instance, but its destruction process is coordinated through an inner class mechanism that allows such shared ownership.
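The self-removal described above can be observed directly through the owner's ComponentCount property. The following is a minimal sketch, assuming a Delphi console application with unit scope names enabled (under Free Pascal, the unit would be plain Classes); the variable names are only illustrative.

```pascal
program SharedOwnershipDemo;

{$APPTYPE CONSOLE}

uses
  System.Classes;

var
  Owner, Child: TComponent;
begin
  Owner := TComponent.Create(nil); // root component, managed manually
  try
    Child := TComponent.Create(Owner); // Owner now holds an owning reference
    Writeln(Owner.ComponentCount); // 1

    // Freeing through the local reference makes Child remove itself
    // from Owner's component list during its destruction.
    Child.Free;
    Writeln(Owner.ComponentCount); // 0
  finally
    Owner.Free; // the list is empty, so nothing is released twice
  end;
end.
```

Because the child unregisters itself, the owner no longer knows about it by the time the owner is destroyed.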
However, that works only if one of the owning references is stored in another component's list of children. If we have two plain references to a component, there is no connection between the two to prevent a double free:
    procedure TMainForm.OnButtonClick(Sender: TObject);
    var
      Frm1, Frm2: TMyForm;
    begin
      Frm1 := TMyForm.Create(Application);
      try
        Frm2 := Frm1;
        ...
      finally
        Frm1.Free;
        Frm2.Free; // this will cause a double free
      end;
    end;
The above example is a bit contrived, but by taking multiple references to object instances in more complicated code, we can easily cause a double free, or another common problem—dangling references.
Non-owning references
In the above examples, we have seen what a non-owning reference technically looks like, by taking an additional reference to some object instance, but we have not seen a real-life example where doing so actually makes sense and it is not merely bad code.
One common pattern is having a parent-child relationship between object instances:
    type
      TParent = class;

      TChild = class
      protected
        Parent: TParent; // non-owning reference
      public
        constructor Create(AParent: TParent);
        destructor Destroy; override;
      end;

      TParent = class
      public
        Child: TChild; // owning reference
        constructor Create;
        destructor Destroy; override;
      end;

    constructor TChild.Create(AParent: TParent);
    begin
      Parent := AParent;
    end;

    destructor TChild.Destroy;
    begin
      inherited;
    end;

    constructor TParent.Create;
    begin
      Child := TChild.Create(Self);
    end;

    destructor TParent.Destroy;
    begin
      Child.Free;
      inherited;
    end;
The parent owns the child object instance, but the child does not own its parent, even though it stores a reference to the parent. In such a parent-child relationship, we don't need to worry about dangling references, because the parent reference will never become a dangling pointer, as the parent always lives longer than its child object.
This code does not support changing the parent of a child instance, but that is also possible; it just needs a bit more housekeeping code. We would need to nil the child reference in its parent to prevent a double free if we decide to break their relationship.
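As a rough illustration of that housekeeping, the sketch below adds a hypothetical SetParent method to the child. It is not part of the original example; it also assumes that a parent replaces (and frees) any child it already owns when a new one is attached.

```pascal
program ReparentDemo;

{$APPTYPE CONSOLE}

type
  TChild = class;

  TParent = class
  public
    Child: TChild; // owning reference
    destructor Destroy; override;
  end;

  TChild = class
  private
    FParent: TParent; // non-owning reference
  public
    procedure SetParent(AParent: TParent); // hypothetical helper
    property Parent: TParent read FParent;
  end;

destructor TParent.Destroy;
begin
  Child.Free; // safe when Child is nil
  inherited;
end;

procedure TChild.SetParent(AParent: TParent);
begin
  if FParent = AParent then
    Exit;
  // Detach from the old parent so it will not free us in its destructor.
  if FParent <> nil then
    FParent.Child := nil;
  FParent := AParent;
  // Attach to the new parent, which becomes the owner.
  if FParent <> nil then
  begin
    FParent.Child.Free; // assumption: the new parent replaces its current child
    FParent.Child := Self;
  end;
end;

var
  A, B: TParent;
  C: TChild;
begin
  A := TParent.Create;
  B := TParent.Create;
  try
    C := TChild.Create;
    C.SetParent(A); // A owns C
    C.SetParent(B); // ownership moves to B, A.Child is nil again
    Writeln(A.Child = nil); // TRUE
    Writeln(C.Parent = B);  // TRUE
  finally
    A.Free; // does not touch C
    B.Free; // releases C
  end;
end.
```

Without the line that sets the old parent's Child to nil, both A and B would try to free the same instance.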
For simplicity, in the above example, the parent has only a single child, and they belong to different classes. More often, the parent and child class are the same, and the parent will have the ability to hold two or more child instances. This pattern is the basis of various tree-like structures.
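A tree node of that kind might look like the following sketch. The TNode class and its Depth method are hypothetical names introduced only for illustration, and the code assumes a Delphi console application that uses TObjectList from System.Contnrs to hold the owning references.

```pascal
program TreeDemo;

{$APPTYPE CONSOLE}

uses
  System.Contnrs;

type
  TNode = class
  private
    FParent: TNode;         // non-owning reference to the parent
    FChildren: TObjectList; // owning references to the children
  public
    constructor Create(AParent: TNode);
    destructor Destroy; override;
    function Depth: Integer;
    property Parent: TNode read FParent;
  end;

constructor TNode.Create(AParent: TNode);
begin
  inherited Create;
  FChildren := TObjectList.Create(True); // True: the list frees its items
  FParent := AParent;
  if FParent <> nil then
    FParent.FChildren.Add(Self);
end;

destructor TNode.Destroy;
begin
  FChildren.Free; // releases the whole subtree
  inherited;
end;

function TNode.Depth: Integer;
var
  Node: TNode;
begin
  // Walk up through the non-owning parent references to the root.
  Result := 0;
  Node := FParent;
  while Node <> nil do
  begin
    Inc(Result);
    Node := Node.FParent;
  end;
end;

var
  Root, Branch, Leaf: TNode;
begin
  Root := TNode.Create(nil); // root node, managed manually
  try
    Branch := TNode.Create(Root);
    Leaf := TNode.Create(Branch);
    Writeln(Leaf.Depth); // 2
  finally
    Root.Free; // frees Branch and Leaf as well
  end;
end.
```

In this simplified form only the root may be freed directly. Freeing an inner node on its own would also require removing it from its parent's list, much like TComponent does.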
If you take a look at the TComponent code, you will find that pattern there: each component will have an owner (which can also be nil, in which case the component is the root component, and needs to be manually managed), and it can hold multiple child components. A component that does not have any children is a leaf node.
You can argue that having a reference to a parent (owner) is not strictly necessary, as you can always find a particular child by traversing the structure through the root reference. However, such an approach is not always viable, as it can be too costly performance-wise, and sometimes hard to implement algorithmically if you don't have direct access to the root instance in the code scope where you are handling a tree node.
And if you store the root reference in each child, then you still have the same problem with holding a non-owning or weak reference, and you have just made the process of accessing the immediate parent more convoluted.
Another form of non-owning reference appears when you have two independent object instances that have some functional dependence outside an ownership relation.
An example of such a non-owning relationship is an edit control with an associated popup menu. The edit control holds a non-owning reference to the menu, which allows it to invoke the menu when the user right-clicks on the edit.
Now, the edit could be an owner of the menu instance, and we could avoid having a non-owning reference to a menu, but such a framework design would not be very efficient.
What if you have multiple edit controls that could share the same popup menu?
If an edit control takes a non-owning reference to a menu, then you can easily share that popup menu across different controls. On the other hand, if the edit is an owner of the popup menu, then for each edit you would need to construct a separate menu instance. Not very efficient, is it?
The problem with this kind of non-owning reference, where each object is independent of the other, is that if we decide to free the popup menu for some reason, we need to be able to notify the interested edit or other controls that the menu is no longer available. In such a scenario, the control will just set the menu reference to nil, and that will be an indicator that there is no popup menu functionality associated with the control.
TComponent solves this problem by implementing a notification mechanism. When you associate a menu with an edit control, the menu reference will be stored in the edit, but the edit will also be added to the menu component's notification list. That way, the menu can notify all registered components during its destruction process. However, now you have a situation where the menu holds a reference to the edit control, too. To prevent accessing a dangling pointer, the edit needs to add the menu to its own notification list. If the edit gets released before the menu, the menu will be able to remove the edit reference from its notification list, and avoid problems with accessing a destroyed instance.
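The mechanism described above is exposed through TComponent.FreeNotification, RemoveFreeNotification and the virtual Notification method. The sketch below shows how a component could use them; TDemoMenu and TDemoEdit are hypothetical stand-ins for a real popup menu and edit control, and a Delphi console application is assumed.

```pascal
program FreeNotificationDemo;

{$APPTYPE CONSOLE}

uses
  System.Classes;

type
  // Hypothetical stand-ins for a popup menu and an edit control
  TDemoMenu = class(TComponent);

  TDemoEdit = class(TComponent)
  private
    FMenu: TDemoMenu; // non-owning reference
    procedure SetMenu(Value: TDemoMenu);
  protected
    procedure Notification(AComponent: TComponent;
      Operation: TOperation); override;
  public
    property Menu: TDemoMenu read FMenu write SetMenu;
  end;

procedure TDemoEdit.SetMenu(Value: TDemoMenu);
begin
  if FMenu = Value then
    Exit;
  if FMenu <> nil then
    FMenu.RemoveFreeNotification(Self);
  FMenu := Value;
  if FMenu <> nil then
    FMenu.FreeNotification(Self); // ask the menu to notify us when it is destroyed
end;

procedure TDemoEdit.Notification(AComponent: TComponent;
  Operation: TOperation);
begin
  inherited;
  // The menu is going away: drop the reference before it dangles.
  if (Operation = opRemove) and (AComponent = FMenu) then
    FMenu := nil;
end;

var
  Menu: TDemoMenu;
  Edit1, Edit2: TDemoEdit;
begin
  Menu := TDemoMenu.Create(nil);
  Edit1 := TDemoEdit.Create(nil);
  Edit2 := TDemoEdit.Create(nil);
  try
    Edit1.Menu := Menu; // both edits share one menu instance
    Edit2.Menu := Menu;
    Menu.Free;
    Writeln(Edit1.Menu = nil); // TRUE
    Writeln(Edit2.Menu = nil); // TRUE
  finally
    Edit1.Free;
    Edit2.Free;
  end;
end.
```

Code that uses the Menu property afterwards only needs to check it for nil before invoking the menu.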
If you use TComponent descendants, you can use their notification system to solve issues with dangling pointers when you need to create an association between two components. If you need to store non-owning references between object instances that don't belong to the TComponent class hierarchy, you would need to implement some similar mechanism to avoid a dangling pointer problem.
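One possible shape of such a mechanism for plain classes is sketched below. It is only an illustration under stated assumptions: TPlainMenu and TPlainControl are hypothetical classes, the menu keeps a simple TList of interested controls, and both classes live in the same unit so they can reach each other's private fields.

```pascal
program ObserverDemo;

{$APPTYPE CONSOLE}

uses
  System.Classes;

type
  TPlainMenu = class;

  TPlainControl = class
  private
    FMenu: TPlainMenu; // non-owning reference
    procedure SetMenu(Value: TPlainMenu);
  public
    destructor Destroy; override;
    property Menu: TPlainMenu read FMenu write SetMenu;
  end;

  TPlainMenu = class
  private
    FObservers: TList; // non-owning references to interested controls
  public
    constructor Create;
    destructor Destroy; override;
  end;

constructor TPlainMenu.Create;
begin
  inherited Create;
  FObservers := TList.Create;
end;

destructor TPlainMenu.Destroy;
var
  I: Integer;
begin
  // Tell every registered control to drop its reference to this menu.
  for I := FObservers.Count - 1 downto 0 do
    TPlainControl(FObservers[I]).FMenu := nil;
  FObservers.Free;
  inherited;
end;

procedure TPlainControl.SetMenu(Value: TPlainMenu);
begin
  if FMenu = Value then
    Exit;
  if FMenu <> nil then
    FMenu.FObservers.Remove(Self); // unregister from the old menu
  FMenu := Value;
  if FMenu <> nil then
    FMenu.FObservers.Add(Self);    // register with the new menu
end;

destructor TPlainControl.Destroy;
begin
  SetMenu(nil); // unregister so the menu never touches a destroyed control
  inherited;
end;

var
  Menu: TPlainMenu;
  Control: TPlainControl;
begin
  Menu := TPlainMenu.Create;
  Control := TPlainControl.Create;
  try
    Control.Menu := Menu;
    Menu.Free;
    Writeln(Control.Menu = nil); // TRUE
  finally
    Control.Free;
  end;
end.
```

The registration works in both directions: the menu clears the control's reference when the menu dies, and the control removes itself from the menu's list when the control dies first.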
Under manual memory management, every reference to an object is the same, as there are no additional designators that will tell us whether a reference is an owning or non-owning one, and how we should deal with it. We can only determine that by inspecting the code logic. If the code logic is flawed, we will have issues with managing memory.
Because we don't need to explicitly mark object references as owning (strong) or non-owning (weak), it may seem like we don't need to think about their categorization under manual memory management. In reality, while we are writing or reading code, we should always be able to tell which references are strong and which are weak. If we cannot tell that, then we have a serious problem with that particular code.
Additionally, if we take a non-owning reference to an object, where that reference can turn into a dangling pointer, we need to use some mechanism to avoid creating dangling pointers. In other words, we need to be able to nil such a reference when the object is no longer valid, and that nil value will be an indicator that the particular functionality is not available.