The purpose of weak references - Part I

The terminology "weak and strong references" is commonly used in the context of automatic reference counting. In the context of manual memory management, we talk about ownership, and owning and non-owning references.

While the terminology is different, those two different kinds of references represent the same concept, regardless of the underlying memory management model. Strong references are the equivalent of owning references, and weak references are the equivalent of non-owning references.

Understanding this equivalence helps us understand the purpose of weak references in automatic reference counting. Instead of thinking about them merely as a tool for breaking reference cycles, we should think about them as non-owning references: additional references to an object instance that don't participate in its memory management, and that can be invalidated when the object is released by other code through its owning references.

Because non-owning (weak) references can become invalid, they are inherently dangerous to use. If you access such a reference after the object is released, you will be accessing invalid memory, which can corrupt data or crash the application.

If they are dangerous, then why are we using them at all? What is their purpose? It would be much better if we could avoid writing such hazardous code.

Just like any other tool in the toolbox, non-owning or weak references can be dangerous if misused. Just like a nil value can be dangerous, they can be dangerous, too. But just like it is impossible to write a non-trivial application that never uses nil as a valid reference value, it is similarly impossible to write such an application without using non-owning or weak references.

Yes, we should always avoid writing more dangerous code than is necessary, but we should also use it when it fits the purpose. Avoiding dangerous code at all costs can lead to unnecessarily complex or convoluted code that is often slower than simpler, yet seemingly more dangerous, code. If the code is clean, simple, and easy to follow and understand, then it will actually be safer than more complicated code that avoids dangerous constructs for no good reason.

Manual memory management

Under manual memory management, the developer has the responsibility of manually releasing all constructed object instances, or delegating that process to another entity (commonly another object instance).

To do that, we need to keep a reference to the constructed object instance. The reference to the object instance that we use to free the object is called the owning reference. Any other reference we might have to that object will be a non-owning reference, and non-owning references must not be used to free the object.

In some scenarios, we can have more than one owning reference to a single object instance, but in such cases, we also need some mechanism that will notify us when one of those references is used to free the object, to avoid releasing the object multiple times.

We can also transfer ownership from the original reference to some other entity that will be responsible for releasing the object from that point on. In such cases, the original reference will no longer be the owning reference after the ownership transfer is complete.

The following examples show the basic principles of owning and non-owning references and ownership transfer.

Owning reference:

    procedure Foo;
    var
      Obj: TObject; // owning reference
    begin
      Obj := TObject.Create;
      try
        ...
      finally
        Obj.Free;
      end;
    end;

Owning and non-owning references:

    procedure Foo;
    var
      Obj: TObject;    // owning reference
      ObjRef: TObject; // non-owning reference, must not be used to free object
    begin
      Obj := TObject.Create;
      try
        ObjRef := Obj;
        ...
      finally
        Obj.Free;
      end;
    end;

Ownership transfer:

    procedure Foo(List: TObjectList);
    var
      Obj: TObject; // temporary owning reference
    begin
      Obj := TObject.Create;
      try
        List.Add(Obj); // ownership is transferred to List
      except
        Obj.Free; // in case of exception, ownership transfer is not completed
        raise;    // and we need to release the object in that case
      end;
    end;

Shared ownership:

As previously mentioned, manual memory management requires having only one owning reference, unless there is an additional mechanism involved in preventing a double free under particular circumstances. TComponent and its descendants have such a mechanism built in.

One common coding pattern is the following:

    procedure TMainForm.OnButtonClick(Sender: TObject);
    var
      Frm: TMyForm;
    begin
      Frm := TMyForm.Create(Application);
      try
        ...
      finally
        // even if we don't call Free here, the constructed form will still eventually
        // be released through its owner - the Application object instance
        Frm.Free;
      end;
    end;

In that example, a TMyForm instance is constructed with an owner—the application object. Application takes ownership of the constructed object, and it will automatically release all owned object instances during its destruction process. This is how all TComponent-based classes behave. If they are constructed with an owner, that owner component will be responsible for releasing them. If they own other components, those will be released when that particular object instance is destroyed.

But the above example also releases the TMyForm instance through the local Frm reference. Generally, such code would result in a double free and a crash, but in the case of TComponent descendants, that does not happen, because each component with an owner also removes itself from the owner's component list during its destruction process, preventing a double free.

In this case, we basically have two owning references to an object instance, but its destruction process is coordinated through an inner class mechanism that allows such shared ownership.
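The coordination described above can be observed in a small console program. This is only a sketch: it uses plain TComponent instances instead of forms, the variable names are made up for the illustration, and the compiler directives assume that the code is built either with Delphi or with Free Pascal in Delphi mode.

```pascal
program OwnerListDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

var
  Owner, Child: TComponent;
begin
  Owner := TComponent.Create(nil);       // root component, managed manually
  try
    Child := TComponent.Create(Owner);   // Owner now holds an owning reference
    Writeln(Owner.ComponentCount);       // prints 1

    Child.Free;                          // released through the local reference
    Writeln(Owner.ComponentCount);       // prints 0, Child removed itself

    TComponent.Create(Owner);            // never released explicitly
  finally
    Owner.Free;                          // releases the components it still owns
  end;
end.
```

Because the child removes itself from the owner's list in its destructor, the owner has nothing left to release for it, and no double free occurs.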

However, that works only if one of the owning references is stored in another component's list of children. If we have two plain references to a component, there is no connection between the two to prevent a double free:

    procedure TMainForm.OnButtonClick(Sender: TObject);
    var
      Frm1, Frm2: TMyForm;
    begin
      Frm1 := TMyForm.Create(Application);
      try
        Frm2 := Frm1;
        ...
      finally
        Frm1.Free;
        Frm2.Free; // this will cause a double free
      end;
    end;

The above example is a bit contrived, but by taking multiple references to object instances in more complicated code, we can easily cause a double free, or another common problem—dangling references.

Non-owning references

In the above examples, we have seen what a non-owning reference technically looks like, by taking an additional reference to some object instance, but we have not seen a real-life example where doing so actually makes sense and it is not merely bad code.

One common pattern is having a parent-child relationship between object instances:

    type
      TParent = class;

      TChild = class
      protected
        Parent: TParent; // non-owning reference
      public
        constructor Create(AParent: TParent);
        destructor Destroy; override;
      end;

      TParent = class
      public
        Child: TChild; // owning reference
        constructor Create;
        destructor Destroy; override;
      end;

    constructor TChild.Create(AParent: TParent);
    begin
      Parent := AParent;
    end;

    destructor TChild.Destroy;
    begin
      inherited;
    end;

    constructor TParent.Create;
    begin
      Child := TChild.Create(Self);
    end;

    destructor TParent.Destroy;
    begin
      Child.Free;
      inherited;
    end;

The parent owns the child object instance, but the child does not own its parent, even though it stores a reference to it. In such a parent-child relationship, we don't need to worry about dangling references: the parent reference will never become a dangling pointer, because the parent always outlives its child object.

This code does not support changing the parent of a child instance. That is also possible, but it needs a bit more housekeeping code: if we decide to break their relationship, we need to nil the child reference in the parent to prevent a double free.
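One possible shape of that housekeeping is sketched below. The SetParent method, the private fields, and the read-only properties are hypothetical additions for this illustration and are not part of the example above; the sketch also assumes that a parent releases the child it already owns when a new child is attached to it.

```pascal
program ReparentDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TParent = class;

  TChild = class
  private
    FParent: TParent;                      // non-owning reference
  public
    destructor Destroy; override;
    procedure SetParent(AParent: TParent); // hypothetical helper
    property Parent: TParent read FParent;
  end;

  TParent = class
  private
    FChild: TChild;                        // owning reference
  public
    destructor Destroy; override;
    property Child: TChild read FChild;
  end;

destructor TChild.Destroy;
begin
  // A child that is going away must not stay referenced by its parent.
  if FParent <> nil then
    FParent.FChild := nil;
  inherited;
end;

procedure TChild.SetParent(AParent: TParent);
begin
  if AParent = FParent then
    Exit;
  // Detach from the old parent, so it will not free this child later.
  if FParent <> nil then
    FParent.FChild := nil;
  FParent := AParent;
  if FParent <> nil then
  begin
    // Assumption: the new parent releases the child it owned so far.
    FParent.FChild.Free;
    FParent.FChild := Self;
  end;
end;

destructor TParent.Destroy;
begin
  FChild.Free; // Free is nil-safe; the child clears FChild in its destructor
  inherited;
end;

var
  A, B: TParent;
  C: TChild;
begin
  A := TParent.Create;
  B := TParent.Create;
  try
    C := TChild.Create;
    C.SetParent(A);          // A owns C
    C.SetParent(B);          // ownership moves to B
    Assert(A.Child = nil);   // A no longer references C
    Assert(B.Child = C);
    Assert(C.Parent = B);
  finally
    A.Free;                  // has nothing to release
    B.Free;                  // releases C
  end;
end.
```

The important step is clearing the old parent's reference before the new parent takes over; without it, both parents would try to release the same child.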

For simplicity, in the above example, the parent has only a single child, and they belong to different classes. More often, the parent and child class are the same, and the parent will have the ability to hold two or more child instances. This pattern is the basis of various tree-like structures.
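A minimal sketch of such a tree node could look like the following. TNode and its members are hypothetical names, and the sketch assumes that nodes are released only through their parent (ultimately through the root); releasing a child directly would require additional housekeeping that is left out here.

```pascal
program TreeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  Contnrs;

type
  TNode = class
  private
    FParent: TNode;          // non-owning reference
    FChildren: TObjectList;  // owning references (OwnsObjects = True)
    function GetChildCount: Integer;
  public
    constructor Create(AParent: TNode);
    destructor Destroy; override;
    function Depth: Integer;
    property Parent: TNode read FParent;
    property ChildCount: Integer read GetChildCount;
  end;

constructor TNode.Create(AParent: TNode);
begin
  inherited Create;
  FChildren := TObjectList.Create(True);
  FParent := AParent;
  if FParent <> nil then
    FParent.FChildren.Add(Self);  // the parent takes ownership of this node
end;

destructor TNode.Destroy;
begin
  FChildren.Free;  // releases every child, and through them the whole subtree
  inherited;
end;

function TNode.GetChildCount: Integer;
begin
  Result := FChildren.Count;
end;

function TNode.Depth: Integer;
var
  Node: TNode;
begin
  // Walk up through the non-owning parent references until the root.
  Result := 0;
  Node := FParent;
  while Node <> nil do
  begin
    Inc(Result);
    Node := Node.FParent;
  end;
end;

var
  Root, Branch, Leaf: TNode;
begin
  Root := TNode.Create(nil);      // root node, managed manually
  try
    Branch := TNode.Create(Root);
    TNode.Create(Root);           // second child of the root
    Leaf := TNode.Create(Branch);
    Assert(Root.ChildCount = 2);
    Assert(Leaf.Parent = Branch);
    Assert(Leaf.Depth = 2);
  finally
    Root.Free;                    // releases the whole tree
  end;
end.
```

The Depth method works only because every node keeps a non-owning reference to its parent; the owning references in FChildren point in the opposite direction.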

If you take a look at the TComponent code, you will find that pattern there—each component will have an owner (which can also be nil, in which case the component is the root component, and needs to be manually managed), and it can hold multiple child components. A component that does not have any children is a leaf node.

You can argue that having a reference to a parent (owner) is not strictly necessary, as you can always find the parent of a particular child by traversing the structure from the root reference. However, such an approach is not always viable, as it can be too costly performance-wise, and sometimes hard to implement algorithmically if you don't have direct access to the root instance in the code scope where you are handling a tree node.

And if you store the root reference in each child, then you still have the same problem with holding a non-owning or weak reference, and you have just made the process of accessing the immediate parent more convoluted.

Another form of non-owning references is when you have two independent object instances that have some functional dependence outside an ownership relation.

An example of such a non-owning relationship is an edit control with an associated popup menu. The edit control holds a non-owning reference to the menu, which allows it to invoke the menu when the user right-clicks on the edit.

Now, the edit could be an owner of the menu instance, and we could avoid having a non-owning reference to a menu, but such a framework design would not be very efficient.

What if you have multiple edit controls that could share the same popup menu?

If an edit control takes a non-owning reference to a menu, then you can easily share that popup menu across different controls. On the other hand, if the edit is an owner of the popup menu, then for each edit you would need to construct a separate menu instance. Not very efficient, is it?

The problem with this kind of non-owning reference, where each object is independent of the other, is that if we decide to free the popup menu for some reason, we need to be able to notify the interested edit or other controls that the menu is no longer available. In such a scenario, the control will just set its menu reference to nil, and that will be an indicator that there is no popup menu functionality associated with the control.

TComponent solves this problem by implementing a notification mechanism. When you associate a menu with an edit control, the menu reference will be stored in the edit, but the edit will also be added to the menu component's notification list. That way, the menu can notify all registered components during its destruction process. However, now you have a situation where the menu holds a reference to the edit control, too. To prevent accessing a dangling pointer, the edit needs to add the menu to its own notification list. If the edit gets released before the menu, the menu will then be able to remove the edit reference from its notification list, and avoid problems with accessing a destroyed instance.
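The mechanism can be tried out without any visual controls. In the following sketch, TDemoMenu and TDemoEdit are hypothetical stand-ins for a popup menu and an edit control, not the real framework classes; only FreeNotification, RemoveFreeNotification, and Notification are actual TComponent methods.

```pascal
program FreeNotificationDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  Classes;

type
  TDemoMenu = class(TComponent);  // hypothetical stand-in for a popup menu

  TDemoEdit = class(TComponent)   // hypothetical stand-in for an edit control
  private
    FMenu: TDemoMenu;             // non-owning reference
    procedure SetMenu(Value: TDemoMenu);
  protected
    procedure Notification(AComponent: TComponent;
      Operation: TOperation); override;
  public
    property Menu: TDemoMenu read FMenu write SetMenu;
  end;

procedure TDemoEdit.SetMenu(Value: TDemoMenu);
begin
  if Value = FMenu then
    Exit;
  if FMenu <> nil then
    FMenu.RemoveFreeNotification(Self);  // no longer interested in the old menu
  FMenu := Value;
  if FMenu <> nil then
    FMenu.FreeNotification(Self);        // ask to be told when the menu is destroyed
end;

procedure TDemoEdit.Notification(AComponent: TComponent;
  Operation: TOperation);
begin
  inherited;
  // The menu is being destroyed: drop the reference before it dangles.
  if (Operation = opRemove) and (AComponent = FMenu) then
    FMenu := nil;
end;

var
  Menu: TDemoMenu;
  Edit1, Edit2: TDemoEdit;
begin
  Menu := TDemoMenu.Create(nil);
  Edit1 := TDemoEdit.Create(nil);
  Edit2 := TDemoEdit.Create(nil);
  try
    Edit1.Menu := Menu;           // both edits share a single menu instance
    Edit2.Menu := Menu;
    Menu.Free;                    // the menu notifies both edits
    Assert(Edit1.Menu = nil);
    Assert(Edit2.Menu = nil);
  finally
    Edit2.Free;
    Edit1.Free;
  end;
end.
```

After the menu is released, both edits hold nil instead of a dangling pointer, and they can treat that nil as "no popup menu available".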

If you use TComponent descendants, you can use its notification system to solve issues with dangling pointers when you need to create an association between two components. If you need to store non-owning references between object instances that don't belong to the TComponent class hierarchy, you would need to implement some similar mechanism to avoid a dangling pointer problem.
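For classes outside the TComponent hierarchy, a similar mechanism can be fairly small. The following is only one possible sketch under simplifying assumptions: TSimpleMenu and TSimpleEdit are hypothetical classes declared in the same unit (so they can reach each other's private fields), the code is single-threaded, and an edit references at most one menu.

```pascal
program CustomNotificationDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  Classes;

type
  TSimpleMenu = class
  private
    FEdits: TList;            // non-owning references to registered edits
  public
    constructor Create;
    destructor Destroy; override;
  end;

  TSimpleEdit = class
  private
    FMenu: TSimpleMenu;       // non-owning reference
    procedure SetMenu(Value: TSimpleMenu);
  public
    destructor Destroy; override;
    property Menu: TSimpleMenu read FMenu write SetMenu;
  end;

constructor TSimpleMenu.Create;
begin
  inherited Create;
  FEdits := TList.Create;
end;

destructor TSimpleMenu.Destroy;
var
  I: Integer;
begin
  // Clear the reference held by every registered edit before going away.
  for I := 0 to FEdits.Count - 1 do
    TSimpleEdit(FEdits[I]).FMenu := nil;
  FEdits.Free;
  inherited;
end;

procedure TSimpleEdit.SetMenu(Value: TSimpleMenu);
begin
  if Value = FMenu then
    Exit;
  if FMenu <> nil then
    FMenu.FEdits.Remove(Self);  // unregister from the previous menu
  FMenu := Value;
  if FMenu <> nil then
    FMenu.FEdits.Add(Self);     // register with the new menu
end;

destructor TSimpleEdit.Destroy;
begin
  SetMenu(nil);  // unregister, so the menu never touches a destroyed edit
  inherited;
end;

var
  Menu: TSimpleMenu;
  Edit1, Edit2: TSimpleEdit;
begin
  Menu := TSimpleMenu.Create;
  Edit1 := TSimpleEdit.Create;
  Edit2 := TSimpleEdit.Create;
  try
    Edit1.Menu := Menu;
    Edit2.Menu := Menu;
    Edit2.Free;                 // edit released first: it unregisters itself
    Menu.Free;                  // menu released: the remaining edit is cleared
    Assert(Edit1.Menu = nil);
  finally
    Edit1.Free;
  end;
end.
```

Both directions are covered: an edit that is released first removes itself from the menu's list, and a menu that is released first sets the reference in every registered edit to nil.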

Under manual memory management, every reference to an object is the same, as there are no additional designators that will tell us whether a reference is an owning or non-owning one, and how we should deal with it. We can only determine that by inspecting the code logic. If the code logic is flawed, we will have issues with managing memory.

Because we don't need to explicitly mark object references as owning (strong) or non-owning (weak), it may seem like we don't need to think about their categorization under manual memory management. In reality, while we are writing or reading code, we should always be able to tell which references are strong and which are weak. If we cannot tell that, then we have a serious problem with that particular code.

Additionally, if we take a non-owning reference to an object, where that reference can turn into a dangling pointer, we need to use some mechanism to avoid creating dangling pointers. In other words, we need to be able to nil such a reference when the object is no longer valid, and that nil value will be an indicator that the particular functionality is not available.

