libdl
0.0.1
Simple yet powerful deep learning
Tensors are at the heart of deep learning, and one has to strike a good balance between performance, flexibility, and ease of use when implementing them. In C++, many design decisions go into how tensors could be implemented. For example:

1. How should the different device-specific tensor implementations (e.g., CPU and GPU) be abstracted behind a common interface?
2. Should a tensor own its memory uniquely or share it?
With respect to 1., we ultimately decided on an abstract base class for general tensor implementations, dl::TensorImpl. And, to reduce the API's complexity, any concrete implementations of the base class are hidden behind device-specific factories implementing dl::Device. Some key points in favor of this implementation are:

- The user only ever interacts with dl::TensorImpl (or rather dl::TensorPtr) as outlined below.
- It is semantically clear that all tensors are equal, no matter on which device they are stored.

The correct choice for 2. was a lot trickier, but after initially going with unique ownership, tensors now share their memory. The key question behind this is: what behavior is natural for the user, and efficient? For example, consider the following code:
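The snippet itself was lost here; based on the discussion that follows (the values {1, 2, 3} and a modification in the third line), a plausible reconstruction looks like this, with the exact dl::Tensor initializer and subscript syntax being assumptions:

```cpp
dl::Tensor a = {1, 2, 3};  // a holds the values {1, 2, 3}
dl::Tensor b = a;          // copy construction: copy the memory, or reference it?
b[0] = 42;                 // third line: modify through b
```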
Which value should a hold? Unique ownership of memory would dictate that, since b cannot "co-own" the memory used by a, it must have created a copy of a's memory; in the third line only that copy is modified, such that a still holds the value {1, 2, 3} by the end. However, we don't always want b to copy the contents of a. For example, the return type of the subscript operator ([]) can't be dl::Tensor, since it must reference the memory of a. The best way to solve this would be an additional datatype, dl::TensorView:
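A sketch of what such a view type could look like; the names and signatures here are assumptions for illustration, not necessarily libdl's actual code:

```cpp
// Non-owning view into an existing tensor's memory. It is only valid
// for as long as the owning tensor is alive.
class TensorView {
public:
    TensorView(Tensor& owner, size_t index);
    // tensor-like API operating on the referenced memory...
};

// The subscript operator then returns a view instead of an owning tensor:
TensorView Tensor::operator[](size_t index);
```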
With shared ownership, however, dl::Tensor b = a; could mean two things: creating a copy of a (it is the "copy constructor" after all) or simply adding another reference to the memory that a references. For libdl, we chose shared ownership and solved this ambiguity by naming the datatype dl::TensorPtr, which can generally be thought of as a std::shared_ptr<dl::TensorImpl> with some tensor-specific API. It should now be clear that in dl::TensorPtr b = a;, b is a (reference-counted) pointer to the same memory that a also points to: the copy constructor copies the pointer, not the memory.

There certainly are good arguments for both shared and unique ownership. Unique ownership, for example (in the context of std::unique_ptr vs. std::shared_ptr), is more performant, since no atomic reference counter is needed, and it allows/forces the user of the API to think more about how they manage their memory; e.g., the user controls whether memory is copied or moved. In practice, however, this also means that the exact same function must be overloaded differently depending on memory ownership:
And we did not even mention that overloads for dl::TensorView would also be needed! Further, consider the following case:
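The example that belongs here would use one tensor twice, e.g. computing tmp + tmp. With overloads like the above, the caller has roughly these options (a reconstruction; the exact names are assumed):

```cpp
dl::Tensor tmp = /* some intermediate result */;

auto r1 = add(tmp, tmp);                        // Option 1: copy
auto r2 = add(std::move(tmp), std::move(tmp));  // Option 2: move both
auto r3 = add(tmp, std::move(tmp));             // Option 3: reference + move
```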
Option 2 clearly doesn't work, since tmp is used after it was moved away. Option 3 does not work either, since the first parameter causes the computation graph to hold a reference to tmp, which is moved away right after. Option 1 does work, but is not ideal, since it unnecessarily copies the tensor. This problem could be avoided by something like
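(the original snippet here was lost; one conceivable shape, entirely hypothetical and not libdl API, is an explicit non-owning wrapper, so that neither argument is copied or moved)

```cpp
// Hypothetical: dl::ref would create an explicit, non-owning handle; the
// caller then guarantees that tmp outlives the computation graph's use of it.
auto r = add(dl::ref(tmp), dl::ref(tmp));
```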
but simplicity should be key here.