An alternative Any type?
jmaargh
Posted on October 16, 2023
Rust's `Any` type is pretty cool. You can use it to do runtime type reflection, or downcasting, or dynamic typing, or other fun things. However, there are a couple of slightly annoying things about it:
- `TypeId` is currently 128 bits. This is because it's a hash of the concrete type, so it needs to be long enough to reasonably avoid hash collisions.
- Getting a `TypeId` from a `&dyn Any` requires two dereferences: first you follow the vtable pointer to find the pointer to `Any::type_id()`, then you call that function.
In the vast majority of cases this is totally fine (which is why the excellent libs team implemented it this way). You're unlikely to be bottlenecked on either of these. But neither is ideal: `u128` operations can be pretty slow on older or embedded chips, and nobody likes more indirections than are necessary.
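As a point of comparison, here is a minimal sketch of the standard-library path described above. The helper name `std_is_type` is hypothetical; it does what `<dyn Any>::is` does. The `value.type_id()` call goes through the `dyn Any` vtable (load the method pointer, then call it), and the resulting `TypeId` is compared with the statically known `TypeId::of::<T>()`.

```rust
use std::any::{Any, TypeId};

/// Hypothetical helper: is the type-erased `value` actually a `T`?
/// `value.type_id()` is a virtual call: it loads the method pointer from the
/// `dyn Any` vtable, then calls it to produce a `TypeId`.
pub fn std_is_type<T: Any>(value: &dyn Any) -> bool {
    value.type_id() == TypeId::of::<T>()
}
```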
It occurs to me that both can be circumvented if you're willing to give up one thing: stability of `TypeId` values. That is, you must not assume that `TypeId`s are the same between different binaries. This seems a fairly small thing to give up in most cases. How often are people serialising `TypeId`s? Doing so is already a bad idea, as they're not guaranteed to be stable between Rust compiler releases.
The idea is to simply store the type ID directly in the vtable and have the compiler guarantee that, in the context of the current build, the ID is unique. No second indirection, no IDs longer than necessary.
Doing this "properly" would require some compiler hacking. But I did come up with a way to hack around it, which I call `PointerAny` and `TypePointer`. The trick is to use the address of a method of the `PointerAny` trait as the type ID itself.
Let me explain. First, we define the trait:

```rust
pub trait PointerAny: 'static {
    fn type_ptr(&self) -> TypePointer;
}
```
This is exactly like `core::any::Any`; no surprises here.
We also need a `TypePointer` instead of `TypeId`. This will hold a function pointer, i.e. the address of a function (as discussed above), so let's do that:
```rust
#[derive(PartialEq)]
pub struct TypePointer(usize);
```
For the sake of simplicity I'll just use a `usize` here. Really you'd want `NonZeroUsize` or something similar.
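If you do want the `NonZeroUsize` version, a minimal sketch might look like the following. `TypePointerNz`, `from_addr` and `option_is_pointer_sized` are hypothetical names used only for illustration. Since function addresses are never null, wrapping them in `NonZeroUsize` loses nothing, and in practice the spare zero value lets `Option<TypePointerNz>` stay the size of a plain `usize` (the niche optimisation).

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical `NonZeroUsize`-backed variant of `TypePointer`.
/// Function addresses are never null, so zero is free to act as `None`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TypePointerNz(NonZeroUsize);

impl TypePointerNz {
    /// Wraps a raw address; returns `None` only for a zero (null) address.
    pub fn from_addr(addr: usize) -> Option<Self> {
        NonZeroUsize::new(addr).map(TypePointerNz)
    }
}

/// Checks the niche: `Option<TypePointerNz>` is no larger than a `usize`.
pub fn option_is_pointer_sized() -> bool {
    size_of::<Option<TypePointerNz>>() == size_of::<usize>()
}
```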
Getting this `TypePointer` statically is easy: we just take the address of `T`'s implementation of `type_ptr`, which is the same function whose pointer gets stored in the vtable:
```rust
impl TypePointer {
    fn of<T: PointerAny + ?Sized>() -> Self {
        // A function item can be cast directly to its address.
        Self(<T as PointerAny>::type_ptr as _)
    }
}
```
But this isn't enough to be useful yet. We need a way of getting a `TypePointer` from a `&dyn PointerAny`. In principle, I feel like there should be a good way of getting the compiler to tell us the address we're looking for. After all, the compiler knows how to call this function, so it must know how to find its address. Unfortunately, I don't know how to get the compiler to tell us that address, so instead I'm leaning on some very ugly `unsafe` code:
```rust
impl TypePointer {
    fn from(object: &dyn PointerAny) -> Self {
        let pointer = unsafe {
            // Split the wide pointer into its data and vtable halves. An array is
            // used rather than a tuple because arrays have a guaranteed element order.
            let [_data, vtable]: [*const usize; 2] = core::mem::transmute(object);
            // vtable consists of:
            // - drop pointer
            // - size
            // - alignment
            // - method pointers
            // In that order. So this gets us pointing to the first method.
            let method_pointer = vtable.add(3);
            // Read the first (and only) method's function pointer as an address.
            *method_pointer
        };
        Self(pointer)
    }
}
```
This requires a little explanation. A wide pointer like `&dyn PointerAny` consists of a pointer to the value's data, followed by a pointer to the vtable (again, that's the current implementation; this layout isn't guaranteed either). That's what the `transmute` call is unpacking here.
Rust, unfortunately for us, doesn't guarantee any particular layout for vtables. However, from what I can gather, the current implementation is as outlined in the comment: first there's a function pointer to the drop implementation, then there are `usize`s for the size and alignment of the type, then there are pointers to each method. Since `PointerAny` has only one method, its pointer should be at an offset of three `usize`s from the start of the vtable, which is what we read.
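The layout assumption can at least be sanity-checked on a given compiler: if the vtable really starts with `[drop, size, align, ...]`, then entries 1 and 2 should match `size_of_val` and `align_of_val`. This sketch uses `&dyn Debug` only because `Debug` is a convenient standard trait; `vtable_size_align` is a hypothetical helper, and it relies on the same unguaranteed layouts as `TypePointer::from`.

```rust
use std::fmt::Debug;
use std::mem;

/// Hypothetical helper: read the size and alignment entries of a vtable.
///
/// ASSUMPTION (not guaranteed by Rust, true of current rustc): a wide pointer
/// is laid out as [data, vtable], and the vtable starts with
/// [drop_in_place, size, align, ...method pointers].
fn vtable_size_align(object: &dyn Debug) -> (usize, usize) {
    unsafe {
        // An array (unlike a tuple) has a guaranteed element order.
        let [_data, vtable]: [*const usize; 2] = mem::transmute(object);
        // Entry 0 is the drop pointer; entries 1 and 2 are size and alignment.
        (*vtable.add(1), *vtable.add(2))
    }
}
```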
Now you may have noticed that we haven't actually implemented `PointerAny` yet. That's because we never actually want to call the `PointerAny::type_ptr` method: we just want the compiler to give it a unique address per type. Therefore its implementation is the least important part of this puzzle (but still essential, as we need the compiler to actually generate it so that it has an address). So we can just implement it in the obvious way:
```rust
impl<T: 'static + ?Sized> PointerAny for T {
    /// Be careful! If you have a `&dyn PointerAny`, then prefer calling
    /// `TypePointer::from` over this to avoid the extra indirection.
    fn type_ptr(&self) -> TypePointer {
        TypePointer::of::<T>()
    }
}
```
Note: if you call this method through a `&dyn PointerAny`, you lose the benefit of avoiding the indirection; prefer calling `TypePointer::from` or `TypePointer::of` directly.
It's also interesting that `PointerAny::type_ptr` is far nicer than `TypePointer::from`, despite doing the same thing, because at this point we already know the concrete type and so can just get the function pointer directly.
And that's it! We can now dynamically type-check just as with `core::any::Any`!
```rust
pub fn is_same_type(first: &dyn PointerAny, second: &dyn PointerAny) -> bool {
    TypePointer::from(first) == TypePointer::from(second)
}

pub fn is_type<T: PointerAny>(object: &dyn PointerAny) -> bool {
    TypePointer::from(object) == TypePointer::of::<T>()
}
```
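Putting the pieces together, here is a self-contained sketch that also adds a hypothetical `downcast_ref`, mirroring `<dyn Any>::downcast_ref`. Three deviations from the code above are assumptions of this sketch: the blanket impl is restricted to `Sized` types (so it cannot overlap with `dyn PointerAny` itself), the wide pointer is transmuted into an array rather than a tuple, and the function item is cast through `*const ()` before becoming a `usize`. The downcast is only sound if `TypePointer` values really are unique per type, which is exactly the correctness question raised in the tradeoffs below; in a simple single-crate build it should behave as described, but it inherits every caveat from this post.

```rust
use std::mem;

// ASSUMPTION (this sketch): the blanket impl below covers only `Sized` types.
pub trait PointerAny: 'static {
    fn type_ptr(&self) -> TypePointer;
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TypePointer(usize);

impl TypePointer {
    /// Static path: the address of `T`'s `type_ptr` implementation.
    pub fn of<T: PointerAny>() -> Self {
        TypePointer(<T as PointerAny>::type_ptr as *const () as usize)
    }

    /// Dynamic path: read that same address out of the vtable.
    /// ASSUMPTION: current rustc layouts, i.e. a wide pointer is
    /// [data, vtable] and the vtable is [drop, size, align, type_ptr].
    pub fn from(object: &dyn PointerAny) -> Self {
        unsafe {
            let [_data, vtable]: [*const usize; 2] = mem::transmute(object);
            TypePointer(*vtable.add(3))
        }
    }
}

impl<T: 'static> PointerAny for T {
    fn type_ptr(&self) -> TypePointer {
        TypePointer::of::<T>()
    }
}

pub fn is_type<T: PointerAny>(object: &dyn PointerAny) -> bool {
    TypePointer::from(object) == TypePointer::of::<T>()
}

/// Hypothetical `downcast_ref`. Only sound if `TypePointer` values are
/// unique per type, which this post flags as unverified.
pub fn downcast_ref<T: PointerAny>(object: &dyn PointerAny) -> Option<&T> {
    if is_type::<T>(object) {
        // Discard the vtable half of the wide pointer and view the data as a `T`.
        Some(unsafe { &*(object as *const dyn PointerAny as *const T) })
    } else {
        None
    }
}
```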
So we've successfully addressed the two "shortcomings" discussed above:
- Our new `TypePointer` is only a `usize`, which is ideal for almost every architecture.
- We only do one pointer dereference in `TypePointer::from`.
- We could also have made `TypePointer` non-zero (by using `NonZeroUsize`), which allows niche optimisations for `Option` etc.
On top of that, we still have:
- `TypePointer::of` is still effectively a constant: it's just the address of a function, so no indirection is needed.
- In principle this could all be done in a `const fn`-compatible way (though you'd want to be really careful about the use of pointers in `const fn`: pointer-to-integer casts aren't currently allowed during const evaluation, so perhaps this isn't possible yet).
So what are the tradeoffs? What have we lost?
- Stability of `TypePointer` values: if you recompile your program, even with the same compiler, these may change. Don't ever serialise these `TypePointer`s: they're just pointers, after all.
- Stability of implementation. I had to write some very ugly `unsafe` code to get this to work, because I couldn't find a stable way to get the compiler to tell me the address of a vtable method from a wide pointer. In principle this needn't be so ugly, but I just could not find a way of doing it without assuming the structure of the vtable.
- Correctness? The current implementation assumes that the compiler will generate exactly one version of `PointerAny::type_ptr` for any given type (when needed). That is, there is a one-to-one correspondence between addresses of `PointerAny::type_ptr` and types themselves. I'm not 100% sure this is guaranteed, but I've assumed it's true. It's known that Rust can generate multiple vtables for the same type (otherwise we could just use the vtable address itself and have zero indirections), but I've assumed that the pointers they contain are stable.
It's also interesting that we could have implemented `TypePointer` over `core::any::Any` rather than defining a new `Any`-like trait. The only assumptions we need are that (a) the trait is implemented for every `'static` type, (b) there is a unique address for at least one method per type, and (c) we know how to find that address from a wide pointer.
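As a rough sketch of that idea, here is the same trick applied to `core::any::Any`, whose only method is `type_id`. The function names `any_ptr_of` and `any_ptr_from` are hypothetical, and this inherits all of the assumptions above: the current wide-pointer and vtable layouts, and a single address per `<T as Any>::type_id` instantiation within the build.

```rust
use std::any::Any;
use std::mem;

/// Hypothetical static path: the address of `<T as Any>::type_id`.
pub fn any_ptr_of<T: Any>() -> usize {
    <T as Any>::type_id as *const () as usize
}

/// Hypothetical dynamic path: read the `type_id` method pointer from the vtable.
/// ASSUMPTION: current rustc layouts, i.e. a wide pointer is [data, vtable]
/// and the vtable is [drop, size, align, type_id] (`Any` has one method).
pub fn any_ptr_from(object: &dyn Any) -> usize {
    unsafe {
        let [_data, vtable]: [*const usize; 2] = mem::transmute(object);
        *vtable.add(3)
    }
}
```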
I'd love to hear what people think of this. There are probably some things here that are wrong (well, even more wrong than the `TypePointer::from` implementation), so let me know!