vtable under the surface | Episode 3 - How virtual functions are actually called
Pierre Gradot
Posted on January 23, 2024
In this episode, we will see how invoking a virtual function in C++ translates into assembly instructions. We will see how our class instance is constructed and how it relates to the vtable. Then, we will see how this vtable is used to call the appropriate function.
If you actually built the project and analyzed the binary in the previous episode, don't forget to remove the
-fno-rtti
option and to rebuild the project. I will use this binary as a reference here.
Execute the Program with gdb
To understand how vtables are used in assembly, disassembling the binary with objdump --disassemble
could have been an option, but instead, I decided to use gdb
to actually execute the program:
$ gdb a.out
By default, gdb
uses AT&T syntax to disassemble the code, but I prefer Intel flavor:
(gdb) set disassembly-flavor intel
(gdb) show disassembly-flavor
The disassembly flavor is "intel".
Enabling name demangling will make it easier to understand the symbols being manipulated:
(gdb) set print asm-demangle
(gdb) show print asm-demangle
Demangling of C++/ObjC names in disassembly listings is on.
Run to main:
(gdb) break main
Breakpoint 1 at 0x1139: file /home/pierre/CLionProjects/untitled/main.cpp, line 3.
(gdb) run
Starting program: /home/pierre/CLionProjects/untitled/cmake-build-debug/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at /home/pierre/CLionProjects/untitled/main.cpp:3
3 int main() {
We can now disassemble the code:
(gdb) disas
Dump of assembler code for function main():
=> 0x0000555555555139 <+0>: sub rsp,0x18
0x000055555555513d <+4>: mov DWORD PTR [rsp+0x8],0x0
0x0000555555555145 <+12>: mov DWORD PTR [rsp+0xc],0x0
0x000055555555514d <+20>: lea rax,[rip+0x2c44] # 0x555555557d98 <vtable for Derived+16>
0x0000555555555154 <+27>: mov QWORD PTR [rsp],rax
0x0000555555555158 <+31>: mov rdi,rsp
0x000055555555515b <+34>: call 0x5555555551c1 <use(Base const&)>
0x0000555555555160 <+39>: mov eax,0x0
0x0000555555555165 <+44>: add rsp,0x18
0x0000555555555169 <+48>: ret
End of assembler dump.
We can clearly see the call to use()
. The immediately preceding line prepares the first and only argument of this function (it puts it in the rdi
register). The other lines above are the initialization of obj
, our instance of Derived
. We can't understand how use()
is called and uses the vtable to call Derived::foo()
if we don't understand how obj
is initialized and how it is connected to the vtable.
Clarification About Addresses
Before we dive into a line-by-line analysis of the assembly, I want to clarify why the address of use()
in the call
instruction is 0x005555555551c1 and not 0x00000000000011c1 (as in the symbol table from the previous episode, or in the disassembly produced by objdump --disassemble
).
When the binary is loaded into memory and executed, it's not placed at address 0x0000000000000000 but somewhere else. The addresses in the ELF file are relative to this "somewhere else".
We can compute this offset with a simple subtraction: 0x005555555551c1 - 0x00000000000011c1 = 0x00555555554000. The same translation can be applied to any other symbol. We can verify this offset with the address of main
. We need the address of main
in the binary:
$ objdump --syms a.out | grep 'main' | grep '.text'
0000000000001139 g F .text 0000000000000031 main
If we add the offset to this address, the result is the first address in the disassembly above: 0x0000000000001139 + 0x00555555554000 = 0x0000555555555139.
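The two computations above can be written down as a small C++ sketch. The constant and function names are mine (they are not part of the project); the addresses are copied from the objdump and gdb outputs shown in this article, and they would differ in another build or on another machine.

```cpp
#include <cstdint>

// Addresses copied from the article's outputs (specific to this build and run).
constexpr std::uint64_t kUseInFile    = 0x00000000000011c1; // use() in the ELF symbol table
constexpr std::uint64_t kUseAtRuntime = 0x00005555555551c1; // use() as shown by gdb
constexpr std::uint64_t kMainInFile   = 0x0000000000001139; // main in the ELF symbol table

// Offset between the addresses stored in the file and the addresses at run time.
constexpr std::uint64_t load_offset(std::uint64_t at_runtime, std::uint64_t in_file) {
    return at_runtime - in_file;
}

// Translates an address from the file into the address seen in the running process.
constexpr std::uint64_t runtime_address(std::uint64_t in_file, std::uint64_t offset) {
    return in_file + offset;
}

// Checked at compile time: same results as the subtraction and addition above.
static_assert(load_offset(kUseAtRuntime, kUseInFile) == 0x0000555555554000);
static_assert(runtime_address(kMainInFile, 0x0000555555554000) == 0x0000555555555139);
```

Since everything is constexpr, a successful compilation is enough to confirm the arithmetic.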
Construction of the Object
In main()
, obj
is initialized by these instructions (note that =>
indicates the line where gdb
is paused):
=> 0x0000555555555139 <+0>: sub rsp,0x18
0x000055555555513d <+4>: mov DWORD PTR [rsp+0x8],0x0
0x0000555555555145 <+12>: mov DWORD PTR [rsp+0xc],0x0
0x000055555555514d <+20>: lea rax,[rip+0x2c44] # 0x555555557d98 <vtable for Derived+16>
0x0000555555555154 <+27>: mov QWORD PTR [rsp],rax
The first line moves the stack pointer, allocating space to store obj
. We can execute a few commands to verify this:
(gdb) info reg rsp
rsp 0x7fffffffddd8 0x7fffffffddd8
(gdb) nexti
4 auto obj = Derived();
(gdb) info reg rsp
rsp 0x7fffffffddc0 0x7fffffffddc0
(gdb) print &obj
$2 = (Derived *) 0x7fffffffddc0
After these commands, gdb
is paused on the second line, at the address 0x000055555555513d. The address of obj
is the same as the value of the rsp
register. At this moment, storage for the object has been reserved on the stack, but the object is not initialized yet:
(gdb) print obj
$3 = {<Base> = {_vptr.Base = 0x0, dummy_base = 1431671456}, dummy_derived = 21845}
It's not surprising to see dummy_base
and dummy_derived
, as they are the member data of Derived
. But what is _vptr.Base
? This is the "virtual pointer". The compiler inserts it automatically to reference the vtable. It doesn't actually point to the beginning of the vtable, but to a location inside it that the ABI specification calls the "virtual table address point". This point corresponds to the first function pointer in the vtable. We will get back to this pointer later.
The next instructions are here to initialize obj
.
First, dummy_base
and dummy_derived
are set to 0 by the two mov
instructions (yes, even though our code doesn't require them to be initialized). A C++ equivalent of mov DWORD PTR [rsp+0x8],0x0
would resemble something like *(rsp + 0x8) = 0x0
. At this point, the value of rsp
is 0x007fffffffddc0, confirming that rsp+0x8
and rsp+0xc
are indeed the addresses of the data members:
(gdb) print &obj.dummy_base
$5 = (int *) 0x7fffffffddc8
(gdb) print &obj.dummy_derived
$6 = (int *) 0x7fffffffddcc
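The same layout can be observed from C++ itself. This is only a sketch: the class definitions below are my reconstruction from the gdb output (the real ones come from the earlier episodes and may differ), and byte_offset() is a hypothetical helper. The offsets 8 and 12 are what I expect with GCC on Linux x86-64; another ABI may lay the object out differently.

```cpp
#include <cstddef>

// Assumed reconstruction of the classes, based on what gdb prints for obj.
struct Base {
    virtual void foo() const {}
    virtual void bar() const {}
    int dummy_base;
};

struct Derived : Base {
    void foo() const override {}
    void bar() const override {}
    int dummy_derived;
};

// Distance in bytes between the beginning of an object and one of its members.
template <typename Object, typename Member>
std::ptrdiff_t byte_offset(const Object& object, const Member& member) {
    return reinterpret_cast<const char*>(&member) -
           reinterpret_cast<const char*>(&object);
}
```

With an instance created by auto obj = Derived();, byte_offset(obj, obj.dummy_base) and byte_offset(obj, obj.dummy_derived) should match the +0x8 and +0xc seen in the two mov instructions. The first 8 bytes are left for the virtual pointer.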
The initialization of the virtual pointer is slightly more complex:
0x000055555555514d <+20>: lea rax,[rip+0x2c44] # 0x555555557d98 <vtable for Derived+16>
0x0000555555555154 <+27>: mov QWORD PTR [rsp],rax
The lea
instruction with an operand that is relative to the rip
register is classic compiler technique to get the address of something that is inside the binary being executed. gdb
kindly shows the result of the computation. It even tells us that this is the address of vtable for Derived+16
. In the previous episode, we noted that the first function address in the vtable comes after 8 bytes for the offset and 8 bytes for the pointer to the RTTI, hence a total of 16 bytes. You should understand now why the virtual pointer is set to vtable for Derived+16
: this is the "virtual table address point".
This address is temporarily stored into rax
before being moved on the stack to complete the initialization of obj
.
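The address that gdb prints in the comment next to the lea instruction can be recomputed by hand. In a rip-relative operand, rip is the address of the next instruction, not the address of the lea itself. Here is a minimal sketch of that computation; the names are mine, and the values are copied from the listing above.

```cpp
#include <cstdint>

// Values copied from the gdb listing (specific to this build and run).
constexpr std::uint64_t kLeaAddress   = 0x000055555555514d; // lea is at <+20>
constexpr std::uint64_t kLeaLength    = 27 - 20;            // next instruction is at <+27>
constexpr std::uint64_t kDisplacement = 0x2c44;             // from [rip+0x2c44]

// rip-relative addressing: target = address of the next instruction + displacement.
constexpr std::uint64_t rip_relative_target(std::uint64_t instruction_address,
                                            std::uint64_t instruction_length,
                                            std::uint64_t displacement) {
    return instruction_address + instruction_length + displacement;
}

// The virtual table address point is 16 bytes after the beginning of the vtable.
constexpr std::uint64_t vtable_start(std::uint64_t address_point) {
    return address_point - 16;
}

static_assert(rip_relative_target(kLeaAddress, kLeaLength, kDisplacement) == 0x0000555555557d98);
static_assert(vtable_start(0x0000555555557d98) == 0x0000555555557d88);
```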
We can execute these instructions with gdb
to run the initialization:
(gdb) ni
(gdb) ni
(gdb) ni
(gdb) ni
(gdb) print obj
$7 = {<Base> = {_vptr.Base = 0x555555557d98 <vtable for Derived+16>, dummy_base = 0}, dummy_derived = 0}
If we want to verify what is being pointed to by the virtual pointer, we have 2 options.
The first solution is the manual, tedious one. We can dump the memory located at this address and compare the dumped values to the addresses of the member functions of Derived
. We need to dump 2 sets of 8 bytes each (since a function pointer is 8 bytes wide):
(gdb) x/2gx 0x555555557d98
0x555555557d98 <vtable for Derived+16>: 0x0000555555555196 0x00005555555551ac
(gdb) print 'Derived::foo'
$8 = {void (const Derived * const)} 0x555555555196 <Derived::foo() const>
(gdb) print 'Derived::bar'
$9 = {void (const Derived * const)} 0x5555555551ac <Derived::bar() const>
The second solution is just using the dedicated command provided by gdb
:
(gdb) info vtbl obj
vtable for 'Derived' @ 0x555555557d98 (subobject @ 0x7fffffffddc0):
[0]: 0x555555555196 <Derived::foo() const>
[1]: 0x5555555551ac <Derived::bar() const>
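For the curious, the virtual pointer can also be peeked at from C++ code, without gdb. This is a sketch and not something to use in real code: it assumes that the virtual pointer occupies the first bytes of the object, which is what we observed here with GCC on Linux x86-64 but is not guaranteed by the C++ standard. The class definitions are my reconstruction and read_vptr() is a hypothetical helper.

```cpp
#include <cstdint>
#include <cstring>

// Assumed reconstruction of the classes from the previous episodes.
struct Base {
    virtual void foo() const {}
    virtual void bar() const {}
    int dummy_base = 0;
};

struct Derived : Base {
    void foo() const override {}
    void bar() const override {}
    int dummy_derived = 0;
};

// ABI-specific assumption: the virtual pointer is stored at the very beginning
// of the object. We copy these bytes out instead of dereferencing a cast pointer.
template <typename T>
std::uintptr_t read_vptr(const T& object) {
    static_assert(sizeof(T) >= sizeof(std::uintptr_t), "object too small to hold a vptr");
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, static_cast<const void*>(&object), sizeof vptr);
    return vptr;
}
```

Under that assumption, two instances of Derived should return the same value (they share one vtable), while an instance of Base should return a different one.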
Calling use()
from main()
The object is now ready, and gdb
is paused here:
(gdb) disas
(...)
=> 0x0000555555555158 <+31>: mov rdi,rsp
0x000055555555515b <+34>: call 0x5555555551c1 <use(Base const&)>
(...)
End of assembler dump.
On Linux x86-64, the calling convention passes the first integer or pointer argument of a function in the rdi
register. We've seen in the previous section that rsp
holds the address of obj
at this point. This address is moved into the rdi
register and the function is ready to be called. Indeed, a reference in C++ is often just an address in assembly.
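A tiny sketch can illustrate this last remark. Widget and the two functions are hypothetical names that I made up for the example. Taking the address of a reference gives the address of the referenced object, so both functions return the same value; with a typical compiler, I would expect both to receive that address in rdi, although the standard doesn't mandate how references are implemented.

```cpp
// Hypothetical type, only here to have something to pass around.
struct Widget {
    int value;
};

// The parameter is a reference: &w is the address of the caller's object.
const Widget* address_via_reference(const Widget& w) {
    return &w;
}

// The parameter is an explicit pointer: the caller passes the address itself.
const Widget* address_via_pointer(const Widget* w) {
    return w;
}
```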
How the Virtual Function is Actually Called
We can reach the beginning of use()
with ni
(to execute the mov
) and si
(to step into the call
). We can disassemble the function:
(gdb) disas
Dump of assembler code for function _Z3useRK4Base:
=> 0x00005555555551c1 <+0>: sub rsp,0x8
0x00005555555551c5 <+4>: mov rax,QWORD PTR [rdi]
0x00005555555551c8 <+7>: call QWORD PTR [rax]
0x00005555555551ca <+9>: add rsp,0x8
0x00005555555551ce <+13>: ret
End of assembler dump.
The virtual call is right here!
We have seen that rdi
holds the address of obj, whose first 8 bytes are the virtual pointer, and that the first entry in the array pointed to by this virtual pointer is the address of Derived::foo()
. The value pointed to by rdi
is moved into rax
, and then the value pointed to by rax
is called. The function Derived::foo()
executes and prints Derived => foo()
!
You can try to change the code to call bar()
instead of foo()
. The call
instruction will be different. You will have call QWORD PTR [rax+0x8]
instead, to get the next pointer in the array of function pointers.
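To make the mechanism more concrete, here is a hand-written model of what these two instructions do, with no virtual keyword at all. It's a sketch under my own assumptions: Object, Slot, derived_slots and the functions are hypothetical names, and the functions return strings instead of printing so that the result is easy to check (the text for bar() is made up by analogy with foo()).

```cpp
#include <string>

struct Object;
using Slot = std::string (*)(const Object*); // type of one entry in the table

// Model of obj: a hidden pointer first, then the data members.
struct Object {
    const Slot* vptr; // plays the role of _vptr.Base
    int dummy_base;
    int dummy_derived;
};

std::string derived_foo(const Object*) { return "Derived => foo()"; }
std::string derived_bar(const Object*) { return "Derived => bar()"; }

// Models the function-pointer part of "vtable for Derived" (from the address point).
const Slot derived_slots[] = {derived_foo, derived_bar};

Object make_derived() {
    return Object{derived_slots, 0, 0};
}

// Equivalent of: mov rax,QWORD PTR [rdi] then call QWORD PTR [rax]
std::string call_foo(const Object& obj) {
    const Slot* table = obj.vptr; // load the virtual pointer from the object
    return table[0](&obj);        // call through the first entry
}

// Equivalent of: mov rax,QWORD PTR [rdi] then call QWORD PTR [rax+0x8]
std::string call_bar(const Object& obj) {
    const Slot* table = obj.vptr;
    return table[1](&obj);        // second entry, 8 bytes further
}
```

Note that call_foo() and call_bar() know nothing about derived_foo() and derived_bar(): they only follow the pointer stored in the object. This is the essence of a virtual call.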
Indirect Call
In use()
, the instruction call QWORD PTR [rax]
is an indirect call: the target address is not encoded in the instruction but is read at run time from the memory location whose address is in a register. The instruction call 0x5555555551c1
to call use()
from main()
is a direct call. The mnemonic is the same (call
) but the opcodes (the binary values that encode the instruction) are different.
We can examine the memory with gdb
to see the difference:
(gdb) x /1bx use+7
0x5555555551c8 <use(Base const&)+7>: 0xff
(gdb) x /1bx main+34
0x55555555515b <main()+34>: 0xe8
Indirect calls are typically used to implement function pointers.
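The difference is easy to reproduce with a plain function pointer. The names below are hypothetical. What the compiler really emits depends on the optimization level: the direct call may be inlined, and a call in tail position may become a jmp. So take the comments as what I would typically expect in a non-optimized build, not as a guarantee.

```cpp
int twice(int x) {
    return 2 * x;
}

// The target is known at compile time: typically a direct call (opcode 0xe8).
int call_directly(int x) {
    return twice(x);
}

using IntFunction = int (*)(int);

// The target is only known at run time: typically an indirect call (opcode 0xff).
int call_through_pointer(IntFunction function, int x) {
    return function(x);
}
```

You can compile this snippet and disassemble both functions with gdb to compare the first byte of each call instruction, as we did above.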
Conclusion
In this episode, we discovered that the compiler adds a hidden field to each instance of a class with virtual functions. This field is the "virtual pointer" (or "vptr" for short). Through this pointer, the program reaches the array of function pointers inside the vtable, loads the entry for the desired function, and calls it.
I will publish the episodes as I write them. Subscribe to be notified! 💃🏽