vtables under the surface | Episode 2 - ELF files
Pierre Gradot
Posted on January 5, 2024
In this episode, we will explore what vtables mean in terms of bytes within ELF files.
Build Output
On Linux, GCC produces ELF files as the result of the compilation process. In our project, the file is `a.out`, and we can use the `file` command to get details about it:
$ file a.out
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a87e1cb2356338a14f1a9aa2fef85fb7036bee65, for GNU/Linux 3.2.0, not stripped
If the compiler has generated vtables for `Base` and `Derived`, there must be corresponding symbols and bytes in the binary.
Inspect the Symbols
Let's identify the symbols related to these classes. We can use `objdump` to get the symbol table, `sort` to order the symbols by address, and `egrep` to keep only what relates to `Base` and `Derived`:
$ objdump --syms --demangle a.out | sort | egrep "Base|Derived"
000000000000116a g F .text 0000000000000015 Base::foo() const
0000000000001180 g F .text 0000000000000015 Base::bar() const
0000000000001196 g F .text 0000000000000015 Derived::foo() const
00000000000011ac g F .text 0000000000000015 Derived::bar() const
00000000000011c1 g F .text 000000000000000e use(Base const&)
0000000000002042 w O .rodata 0000000000000006 typeinfo name for Base
0000000000002048 w O .rodata 0000000000000009 typeinfo name for Derived
0000000000003d68 w O .data.rel.ro 0000000000000020 vtable for Base
0000000000003d88 w O .data.rel.ro 0000000000000020 vtable for Derived
0000000000003da8 w O .data.rel.ro 0000000000000010 typeinfo for Base
0000000000003db8 w O .data.rel.ro 0000000000000018 typeinfo for Derived
If you are not familiar with the output format of `objdump --syms`, you can refer to the man page of the command. The help for the `--syms` option provides detailed information about each column.
We see the addresses of our 4 member functions, as well as `use()`. These functions are stored in the `.text` section, which is the dedicated section for code. We also see several objects (notice the `O` in the third column) for the information about the types and... the virtual tables! Let's dig into the contents of these tables.
Alternatives to objdump
Before we move to the hard values, I just want to briefly mention two alternatives to `objdump`. Unfortunately, neither displays the section names along with the symbols (please leave a comment if I have missed the magic options!).
Alternative 1: nm
$ nm --demangle --print-size a.out | sort | egrep "Base|Derived"
000000000000116a 0000000000000015 T Base::foo() const
0000000000001180 0000000000000015 T Base::bar() const
0000000000001196 0000000000000015 T Derived::foo() const
00000000000011ac 0000000000000015 T Derived::bar() const
00000000000011c1 000000000000000e T use(Base const&)
0000000000002042 0000000000000006 V typeinfo name for Base
0000000000002048 0000000000000009 V typeinfo name for Derived
0000000000003d68 0000000000000020 V vtable for Base
0000000000003d88 0000000000000020 V vtable for Derived
0000000000003da8 0000000000000010 V typeinfo for Base
0000000000003db8 0000000000000018 V typeinfo for Derived
Alternative 2: readelf
$ readelf --syms --demangle a.out | cut -d: -f2- | sort | egrep "Base|Derived"
000000000000116a 21 FUNC GLOBAL DEFAULT 15 Base::foo() const
0000000000001180 21 FUNC GLOBAL DEFAULT 15 Base::bar() const
0000000000001196 21 FUNC GLOBAL DEFAULT 15 Derived::foo() const
00000000000011ac 21 FUNC GLOBAL DEFAULT 15 Derived::bar() const
00000000000011c1 14 FUNC GLOBAL DEFAULT 15 use(Base const&)
0000000000003d68 32 OBJECT WEAK DEFAULT 22 vtable for Base
0000000000003d88 32 OBJECT WEAK DEFAULT 22 vtable for Derived
0000000000003da8 16 OBJECT WEAK DEFAULT 22 typeinfo for Base
0000000000003db8 24 OBJECT WEAK DEFAULT 22 typeinfo for Derived
To sort by addresses, we must remove the symbol number at the beginning of each line. Indeed, a line is formatted as follows:
22: 0000000000001180 21 FUNC GLOBAL DEFAULT 15 Base::bar() const
Note that `typeinfo name for Base` and `typeinfo name for Derived` are missing here (again, if you know why, leave a comment below!).
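
To make the effect of `cut -d: -f2-` more concrete, here is a minimal Python sketch of the same transformation. The function name `strip_symbol_number` is made up for this illustration. It keeps everything after the first colon, which matters because demangled C++ names such as `Base::bar()` contain colons themselves:

```python
def strip_symbol_number(line: str) -> str:
    """Mimic `cut -d: -f2-`: drop everything up to and including the first colon.

    Like cut (without -s), a line that contains no colon is returned unchanged.
    """
    _, separator, rest = line.partition(":")
    return rest if separator else line


line = "22: 0000000000001180 21 FUNC GLOBAL DEFAULT 15 Base::bar() const"
# The "::" of the demangled name survives because only the first colon is consumed.
print(strip_symbol_number(line))
```

Using `-f2` instead of `-f2-` would keep only the text between the first and second colons, and the end of the symbol name would be lost.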
The Bytes in Vtables
Vtables and typeinfo symbols reside in the `.data.rel.ro` section. We need to examine the raw data of this section to understand their content. `readelf` can produce a hexdump of a section:
$ readelf --hex-dump .data.rel.ro a.out
Hex dump of section '.data.rel.ro':
0x00003d68 00000000 00000000 a83d0000 00000000 .........=......
0x00003d78 6a110000 00000000 80110000 00000000 j...............
0x00003d88 00000000 00000000 b83d0000 00000000 .........=......
0x00003d98 96110000 00000000 ac110000 00000000 ................
0x00003da8 00000000 00000000 42200000 00000000 ........B ......
0x00003db8 00000000 00000000 48200000 00000000 ........H ......
0x00003dc8 a83d0000 00000000 .=......
Note that:
- These are little-endian hexadecimal data. Hence, `a83d0000 00000000` on the last line is in fact 0x0000000000003da8.
- The same dump can be created with `objdump --section=.data.rel.ro --full-contents a.out`.
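
If you want to double-check such conversions without swapping bytes in your head, a few lines of Python are enough. This is only a sketch, and `decode_le_qword` is a helper name invented for this example:

```python
def decode_le_qword(dump: str) -> int:
    """Decode 8 bytes, written as they appear in the hexdump, as a little-endian integer."""
    raw = bytes.fromhex(dump.replace(" ", ""))
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return int.from_bytes(raw, byteorder="little")


value = decode_le_qword("a83d0000 00000000")
print(f"0x{value:016x}")  # prints 0x0000000000003da8
```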
Let's inspect these bytes.
How? By navigating through the symbol table to obtain the addresses and sizes of the symbols, and utilizing hexdumps to find out the bytes at these addresses. We will then gain a better understanding of these symbols.
Stay calm and take a deep breath: there will be a lot of hexadecimal numbers.
Vtables
According to the symbol table, both vtables are 32 bytes wide (0x20 in hexadecimal) and their addresses are:
- 0x0000000000003d68 for the vtable for `Base`
- 0x0000000000003d88 for the vtable for `Derived`
Since each line in the hexdump holds 16 bytes, we can deduce that, in the hexdump:
- Lines 1 and 2 show the vtable for `Base`.
- Lines 3 and 4 show the vtable for `Derived`.
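
The same deduction can be written as a small calculation. The sketch below assumes, as in the dump above, that the hexdump starts at the first byte of the section (0x3d68) and shows 16 bytes per line. The function name `hexdump_lines` is hypothetical:

```python
BYTES_PER_LINE = 16
SECTION_START = 0x3D68  # address of the first line of the .data.rel.ro hexdump


def hexdump_lines(symbol_addr: int, symbol_size: int) -> list[int]:
    """Return the 1-based hexdump line numbers that a symbol spans."""
    first_offset = symbol_addr - SECTION_START
    last_offset = first_offset + symbol_size - 1
    first_line = first_offset // BYTES_PER_LINE + 1
    last_line = last_offset // BYTES_PER_LINE + 1
    return list(range(first_line, last_line + 1))


print(hexdump_lines(0x3D68, 0x20))  # vtable for Base    -> [1, 2]
print(hexdump_lines(0x3D88, 0x20))  # vtable for Derived -> [3, 4]
```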
The ABI's specification (or this presentation on slide 5) explains the content of the vtable. Let's apply this knowledge to analyze the vtables, and match the dumped bytes to addresses in the symbol tables.
For `Base`:
Dump | Value | Type | Symbol |
---|---|---|---|
00000000 00000000 | 0x0000000000000000 | `ptrdiff_t` | |
a83d0000 00000000 | 0x0000000000003da8 | pointer to struct | `typeinfo for Base` |
6a110000 00000000 | 0x000000000000116a | function pointer | `Base::foo() const` |
80110000 00000000 | 0x0000000000001180 | function pointer | `Base::bar() const` |
For `Derived`:
Dump | Value | Type | Symbol |
---|---|---|---|
00000000 00000000 | 0x0000000000000000 | `ptrdiff_t` | |
b83d0000 00000000 | 0x0000000000003db8 | pointer to struct | `typeinfo for Derived` |
96110000 00000000 | 0x0000000000001196 | function pointer | `Derived::foo() const` |
ac110000 00000000 | 0x00000000000011ac | function pointer | `Derived::bar() const` |
As you may have guessed, if there were more virtual functions, the vtables would contain more function pointers.
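
The manual matching for `Base` can be reproduced with Python's `struct` module. This is a rough sketch rather than a general vtable parser: it assumes the simple layout seen in this example (one `ptrdiff_t`, which the Itanium ABI calls the "offset to top", then the typeinfo pointer, then the function pointers). The helper `parse_vtable` and the `SYMBOLS` dictionary are invented for the illustration:

```python
import struct

# Lines 1 and 2 of the .data.rel.ro hexdump (the vtable for Base), copied verbatim.
BASE_VTABLE = bytes.fromhex(
    "00000000 00000000 a83d0000 00000000 "
    "6a110000 00000000 80110000 00000000"
)

# Addresses taken from the symbol table shown earlier.
SYMBOLS = {
    0x3DA8: "typeinfo for Base",
    0x116A: "Base::foo() const",
    0x1180: "Base::bar() const",
}


def parse_vtable(raw: bytes) -> tuple[int, int, list[int]]:
    """Split a vtable into (offset_to_top, typeinfo_pointer, function_pointers)."""
    entry_count = len(raw) // 8
    # "<" = little-endian, "q" = signed 64-bit (ptrdiff_t), "Q" = unsigned 64-bit (pointers)
    fields = struct.unpack(f"<q{entry_count - 1}Q", raw)
    return fields[0], fields[1], list(fields[2:])


offset_to_top, typeinfo_ptr, functions = parse_vtable(BASE_VTABLE)
print("offset to top:", offset_to_top)
print("typeinfo:", hex(typeinfo_ptr), "->", SYMBOLS[typeinfo_ptr])
for pointer in functions:
    print("function:", hex(pointer), "->", SYMBOLS[pointer])
```

A vtable with more virtual functions would simply yield a longer `functions` list.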
Typeinfo
The vtables point to the typeinfo objects. Let's have a closer look at their contents.
From the symbol table, we know that:
Symbol | Address | Size (hex) | Size (dec) |
---|---|---|---|
`typeinfo for Base` | 0x0000000000003da8 | 0x10 | 16 |
`typeinfo for Derived` | 0x0000000000003db8 | 0x18 | 24 |
In the hexdump of the section, we can deduce that:
- Line 5 shows the typeinfo for `Base`.
- Lines 6 and 7 show the typeinfo for `Derived`.
Once again, we can match the dumped bytes with the data from the symbol table.
For `Base`:
Dump | Value | Symbol |
---|---|---|
00000000 00000000 | 0x0000000000000000 | |
42200000 00000000 | 0x0000000000002042 | `typeinfo name for Base` |
For `Derived`:
Dump | Value | Symbol |
---|---|---|
00000000 00000000 | 0x0000000000000000 | |
48200000 00000000 | 0x0000000000002048 | `typeinfo name for Derived` |
a83d0000 00000000 | 0x0000000000003da8 | `typeinfo for Base` |
As you have probably guessed, the objects are implementations of `std::type_info`. The ABI's specification includes a dedicated section for RTTI. For in-depth details, you may explore (the somewhat esoteric) part about RTTI layout. You will see that `std::type_info` has several derived classes, and that `__si_class_type_info` is (in all likelihood) used here for `Derived`, whose third field points to `typeinfo for Base`.
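
Here is a small sketch that decodes both typeinfo objects from the dumped bytes. The field names `vptr`, `name` and `base_type` are labels chosen for this example, and the interpretation of the third pointer as the base class typeinfo is an assumption based on the ABI's description of `__si_class_type_info`. The first pointer is zero in the file; presumably it is the object's own vptr, filled in when the program is loaded.

```python
import struct

# Lines 5 to 7 of the .data.rel.ro hexdump, copied verbatim.
TYPEINFO_BASE = bytes.fromhex("00000000 00000000 42200000 00000000")
TYPEINFO_DERIVED = bytes.fromhex(
    "00000000 00000000 48200000 00000000 a83d0000 00000000"
)


def parse_typeinfo(raw: bytes) -> dict[str, int]:
    """Decode a typeinfo object made of 2 or 3 little-endian 64-bit pointers."""
    pointers = struct.unpack(f"<{len(raw) // 8}Q", raw)
    fields = {"vptr": pointers[0], "name": pointers[1]}
    if len(pointers) == 3:
        # Assumption: the extra pointer is the typeinfo of the single base class.
        fields["base_type"] = pointers[2]
    return fields


for label, raw in (("Base", TYPEINFO_BASE), ("Derived", TYPEINFO_DERIVED)):
    decoded = parse_typeinfo(raw)
    print(label, {key: hex(value) for key, value in decoded.items()})
```

The 8 extra bytes of `typeinfo for Derived` (24 bytes versus 16) are exactly this third pointer.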
Typeinfo Names
The remaining data we haven't explored yet are the names residing in the `.rodata` section. This section is dedicated to storing read-only data, such as the string literals from our code.
$ readelf --hex-dump .rodata a.out
Hex dump of section '.rodata':
0x00002000 01000200 42617365 203d3e20 666f6f28 ....Base => foo(
0x00002010 29004261 7365203d 3e206261 72282900 ).Base => bar().
0x00002020 44657269 76656420 3d3e2066 6f6f2829 Derived => foo()
0x00002030 00446572 69766564 203d3e20 62617228 .Derived => bar(
0x00002040 29003442 61736500 37446572 69766564 ).4Base.7Derived
0x00002050 00 .
On the right side of the dump, we find the ASCII interpretation of the hexadecimal values.
Here, we see the strings printed in each function and the mangled names of the classes. Why do the mangled names contain 4 and 7? These are simply the lengths of the identifiers that follow.
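
As a quick check, the sketch below reads the two NUL-terminated names at the addresses stored in the typeinfo objects (0x2042 and 0x2048) and strips the length prefix. It only handles the simple "length followed by identifier" form seen here, not namespaces, templates or other mangling constructs. The helper names are made up for this example:

```python
# Last two lines of the .rodata hexdump (addresses 0x2040 to 0x2050), copied verbatim.
TAIL_START = 0x2040
RODATA_TAIL = bytes.fromhex("29003442 61736500 37446572 69766564 00")


def read_c_string(address: int) -> str:
    """Read the NUL-terminated ASCII string stored at the given address."""
    offset = address - TAIL_START
    end = RODATA_TAIL.index(b"\x00", offset)
    return RODATA_TAIL[offset:end].decode("ascii")


def parse_source_name(mangled: str) -> str:
    """Parse '<decimal length><identifier>' and return the identifier."""
    digits = 0
    while digits < len(mangled) and mangled[digits] in "0123456789":
        digits += 1
    if digits == 0:
        raise ValueError(f"no length prefix in {mangled!r}")
    length = int(mangled[:digits])
    name = mangled[digits:]
    if len(name) != length:
        raise ValueError(f"length prefix does not match in {mangled!r}")
    return name


for address in (0x2042, 0x2048):
    mangled = read_c_string(address)
    print(hex(address), mangled, "->", parse_source_name(mangled))
```

Counting the terminating NUL byte, the strings take 6 and 9 bytes, which matches the sizes of the `typeinfo name` symbols in the symbol table.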
Disable RTTI
As a bonus, we will disable RTTI and observe the impact on the vtables.
Modify your `CMakeLists.txt` to add the appropriate option:
target_compile_options(a.out PRIVATE
    -O1
    -fno-rtti
)
Once the project has been recompiled, we can examine the symbol table again:
$ objdump --syms --demangle a.out | sort | egrep "Base|Derived"
000000000000116a g F .text 0000000000000015 Base::foo() const
0000000000001180 g F .text 0000000000000015 Base::bar() const
0000000000001196 g F .text 0000000000000015 Derived::foo() const
00000000000011ac g F .text 0000000000000015 Derived::bar() const
00000000000011c1 g F .text 000000000000000e use(Base const&)
0000000000003da0 w O .data.rel.ro 0000000000000020 vtable for Base
0000000000003dc0 w O .data.rel.ro 0000000000000020 vtable for Derived
The size of the vtables remains unchanged, but all symbols related to type information are now absent.
We can also examine the hexdump of the `.data.rel.ro` section:
$ readelf --hex-dump .data.rel.ro a.out
Hex dump of section '.data.rel.ro':
0x00003da0 00000000 00000000 00000000 00000000 ................
0x00003db0 6a110000 00000000 80110000 00000000 j...............
0x00003dc0 00000000 00000000 00000000 00000000 ................
0x00003dd0 96110000 00000000 ac110000 00000000 ................
The second 8-byte entry in each vtable is now zero, representing a null pointer, since there is no RTTI to point to.
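
To confirm that nothing else changed, we can compare the vtable for `Base` from both builds entry by entry. This is a minimal sketch using the bytes from the two hexdumps; the names `WITH_RTTI`, `WITHOUT_RTTI` and `entries` are invented for the example:

```python
import struct

# Vtable for Base from the first build (RTTI enabled) and from the -fno-rtti build.
WITH_RTTI = bytes.fromhex(
    "00000000 00000000 a83d0000 00000000 6a110000 00000000 80110000 00000000"
)
WITHOUT_RTTI = bytes.fromhex(
    "00000000 00000000 00000000 00000000 6a110000 00000000 80110000 00000000"
)


def entries(raw: bytes) -> tuple[int, ...]:
    """Split a vtable into little-endian 64-bit entries."""
    return struct.unpack(f"<{len(raw) // 8}Q", raw)


# Indexes of the entries that differ between the two builds.
changed = [
    index
    for index, (before, after) in enumerate(zip(entries(WITH_RTTI), entries(WITHOUT_RTTI)))
    if before != after
]
print(changed)  # prints [1]
```

Only the entry at index 1, the typeinfo pointer, differs; the function pointers are identical.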
If you dump the `.rodata` section, you will see that the type names are absent as well.
Conclusion
In this episode, we delved into the bytes stored in the ELF file for the vtables. Their content is mostly dictated by the Itanium ABI, which GCC follows. Vtables hold a pointer to RTTI (if enabled) along with pointers to all virtual functions. This was a basic case, with a single level of simple inheritance. I may cover more advanced cases in a future episode.
In the next episode, we will explore how vtables are used at the assembly level.
I will publish the episodes as I write them. Subscribe to be notified! 🥳