vtables under the surface | Episode 2 - ELF files

pgradot

Pierre Gradot

Posted on January 5, 2024

In this episode, we will explore what vtables mean in terms of bytes within ELF files.

Build Output

On Linux, GCC produces ELF files as the result of the compilation process. In our project, the file is a.out, and we can use the file command to get details about it:

$ file a.out 
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a87e1cb2356338a14f1a9aa2fef85fb7036bee65, for GNU/Linux 3.2.0, not stripped


If the compiler has generated vtables for Base and Derived, there must be corresponding symbols and bytes in the binary.

Inspect the Symbols

Let's identify the symbols related to these classes. We can use objdump to get the symbol table, employ sort to order the symbols by addresses, and filter the output to retain only what relates to Base and Derived:

$ objdump --syms --demangle a.out | sort | egrep "Base|Derived"
000000000000116a g     F .text  0000000000000015              Base::foo() const
0000000000001180 g     F .text  0000000000000015              Base::bar() const
0000000000001196 g     F .text  0000000000000015              Derived::foo() const
00000000000011ac g     F .text  0000000000000015              Derived::bar() const
00000000000011c1 g     F .text  000000000000000e              use(Base const&)
0000000000002042  w    O .rodata        0000000000000006              typeinfo name for Base
0000000000002048  w    O .rodata        0000000000000009              typeinfo name for Derived
0000000000003d68  w    O .data.rel.ro   0000000000000020              vtable for Base
0000000000003d88  w    O .data.rel.ro   0000000000000020              vtable for Derived
0000000000003da8  w    O .data.rel.ro   0000000000000010              typeinfo for Base
0000000000003db8  w    O .data.rel.ro   0000000000000018              typeinfo for Derived

If you are not familiar with the output format of objdump --syms, you can refer to the command's man page. The entry for the --syms option describes each column in detail.

We see the addresses of our four member functions, as well as use(). These functions are stored in the .text section, which is the section dedicated to code. We also see several objects (notice the O flag just before the section name) that hold the type information and... the virtual tables! Let's dig into the contents of these tables.

Alternatives to objdump

Before we move on to the raw values, I just want to briefly mention two alternatives to objdump. Unfortunately, neither displays the section names along with the symbols (please leave a comment if I have missed the magic options!).

Alternative 1: nm

$ nm --demangle --print-size a.out | sort | egrep "Base|Derived"
000000000000116a 0000000000000015 T Base::foo() const
0000000000001180 0000000000000015 T Base::bar() const
0000000000001196 0000000000000015 T Derived::foo() const
00000000000011ac 0000000000000015 T Derived::bar() const
00000000000011c1 000000000000000e T use(Base const&)
0000000000002042 0000000000000006 V typeinfo name for Base
0000000000002048 0000000000000009 V typeinfo name for Derived
0000000000003d68 0000000000000020 V vtable for Base
0000000000003d88 0000000000000020 V vtable for Derived
0000000000003da8 0000000000000010 V typeinfo for Base
0000000000003db8 0000000000000018 V typeinfo for Derived

Alternative 2: readelf

$ readelf --syms --demangle a.out | cut -d: -f2- | sort | egrep "Base|Derived"
 000000000000116a    21 FUNC    GLOBAL DEFAULT   15 Base::foo() const
 0000000000001180    21 FUNC    GLOBAL DEFAULT   15 Base::bar() const
 0000000000001196    21 FUNC    GLOBAL DEFAULT   15 Derived::foo() const
 00000000000011ac    21 FUNC    GLOBAL DEFAULT   15 Derived::bar() const
 00000000000011c1    14 FUNC    GLOBAL DEFAULT   15 use(Base const&)
 0000000000003d68    32 OBJECT  WEAK   DEFAULT   22 vtable for Base
 0000000000003d88    32 OBJECT  WEAK   DEFAULT   22 vtable for Derived
 0000000000003da8    16 OBJECT  WEAK   DEFAULT   22 typeinfo for Base
 0000000000003db8    24 OBJECT  WEAK   DEFAULT   22 typeinfo for Derived

To sort by addresses, we must remove the symbol number at the beginning of each line. Indeed, a line is formatted as follows:

22: 0000000000001180    21 FUNC    GLOBAL DEFAULT   15 Base::bar() const
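
The same clean-up can also be done in a script rather than with cut. Here is a minimal sketch, assuming the line layout shown above; parse_readelf_symbol is a hypothetical helper name, not something from the project, and it only handles rows that have a symbol name:

```python
def parse_readelf_symbol(line: str) -> dict:
    """Parse one row of `readelf --syms --demangle` output into named fields."""
    # Eight whitespace-separated fields; the last one (the name) may contain spaces.
    num, value, size, kind, bind, vis, ndx, name = line.split(None, 7)
    return {
        "num": int(num.rstrip(":")),   # symbol number, e.g. "22:"
        "address": int(value, 16),     # the Value column is hexadecimal
        "size": int(size),             # assumed to be printed in decimal, as above
        "type": kind,
        "bind": bind,
        "visibility": vis,
        "section_index": ndx,          # kept as text: it can be "UND" or "ABS"
        "name": name.strip(),
    }


row = "    22: 0000000000001180    21 FUNC    GLOBAL DEFAULT   15 Base::bar() const"
symbol = parse_readelf_symbol(row)
print(hex(symbol["address"]), symbol["size"], symbol["name"])
# 0x1180 21 Base::bar() const
```

With rows parsed this way, sorting by address becomes sorted(rows, key=lambda r: r["address"]), and the symbol number no longer gets in the way.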

Note that the typeinfo name symbols for Base and Derived are missing here (again, if you know why, leave a comment below!).

The Bytes in Vtables

Vtables and typeinfo symbols reside in the .data.rel.ro section. We need to examine the raw data of this section to understand their content. readelf can produce a hexdump of a section:

$ readelf --hex-dump .data.rel.ro a.out

Hex dump of section '.data.rel.ro':
  0x00003d68 00000000 00000000 a83d0000 00000000 .........=......
  0x00003d78 6a110000 00000000 80110000 00000000 j...............
  0x00003d88 00000000 00000000 b83d0000 00000000 .........=......
  0x00003d98 96110000 00000000 ac110000 00000000 ................
  0x00003da8 00000000 00000000 42200000 00000000 ........B ......
  0x00003db8 00000000 00000000 48200000 00000000 ........H ......
  0x00003dc8 a83d0000 00000000                   .=......

Note that:

  • The data are stored in little-endian byte order. Hence, a83d0000 00000000 on the last line is in fact 0x0000000000003da8.
  • The same dump can be created with objdump --section=.data.rel.ro --full-contents a.out.
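
If reversing bytes in your head is error-prone, a few lines of Python can do the conversion. This is only a small sketch; decode_le is a name I made up for the illustration:

```python
def decode_le(dump_text: str) -> int:
    """Decode hexdump text such as 'a83d0000 00000000' as a little-endian integer."""
    raw = bytes.fromhex(dump_text)  # fromhex ignores the spaces between groups
    return int.from_bytes(raw, byteorder="little")


print(hex(decode_le("a83d0000 00000000")))  # 0x3da8
```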

Let's inspect these bytes.

How? By navigating through the symbol table to obtain the addresses and sizes of the symbols, and utilizing hexdumps to find out the bytes at these addresses. We will then gain a better understanding of these symbols.

Stay calm and take a deep breath, there will be a lot of hexadecimal numbers.

Vtables

According to the symbol table, both vtables are 32 bytes wide (0x20 in hexadecimal) and their addresses are:

  • 0x0000000000003d68 for the vtable for Base
  • 0x0000000000003d88 for the vtable for Derived

Since each line in the hexdump holds 16 bytes, we can deduce that, in the hexdump:

  • Lines 1 and 2 show the vtable for Base.
  • Lines 3 and 4 show the vtable for Derived.
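
The arithmetic behind this deduction can be sketched as follows. The helper dump_rows and the two constants are assumptions made for the illustration; SECTION_START is simply the address of the first row of the dump above:

```python
BYTES_PER_ROW = 16      # readelf --hex-dump prints 16 bytes per row
SECTION_START = 0x3d68  # address of the first row in the dump above


def dump_rows(address: int, size: int) -> range:
    """Return the 1-based hexdump rows occupied by a symbol."""
    first = (address - SECTION_START) // BYTES_PER_ROW
    last = (address + size - 1 - SECTION_START) // BYTES_PER_ROW
    return range(first + 1, last + 2)


print(list(dump_rows(0x3d68, 0x20)))  # vtable for Base    -> [1, 2]
print(list(dump_rows(0x3d88, 0x20)))  # vtable for Derived -> [3, 4]
```

This works for any symbol in the section, as long as you feed it the address and size from the symbol table.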

The ABI's specification (or this presentation on slide 5) explains the content of the vtable. Let's apply this knowledge to analyze the vtables, and match the dumped bytes to addresses in the symbol tables.

For Base:

Dump               Value               Type               Symbol
00000000 00000000  0x0000000000000000  ptrdiff_t
a83d0000 00000000  0x0000000000003da8  pointer to struct  typeinfo for Base
6a110000 00000000  0x000000000000116a  function pointer   Base::foo() const
80110000 00000000  0x0000000000001180  function pointer   Base::bar() const

For Derived:

Dump               Value               Type               Symbol
00000000 00000000  0x0000000000000000  ptrdiff_t
b83d0000 00000000  0x0000000000003db8  pointer to struct  typeinfo for Derived
96110000 00000000  0x0000000000001196  function pointer   Derived::foo() const
ac110000 00000000  0x00000000000011ac  function pointer   Derived::bar() const

As you may have guessed, if there were more virtual functions, the vtables would contain more function pointers.
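
To make the layout concrete, here is a rough sketch that decodes the 32 bytes of the vtable for Base in one go. The bytes are copied from the hexdump; parse_vtable and the dictionary keys are names I chose for the illustration ("offset to top" is what the Itanium ABI calls the leading ptrdiff_t):

```python
import struct

# The 32 bytes of "vtable for Base", copied from rows 1 and 2 of the hexdump.
VTABLE_BASE = bytes.fromhex(
    "00000000 00000000 a83d0000 00000000"
    "6a110000 00000000 80110000 00000000"
)


def parse_vtable(raw: bytes) -> dict:
    """Split a simple vtable into offset-to-top, typeinfo pointer and function pointers."""
    slots = len(raw) // 8 - 2  # every entry is 8 bytes; the first two are not functions
    # "<" = little-endian, "q" = signed 64-bit, "Q" = unsigned 64-bit
    offset_to_top, typeinfo, *functions = struct.unpack(f"<qQ{slots}Q", raw)
    return {"offset_to_top": offset_to_top, "typeinfo": typeinfo, "functions": functions}


vtable = parse_vtable(VTABLE_BASE)
print(hex(vtable["typeinfo"]), [hex(f) for f in vtable["functions"]])
# 0x3da8 ['0x116a', '0x1180']
```

A vtable with more virtual functions would simply yield a longer functions list.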

Typeinfo

The vtables point to the typeinfo objects, so let's have a closer look at their contents.

From the symbol table, we know that:

Symbol                Address             Size (hex)  Size (dec)
typeinfo for Base     0x0000000000003da8  0x10        16
typeinfo for Derived  0x0000000000003db8  0x18        24

In the hexdump of the section, we can deduce that:

  • Line 5 shows the typeinfo for Base.
  • Lines 6 and 7 show the typeinfo for Derived.

Once again, we can match the dumped bytes with the data from the symbol table.

For Base:

Dump               Value               Symbol
00000000 00000000  0x0000000000000000
42200000 00000000  0x0000000000002042  typeinfo name for Base

For Derived:

Dump               Value               Symbol
00000000 00000000  0x0000000000000000
48200000 00000000  0x0000000000002048  typeinfo name for Derived
a83d0000 00000000  0x0000000000003da8  typeinfo for Base

As you have probably guessed, these objects are implementations of std::type_info. The ABI's specification includes a dedicated section for RTTI. For in-depth details, you may explore the (somewhat esoteric) part about RTTI layout. You will see that std::type_info has several subtypes. Judging by the sizes and contents, __class_type_info is (in all likelihood) used for Base, and __si_class_type_info, which adds a pointer to the single base class, is used for Derived.
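
As a sketch, the 24 bytes of the typeinfo for Derived can be decoded the same way as a vtable. The field names below are my own labels, loosely inspired by the ABI. In particular, reading the first word as the object's own vtable pointer, left at zero in the file until the dynamic loader fills it in, is an assumption on my side and not something the dump proves:

```python
import struct

# "typeinfo for Derived": 24 bytes starting at 0x3db8 (rows 6 and 7 of the hexdump).
TYPEINFO_DERIVED = bytes.fromhex(
    "00000000 00000000 48200000 00000000 a83d0000 00000000"
)


def parse_si_class_type_info(raw: bytes) -> dict:
    """Decode three little-endian 64-bit words; the field names are hypothetical labels."""
    vptr, name, base_type = struct.unpack("<3Q", raw)
    return {"vptr": vptr, "name": name, "base_type": base_type}


info = parse_si_class_type_info(TYPEINFO_DERIVED)
print({field: hex(value) for field, value in info.items()})
# {'vptr': '0x0', 'name': '0x2048', 'base_type': '0x3da8'}
```

The last word, 0x3da8, is the address of the typeinfo for Base: this is how a Derived type description is linked to its base class.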

Typeinfo Names

The remaining data we haven't explored yet are the names residing in the .rodata section. This section stores read-only data, such as the string literals from our code.

$ readelf --hex-dump .rodata a.out

Hex dump of section '.rodata':
  0x00002000 01000200 42617365 203d3e20 666f6f28 ....Base => foo(
  0x00002010 29004261 7365203d 3e206261 72282900 ).Base => bar().
  0x00002020 44657269 76656420 3d3e2066 6f6f2829 Derived => foo()
  0x00002030 00446572 69766564 203d3e20 62617228 .Derived => bar(
  0x00002040 29003442 61736500 37446572 69766564 ).4Base.7Derived
  0x00002050 00                                  .

On the right side of the dump, we find the ASCII interpretation of the hexadecimal values.

Here, we see the strings printed in each function and the mangled names of the classes. Why do the mangled names contain 4 and 7? This simply corresponds to the length of the subsequent identifiers.
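
To illustrate the length prefix, here is a small sketch that extracts the name stored at 0x2042 from the dump row and splits it. split_mangled_name is a hypothetical helper that only handles simple names of the form <length><identifier>, not the full mangling grammar:

```python
def split_mangled_name(mangled: str) -> tuple:
    """Split a simple <length><identifier> mangled name such as '7Derived'."""
    digits = ""
    for ch in mangled:
        if not ch.isdigit():
            break
        digits += ch
    length = int(digits)  # raises ValueError if there is no length prefix
    identifier = mangled[len(digits):len(digits) + length]
    return length, identifier


# Row 0x2040 of the .rodata dump; "typeinfo name for Base" starts at 0x2042.
ROW_2040 = bytes.fromhex("29003442 61736500 37446572 69766564")
base_name = ROW_2040[0x2042 - 0x2040:].split(b"\0")[0].decode("ascii")

print(base_name)                       # 4Base
print(split_mangled_name(base_name))   # (4, 'Base')
print(split_mangled_name("7Derived"))  # (7, 'Derived')
```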

Disable RTTI

As a bonus, we will disable RTTI and observe the impact on the vtables.

Modify your CMakeLists.txt to add the appropriate option:

target_compile_options(a.out PRIVATE
        -O1
        -fno-rtti
)

Once the project has been recompiled, we can examine the symbol table again:

$ objdump --syms --demangle a.out | sort | egrep "Base|Derived"
000000000000116a g     F .text  0000000000000015              Base::foo() const
0000000000001180 g     F .text  0000000000000015              Base::bar() const
0000000000001196 g     F .text  0000000000000015              Derived::foo() const
00000000000011ac g     F .text  0000000000000015              Derived::bar() const
00000000000011c1 g     F .text  000000000000000e              use(Base const&)
0000000000003da0  w    O .data.rel.ro   0000000000000020              vtable for Base
0000000000003dc0  w    O .data.rel.ro   0000000000000020              vtable for Derived

The size of the vtables remains unchanged, but all symbols related to type information are now absent.

We can also examine the hexdump of the .data.rel.ro section:

$ readelf --hex-dump .data.rel.ro a.out

Hex dump of section '.data.rel.ro':
  0x00003da0 00000000 00000000 00000000 00000000 ................
  0x00003db0 6a110000 00000000 80110000 00000000 j...............
  0x00003dc0 00000000 00000000 00000000 00000000 ................
  0x00003dd0 96110000 00000000 ac110000 00000000 ................

The second 8-byte word in each vtable is now zero, representing a null pointer, since there is no RTTI to point to.
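
This can be checked mechanically. The sketch below parses the rows of the dump above into bytes and reads the typeinfo slot of each vtable. load_section and read_u64 are hypothetical helpers, and the parser assumes full, contiguous 16-byte rows like the ones shown here:

```python
NO_RTTI_DUMP = """\
  0x00003da0 00000000 00000000 00000000 00000000 ................
  0x00003db0 6a110000 00000000 80110000 00000000 j...............
  0x00003dc0 00000000 00000000 00000000 00000000 ................
  0x00003dd0 96110000 00000000 ac110000 00000000 ................
"""


def load_section(dump: str):
    """Return (start address, bytes) from full 16-byte readelf hexdump rows."""
    start = None
    data = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = int(fields[0], 16)
        data += bytes.fromhex("".join(fields[1:5]))  # the four 4-byte groups
    return start, bytes(data)


def read_u64(start: int, data: bytes, address: int) -> int:
    """Read a little-endian 64-bit value at the given address."""
    offset = address - start
    return int.from_bytes(data[offset:offset + 8], "little")


start, data = load_section(NO_RTTI_DUMP)
for name, vtable in (("Base", 0x3da0), ("Derived", 0x3dc0)):
    typeinfo = read_u64(start, data, vtable + 8)         # second entry of the vtable
    first_function = read_u64(start, data, vtable + 16)  # third entry of the vtable
    print(name, hex(typeinfo), hex(first_function))
# Base 0x0 0x116a
# Derived 0x0 0x1196
```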

If you dump the .rodata section, you will see that the type names are absent as well.

Conclusion

In this episode, we delved into the bytes stored in the ELF file for the vtables. Their content is mostly dictated by the Itanium C++ ABI, which GCC follows. Vtables hold a pointer to RTTI (if enabled) along with pointers to all virtual functions. This was a basic case, with a single level of simple inheritance. I may cover more advanced cases in a future episode.

In the next episode, we will explore how vtables are used at assembly level.

I will publish the episodes as I write them. Subscribe to be notified! 🥳
