How Linux Works: Chapter1 Linux Overview (Part2)
Satoru Takeuchi
Posted on March 26, 2023
Libraries
In this section, we will discuss libraries provided by the operating system. Many programming languages offer the ability to bundle commonly used functions across multiple programs into libraries. This allows programmers to efficiently develop programs by choosing from a vast array of libraries created by their predecessors. Some libraries, which are expected to be used by a large number of programs, may be provided by the operating system.
The following figure shows the software hierarchy when a process is using a library.
The C language has a standard library defined by the International Organization for Standardization (ISO), and Linux provides this standard C library as well. Typically, glibc, provided by the GNU Project, is used as the standard C library. In this book, we will refer to glibc as libc.
Almost all programs written in C are linked with libc.
You can use the ldd command to check which libraries a program is linked with. Let's take a look at the ldd output for the echo command.
$ ldd /bin/echo
linux-vdso.so.1 (0x00007ffef73a9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2925ebd000)
/lib64/ld-linux-x86-64.so.2 (0x00007f29260d1000)
$
In the above example, libc.so.6 refers to the standard C library. Also, ld-linux-x86-64.so.2 is a special library for loading shared libraries, which is also one of the libraries provided by the OS.
Let's also check the cat command.
$ ldd /bin/cat
linux-vdso.so.1 (0x00007ffc3b155000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fabd1194000)
/lib64/ld-linux-x86-64.so.2 (0x00007fabd13a9000)
$
This also links to libc. Let's also look at the python3 command, which is the Python3 interpreter.
$ ldd /usr/bin/python3
linux-vdso.so.1 (0x00007ffc91126000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5fb7206000)
...
/lib64/ld-linux-x86-64.so.2 (0x00007f5fb740f000)
$
Again, libc is linked. In other words, when you run a Python program, the standard C library is used internally. Although few people may write C directly nowadays, C clearly remains important as the language underpinning the OS.
If you run the ldd command for various programs existing in the system, you will see that many of them are linked with libc. Please give it a try.
In addition to libc, Linux provides standard libraries for various other programming languages, such as C++. It also offers libraries that, while not standard, many programmers are likely to use. In Ubuntu, the names of library packages often begin with the string "lib". When I ran dpkg-query -W | grep lib in my environment, over 1000 packages were displayed.
Wrapper Functions for System Calls
libc not only provides the standard C library but also offers something called "wrapper functions" for system calls. System calls cannot be directly called from high-level languages such as C, unlike regular function calls. They must be invoked using architecture-dependent assembly code.
For example, in the x86_64 CPU architecture, the getppid() system call is issued at the assembly code level as follows:
mov $0x6e,%eax
syscall
In the first line, the system call number "0x6e" for getppid() is assigned to the eax register, as required by the Linux system call calling convention. The second line issues the system call via the syscall instruction and transitions to kernel mode. After this, the kernel code that processes getppid() is executed. If you don't usually write assembly language, you don't need to understand this code in detail. Just note how different it looks from the source code you normally see.
In the arm64 architecture, which is mainly used in smartphones and tablets, the getppid() system call is issued at the assembly code level as follows:
mov x8, <system call number>
svc #0
Quite different, isn't it? Without the help of libc, every time you issue a system call, you would have to write architecture-dependent assembly source code and call it from a high-level language.
This would make program creation more time-consuming and not portable to other architectures.
To solve such problems, libc provides a series of functions called "wrapper functions" for system calls, which internally just call the system calls. Wrapper functions exist for each architecture. From user programs written in high-level languages, you only need to call the system call wrapper functions prepared for each language.
Static Libraries and Shared Libraries
Libraries can be classified into two types: static libraries and shared (or dynamic) libraries. Both provide the same functionality, but the way they are incorporated into a program is different.
When creating a program, first, you compile the source code to create a file called an object file. Then, you link the library used by the object file to create the executable file. At link time, static libraries incorporate the functions within the library into the program. In contrast, shared libraries only embed information such as "call this function of this library" in the executable file at link time. Then, at program startup or during execution, the library is loaded into memory, and the program calls the functions within it.
The following figure shows the difference between the two in the case of a pause program that only calls the pause() system call and does nothing else.
And here is the source code of pause.
#include <unistd.h>

int main(void) {
    pause();
    return 0;
}
Let's verify that my explanation is correct from the following perspectives:
- The size of the pause program
- Link status with shared libraries
As an example, let's consider linking the libc library to the program. First, let's check the case of using the static library "libc.a".
$ cc -static -o pause pause.c
$ ls -l pause
-rwxrwxr-x 1 sat sat 871688 Feb 27 10:29 pause ... (1)
$ ldd pause
not a dynamic executable ... (2)
$
The execution results show the following:
- (1) The program size is just under 900KB
- (2) No shared libraries are linked
Since this program already incorporates libc, it will still work if "libc.a" is deleted. However, doing so would be very dangerous because other programs would no longer be able to statically link with libc, so please do not do this.
Next, let's consider the case of using the shared library "libc.so".
$ cc -o pause pause.c
$ ls -l pause
-rwxrwxr-x 1 sat sat 16696 Feb 27 10:43 pause
$ ldd pause
linux-vdso.so.1 (0x00007ffc18a75000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f64ad4e9000)
/lib64/ld-linux-x86-64.so.2 (0x00007f64ad6f7000)
$
From these results, we can see the following:
- The size is about 16KB, which is a fraction of the size when libc is statically linked.
- libc ("/lib/x86_64-linux-gnu/libc.so.6") is dynamically linked.
The pause command with dynamically linked libc will not execute if libc.so is deleted. In fact, doing so is even more dangerous than deleting libc.a, as it would render all programs that link to libc.so inoperable. If this happens, you'll need to use complex methods to recover or reinstall the entire OS. Please do not do this under any circumstances.
The reason for the small size is that libc is not embedded in the program itself but is loaded into memory at runtime. Instead of using separate copies of libc code for each program, all programs using libc share the same instance.
Both static and shared libraries have their pros and cons, so it's hard to say which is better overall. However, shared libraries have been mainly used for the following reasons:
- They keep the overall storage consumption low.
- If there's an issue with the library, replacing it with a fixed version of the shared library resolves the problem for all programs using that library.
It might be interesting to run the ldd command on the executable files of the programs you use to see which shared libraries are linked.
Column: The Revival of Static Linking
In this article, I mentioned that shared libraries have been preferred, but the situation has changed slightly in recent years. For example, the Go language, which has become popular in the past few years, statically links most libraries by default. As a result, most Go programs do not depend on any shared libraries.
Let's run ldd on the hello program, which is written in Go, to verify this.
$ ldd hello
not a dynamic executable
There are various reasons for this, such as:
- The size issue matters relatively less thanks to the large capacity of memory and storage in modern computers.
- If a program can run with just a single executable file, it is easier to handle since you can simply copy the file to run in another environment.
- Startup is faster because there is no need to link shared libraries at runtime.
- Shared libraries have their own problems: different versions of a library that should behave the same can differ subtly, so a library upgrade can stop some programs from working (so-called "DLL Hell").
There are various ways of thinking, and the appropriate method changes over time.
NOTE
This article is based on my book written in Japanese. Please contact me via satoru.takeuchi@gmail.com if you're interested in publishing this book's English version.