Paula
Posted on October 30, 2023
When I was studying computer engineering at university I was a huge assembly nerd, and it fueled my love for low-level stuff. Years later I haven't really used assembly, so I've forgotten almost everything about it. To refresh my memory I decided to redo (from memory and with the help of the Internet) one of the activities I did back then: benchmarking assembly against another language doing the very same thing. Back then I did a Hello World, but since I know better now, I decided to have a bit more fun and wrote a comparison script. The script is simple: it asks you to write something, and if that something is "hola" it greets you back.
Let's take a look at such a script in bash, my fav scripting language:
#!/bin/bash
read -p "Enter command: " ANS
if [ "$ANS" == "hola" ]; then
    echo "hola, que tal"
else
    echo "ERROR"
fi
Easy peasy. Now let's do the same thing in NASM assembly:
section .data
    msg1:       db 'Hola, que tal',10       ; reply printed when the input matches
    lenmsg1:    equ $-msg1                  ; length of msg1
    msg2:       db 'ERROR',10               ; reply printed otherwise
    lenmsg2:    equ $-msg2                  ; length of msg2
    str2:       db 'hola'                   ; the expected input
    lenstr2:    equ $-str2                  ; length of str2
    userMsg     db 'Please enter order: '   ; prompt shown to the user
    lenUserMsg  equ $-userMsg               ; length of the prompt
    dispMsg     db 'The computer says: '
    lenDispMsg  equ $-dispMsg

section .bss
    reply resb 5                            ; input buffer: 4 characters plus the newline

section .text
    global _start

_start:
    ; Print the prompt: write(1, userMsg, lenUserMsg)
    mov eax, 4
    mov ebx, 1
    mov ecx, userMsg
    mov edx, lenUserMsg
    int 80h

    ; Read and store the user input: read(0, reply, 5)
    mov eax, 3
    mov ebx, 0                              ; fd 0 = stdin
    mov ecx, reply
    mov edx, 5                              ; at most 5 bytes ('hola' plus the newline)
    int 80h

    ; Output the message 'The computer says: '
    mov eax, 4
    mov ebx, 1
    mov ecx, dispMsg
    mov edx, lenDispMsg
    int 80h

    ; Compare the first lenstr2 bytes of the input with 'hola'
    mov esi, reply
    mov edi, str2
    mov ecx, lenstr2
    cld
    repe cmpsb
    je good                                 ; ZF is still set only if every byte matched

    ; If bad
    mov eax, 4
    mov ebx, 1
    mov ecx, msg2
    mov edx, lenmsg2
    int 80h
    jmp exit

good:
    mov eax, 4
    mov ebx, 1
    mov ecx, msg1
    mov edx, lenmsg1
    int 80h

exit:
    ; exit(0)
    mov eax, 1
    mov ebx, 0
    int 80h
Now, let's assemble and link the NASM source:
$ nasm -f elf hellothere.asm
$ ld -m elf_i386 -s -o hellothereas hellothere.o
Nice! Let's execute both and see how it works:
BASH:
./hellothere.sh
Enter command: hola
hola, que tal
Now, assembly:
./hellothereas
Please enter order: hola
The computer says: Hola, que tal
Looks good to me. Now the fun part! We are using strace -c to count the system calls each program makes and the time spent in them. Want to learn more? This Stack Overflow thread is pretty nice. Now, let's get to it:
strace -o trace_bash -c -Ttt ./hellothere.sh
strace: -t/--absolute-timestamps has no effect with -c/--summary-only
strace: -T/--syscall-times has no effect with -c/--summary-only
Enter command: hola
hola, que tal
strace -o trace_assembly -c -Ttt ./hellothereas
strace: -t/--absolute-timestamps has no effect with -c/--summary-only
strace: -T/--syscall-times has no effect with -c/--summary-only
Please enter order: hola
The computer says: Hola, que tal
The benchmarking results are in the trace_assembly and trace_bash files. Let's take a look:
ASSEMBLY:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000476 476 1 execve
------ ----------- ----------- --------- --------- ----------------
100.00 0.000476 476 1 total
System call usage summary for 32 bit mode:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
51.92 0.000027 27 1 read
48.08 0.000025 8 3 write
------ ----------- ----------- --------- --------- ----------------
100.00 0.000052 13 4 total
BASH:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 6 read
0.00 0.000000 0 2 write
0.00 0.000000 0 7 close
0.00 0.000000 0 3 lseek
0.00 0.000000 0 14 mmap
0.00 0.000000 0 4 mprotect
0.00 0.000000 0 1 munmap
0.00 0.000000 0 3 brk
0.00 0.000000 0 14 rt_sigaction
0.00 0.000000 0 5 rt_sigprocmask
0.00 0.000000 0 4 2 ioctl
0.00 0.000000 0 4 pread64
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 dup2
0.00 0.000000 0 3 getpid
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 uname
0.00 0.000000 0 3 1 fcntl
0.00 0.000000 0 1 sysinfo
0.00 0.000000 0 1 getuid
0.00 0.000000 0 1 getgid
0.00 0.000000 0 1 geteuid
0.00 0.000000 0 1 getegid
0.00 0.000000 0 3 getppid
0.00 0.000000 0 1 getpgrp
0.00 0.000000 0 2 1 arch_prctl
0.00 0.000000 0 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 7 openat
0.00 0.000000 0 18 newfstatat
0.00 0.000000 0 1 set_robust_list
0.00 0.000000 0 3 prlimit64
0.00 0.000000 0 1 getrandom
0.00 0.000000 0 1 rseq
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 0 121 5 total
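As a side note, these summaries are regular enough to be read by a script. Here is a minimal sketch in Python, assuming the strace -c layout shown above (% time, seconds, usecs/call, calls, an optional errors column, syscall). The names parse_strace_summary and SAMPLE are made up for illustration, they are not part of strace:

```python
import re

# One data row of a `strace -c` summary:
# % time, seconds, usecs/call, calls, [errors], syscall
ROW = re.compile(
    r"^\s*(?P<pct>\d+\.\d+)\s+(?P<seconds>\d+\.\d+)\s+(?P<usecs>\d+)\s+"
    r"(?P<calls>\d+)(?:\s+(?P<errors>\d+))?\s+(?P<syscall>[A-Za-z_]\w*)\s*$"
)


def parse_strace_summary(text):
    """Return {syscall: (calls, errors)}, skipping headers, rulers and 'total' rows."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None or m.group("syscall") == "total":
            continue
        name = m.group("syscall")
        calls, errors = counts.get(name, (0, 0))
        counts[name] = (
            calls + int(m.group("calls")),
            errors + int(m.group("errors") or 0),
        )
    return counts


# The assembly trace from above, pasted in as sample input.
SAMPLE = """\
% time seconds usecs/call calls errors syscall
100.00 0.000476 476 1 execve
100.00 0.000476 476 1 total
System call usage summary for 32 bit mode:
% time seconds usecs/call calls errors syscall
51.92 0.000027 27 1 read
48.08 0.000025 8 3 write
100.00 0.000052 13 4 total
"""

if __name__ == "__main__":
    parsed = parse_strace_summary(SAMPLE)
    print(sum(calls for calls, _ in parsed.values()))  # 1 execve + 1 read + 3 write = 5
```

Pointing the same function at the contents of trace_bash instead of SAMPLE should give the per-syscall counts for the bash run.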
I know what you are thinking: "You didn't compile the bash!"
Okay then, let's try that, shall we?
shc -f hellothere.sh
mv hellothere.sh.x hellothere_bash
And benchmarking!
strace -o trace_bash_ex -c -Ttt ./hellothere_bash
Let's take a look!
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
37.69 0.000652 163 4 execve
15.84 0.000274 6 44 mmap
7.86 0.000136 3 35 newfstatat
5.49 0.000095 5 16 openat
4.34 0.000075 2 31 rt_sigaction
3.87 0.000067 4 14 mprotect
3.12 0.000054 7 7 read
2.83 0.000049 3 16 close
2.49 0.000043 10 4 munmap
1.85 0.000032 2 16 pread64
1.79 0.000031 3 10 rt_sigprocmask
1.73 0.000030 15 2 write
1.62 0.000028 2 12 brk
0.98 0.000017 2 8 getpid
0.92 0.000016 2 6 getppid
0.81 0.000014 2 5 2 ioctl
0.81 0.000014 3 4 4 access
0.75 0.000013 1 8 4 arch_prctl
0.58 0.000010 5 2 sysinfo
0.58 0.000010 1 6 prlimit64
0.52 0.000009 2 4 getrandom
0.46 0.000008 2 4 rseq
0.40 0.000007 3 2 futex
0.40 0.000007 1 4 set_tid_address
0.40 0.000007 1 4 set_robust_list
0.29 0.000005 2 2 2 getpeername
0.29 0.000005 2 2 uname
0.29 0.000005 2 2 getuid
0.29 0.000005 2 2 getpgrp
0.23 0.000004 2 2 getgid
0.23 0.000004 2 2 geteuid
0.23 0.000004 2 2 getegid
------ ----------- ----------- --------- --------- ----------------
100.00 0.001730 6 282 12 total
Wow! You know what? Compilers can also apply something called "optimization", which is pretty much assembly magic: the non-lazy way, which takes a little more processing at compile time but produces optimized output. Did you know?
Let me show you using a C version of this command:
#include <stdio.h>
#include <string.h>

/* Replacement for gets(): reads one line and strips the trailing newline */
char *mygets(char *buf, size_t size) {
    if (buf != NULL && size > 0) {
        if (fgets(buf, size, stdin)) {
            buf[strcspn(buf, "\n")] = '\0';
            return buf;
        }
        *buf = '\0'; /* clear buffer at end of file */
    }
    return NULL;
}

/* Returns 0 when both strings are identical, -1 otherwise */
int string_compare(char str1[], char str2[])
{
    int ctr = 0;
    while (str1[ctr] == str2[ctr])
    {
        if (str1[ctr] == '\0' || str2[ctr] == '\0')
            break;
        ctr++;
    }
    if (str1[ctr] == '\0' && str2[ctr] == '\0')
        return 0;
    else
        return -1;
}

int main()
{
    char a[100];
    char b[] = "hola";
    printf("Enter command\n");
    mygets(a, sizeof a);
    if (string_compare(a, b) == 0)
        printf("hola que tal\n");
    else
        printf("ERROR.\n");
    return 0;
}
Before going forward, some clarifications:
- I used a "mygets" function as a substitute for the "gets" function; the reason is over here.
- I used my own function for comparing instead of strcmp (inspired by this one) to stay in control and keep it as low level as possible.
Let's compile it!
gcc -o hello_in_c hello.c
Now benchmarking:
strace -o trace_forc -c -Ttt ./hello_in_c
And this is the result with default optimization:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 2 read
0.00 0.000000 0 2 write
0.00 0.000000 0 2 close
0.00 0.000000 0 8 mmap
0.00 0.000000 0 3 mprotect
0.00 0.000000 0 1 munmap
0.00 0.000000 0 3 brk
0.00 0.000000 0 4 pread64
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 2 1 arch_prctl
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 2 openat
0.00 0.000000 0 4 newfstatat
0.00 0.000000 0 1 set_robust_list
0.00 0.000000 0 1 prlimit64
0.00 0.000000 0 1 getrandom
0.00 0.000000 0 1 rseq
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 0 40 2 total
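Optimizations -O2, -O3 and -Ofast give me the same strace results, which is not that surprising: strace only sees system calls, and the optimization level changes the machine code generated for the program, not the system calls that the dynamic loader and libc make on its behalf.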
How about checking it in Python too? This is the last language, I promise!
import re

def compare_strings(string1, string2):
    # Match the whole input against the expected word
    pattern = re.compile(string2)
    match = re.fullmatch(pattern, string1)
    if match:
        print("hola, que tal")
    else:
        print("ERROR")

string1 = input("Enter command:")  # what the user typed
string2 = "hola"                   # the expected command
compare_strings(string1, string2)
First let's strace the script itself:
$ strace -o trace_python1 -c -Ttt python3 hello.py
$ strings trace_python1
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
31.95 0.001331 4 328 58 newfstatat
22.28 0.000928 42 22 getdents64
10.95 0.000456 4 97 read
10.35 0.000431 6 66 3 openat
5.74 0.000239 3 66 close
5.52 0.000230 2 91 3 lseek
4.30 0.000179 2 63 52 ioctl
3.84 0.000160 2 66 rt_sigaction
3.02 0.000126 10 12 brk
0.96 0.000040 1 28 mmap
0.41 0.000017 4 4 3 readlink
0.17 0.000007 2 3 dup
0.17 0.000007 7 1 sysinfo
0.10 0.000004 2 2 getcwd
0.07 0.000003 3 1 getuid
0.05 0.000002 2 1 fcntl
0.05 0.000002 2 1 getgid
0.05 0.000002 2 1 geteuid
0.05 0.000002 2 1 getegid
0.00 0.000000 0 2 write
0.00 0.000000 0 8 mprotect
0.00 0.000000 0 2 munmap
0.00 0.000000 0 4 pread64
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 2 1 arch_prctl
0.00 0.000000 0 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
0.00 0.000000 0 1 prlimit64
0.00 0.000000 0 2 getrandom
0.00 0.000000 0 1 rseq
------ ----------- ----------- --------- --------- ----------------
100.00 0.004166 4 881 121 total
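Now let's compile it! For Python, the most popular tool for this is PyInstaller, which bundles the script together with the interpreter into an executable rather than compiling it to machine code:
$ pyinstaller hello.py
$ cd dist/hello
$ strace -o trace_python2 -c -Ttt ./hello
$ cat trace_python2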
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
29.59 0.000377 1 209 read
22.61 0.000288 2 122 15 openat
12.24 0.000156 0 176 3 lseek
10.83 0.000138 0 162 15 newfstatat
8.79 0.000112 1 110 close
8.16 0.000104 4 21 brk
6.59 0.000084 0 86 76 ioctl
0.71 0.000009 0 69 mmap
0.47 0.000006 1 4 getcwd
0.00 0.000000 0 2 write
0.00 0.000000 0 2 lstat
0.00 0.000000 0 20 mprotect
0.00 0.000000 0 6 munmap
0.00 0.000000 0 66 rt_sigaction
0.00 0.000000 0 8 pread64
0.00 0.000000 0 2 2 access
0.00 0.000000 0 3 dup
0.00 0.000000 0 2 execve
0.00 0.000000 0 1 fcntl
0.00 0.000000 0 3 1 readlink
0.00 0.000000 0 1 sysinfo
0.00 0.000000 0 4 2 arch_prctl
0.00 0.000000 0 1 futex
0.00 0.000000 0 4 getdents64
0.00 0.000000 0 2 set_tid_address
0.00 0.000000 0 2 set_robust_list
0.00 0.000000 0 2 prlimit64
0.00 0.000000 0 3 getrandom
0.00 0.000000 0 2 rseq
------ ----------- ----------- --------- --------- ----------------
100.00 0.001274 1 1095 114 total
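To put all the runs side by side, here is a small Python sketch that compares the total number of system calls. The numbers are copied from the "total" rows above and come from a single run on my machine, so treat them as a rough picture, not a measurement. TOTAL_CALLS and relative_to are names I made up for this example, and for assembly I am assuming it is fair to add the execve from the first summary to the 4 calls in the 32-bit one, since the other totals include their execve too:

```python
# Total syscall counts, copied from the "total" rows of the strace summaries.
TOTAL_CALLS = {
    "assembly": 1 + 4,  # execve (first summary) + read/write (32-bit summary)
    "bash script": 121,
    "bash via shc": 282,
    "C (gcc)": 40,
    "python3 hello.py": 881,
    "PyInstaller bundle": 1095,
}


def relative_to(baseline, totals):
    """Return each total divided by the total of the baseline entry."""
    base = totals[baseline]
    return {name: calls / base for name, calls in totals.items()}


if __name__ == "__main__":
    ratios = relative_to("assembly", TOTAL_CALLS)
    # Print from the fewest system calls to the most.
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{name:<20} {TOTAL_CALLS[name]:>5} calls  {ratio:6.1f}x")
```

With these numbers the C binary makes 8 times as many system calls as the assembly one, and the PyInstaller bundle 219 times as many.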
Now, what conclusions can we briefly extract from these tests?
Well, if we are using a high-level language we have to trust the compiler or interpreter. If we are using a low-level one (like assembly) we are in control, but, as humans, we might not be able to optimize as well as a good compiler does, or fully understand how the low-level details of our compilation choices work.
You might be asking yourself: does this really matter? In a world where even the smallest processing units are capable of running heavy stuff? At a time when we are at the peak of Moore's law?
There are several answers to this. As it's amazingly put in this article, there's an inherent responsibility in caring about even the smallest things as technologists (a word that I like waaay more than engineer, since tech and low-tech are made by all sorts of people). The hurried ways of working we've adopted over the years as a society are aggravating the climate issues, so it does make sense to think about this. Don't get me wrong here, I'm not telling you "stop using random compilers and languages!" (I use Python a lot too, as well as other languages). What I'm saying is that it might be interesting to take into consideration what we are doing and what we are using to make our tech possible. Maybe explore different options and step into new ideas in different directions.
There's something about exploring and trying new tech things that makes people excited, and there are certain tech developers who take advantage of this wonderful feeling to create more hurtful tech (for people and for the environment). What I'm saying is: take that excitement and use it to explore new ideas, or old ideas from new perspectives, to make everything better! There are so many communities that are in need of this sort of exploration.
So, the easy answer is no, it doesn't really matter in terms of a single binary, or a bunch of them, calling stuff on a perfectly functional computer. All of the binaries and scripts above worked perfectly and, as a human, I didn't notice any difference in performance among them. There were some tiny little differences though, which makes me think: "how many other things are happening out there with my daily programs that I'm not noticing?" It's not really about efficiency here. It's about understanding a little bit more about what we are doing and how it works.
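If you want to put a number on that "I didn't notice anything" feeling, strace -c is not quite the right tool, since it only reports system calls. A rough alternative is to time the whole run from the outside. This is a minimal sketch, not a proper benchmark: time_command is a hypothetical helper, the paths in the comment assume the files built earlier in this post, and the result includes process start-up time:

```python
import subprocess
import time


def time_command(argv, stdin_text="hola\n", repeats=5):
    """Run argv `repeats` times, feeding stdin_text, and return the fastest
    wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            argv,
            input=stdin_text,
            text=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        best = min(best, time.perf_counter() - start)
    return best


# Example usage (assumes the programs from this post are in the current directory):
# for cmd in (["./hellothereas"], ["./hellothere.sh"], ["./hello_in_c"], ["python3", "hello.py"]):
#     print(cmd, f"{time_command(cmd) * 1000:.2f} ms")
```

Taking the fastest of a few runs is a deliberate choice here: it filters out some of the noise from whatever else the machine is doing at that moment.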