CALL me, maybe

When I was studying computer engineering at university I was a huge assembly nerd, and that fueled my love for low-level stuff. In the years since, I haven't really used assembly and I forgot almost all of it, so I decided to give it a go and refresh my memory by redoing (from memory and with the help of the Internet) one of the activities I did back then: benchmarking assembly against another language doing the very same thing. Back then it was a Hello World, but since I know better now, I decided to have a bit more fun and wrote a small comparison script instead. The script is simple: it asks you to type something, and if that something is "hola" ("hello" in Spanish), it greets you back; otherwise it prints an error.

Let's take a look at such a script in bash, my fav scripting language:

#!/bin/bash
read -r -p "Enter command: " ANS
if [ "$ANS" == "hola" ]; then
    echo "hola, que tal"
else
    echo "ERROR"
fi

Easy peasy. Now let's do the same thing in NASM assembly:

section .data
   msg1:      db   'Hola, que tal',10 ; msg1
   lenmsg1:   equ   $-msg1 ; length msg1
   msg2:      db   'ERROR',10 ; msg2
   lenmsg2:   equ   $-msg2 ; length msg2
   str2:      db   'hola' ; str2
   lenstr2:   equ   $-str2 ; length str2
   userMsg db 'Please enter order: ' ;Prompt shown to the user
   lenUserMsg equ $-userMsg             ;The length of the message
   dispMsg db 'The computer says: '
   lenDispMsg equ $-dispMsg  
section .bss
   reply resb 5
section .text
   global _start
_start:
   mov eax, 4
   mov ebx, 1
   mov ecx, userMsg
   mov edx, lenUserMsg
   int 80h
   ;Read and store the user input
   mov eax, 3
   mov ebx, 0          ;fd 0 = stdin
   mov ecx, reply
   mov edx, 5          ;read at most 5 bytes: 'hola' plus the newline
   int 80h
   ;Output the message 'The computer says: '
   mov eax, 4
   mov ebx, 1
   mov ecx, dispMsg
   mov edx, lenDispMsg
   int 80h 
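   ;Compare the first 4 bytes of the reply with 'hola'
   mov esi,reply
   mov edi,str2
   mov ecx,lenstr2
   cld
   repe cmpsb
   jne bad                       ;a byte differed
   cmp byte [reply+lenstr2],10   ;'hola' must be followed by a newline
   je good
bad: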
   mov eax,4
   mov ebx,1
   mov ecx,msg2
   mov edx,lenmsg2
   int 80h
   jmp exit
good:
   mov eax,4
   mov ebx,1
   mov ecx,msg1
   mov edx,lenmsg1
   int 80h
exit:
   mov eax,1
   mov ebx,0
   int 80h
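If NASM isn't your thing, here is a rough Python sketch of the same syscall sequence, assuming a Linux/POSIX system. `os.write` and `os.read` are thin wrappers around the write(2) and read(2) calls that the assembly makes through `int 80h` (eax=4 and eax=3). The function `greet` and its `fd_in`/`fd_out` parameters are hypothetical, chosen so the sketch can be fed from a pipe instead of a terminal.

```python
import os

# Same strings as the .data section of the NASM program.
USER_MSG = b"Please enter order: "
DISP_MSG = b"The computer says: "
EXPECTED = b"hola"
GOOD_MSG = b"Hola, que tal\n"
BAD_MSG = b"ERROR\n"


def greet(fd_in: int = 0, fd_out: int = 1) -> None:
    """Hypothetical mirror of the assembly: three write() calls and one read()."""
    os.write(fd_out, USER_MSG)   # like eax=4 (write), ebx=1, ecx=userMsg
    reply = os.read(fd_in, 5)    # like eax=3 (read) into the 5-byte buffer
    os.write(fd_out, DISP_MSG)   # like eax=4 (write) with dispMsg
    # Accept exactly "hola" followed by a newline, nothing else.
    if reply == EXPECTED + b"\n":
        os.write(fd_out, GOOD_MSG)
    else:
        os.write(fd_out, BAD_MSG)
```

Calling `greet()` with the defaults reads from the terminal like the assembly version. Keep in mind that running it under strace would still show the whole interpreter starting up around those four calls.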

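Now, let's assemble it with NASM and link it with ld as a 32-bit ELF binary (int 80h is the 32-bit Linux system call interface):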

$ nasm -f elf hellothere.asm
$ ld -m elf_i386 -s -o hellothereas hellothere.o

Nice! Let's execute both and see how it works:

BASH:

./hellothere.sh
Enter command: hola
hola, que tal

Now, assembly:

./hellothereas
Please enter order: hola
The computer says: Hola, que tal

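Looks good to me. Now the fun part! We'll use strace, which records the system calls a program makes; with -c it prints a summary table (calls, errors and time per syscall) instead of the full trace. Wanna learn more? This Stack Overflow thread is pretty nice. Now, let's get to it: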

strace -o trace_bash -c -Ttt ./hellothere.sh 
strace: -t/--absolute-timestamps has no effect with -c/--summary-only
strace: -T/--syscall-times has no effect with -c/--summary-only
Enter command: hola
hola, que tal

strace -o trace_assembly -c -Ttt ./hellothereas
strace: -t/--absolute-timestamps has no effect with -c/--summary-only
strace: -T/--syscall-times has no effect with -c/--summary-only
Please enter order: hola
The computer says: Hola, que tal

strace writes each summary to the file given with -o (trace_assembly and trace_bash). As the warnings say, -t and -T are ignored in summary mode. Let's take a look:

ASSEMBLY:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000476         476         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.000476         476         1           total
System call usage summary for 32 bit mode:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 51.92    0.000027          27         1           read
 48.08    0.000025           8         3           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.000052          13         4           total

BASH:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         6           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         7           close
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0        14           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0        14           rt_sigaction
  0.00    0.000000           0         5           rt_sigprocmask
  0.00    0.000000           0         4         2 ioctl
  0.00    0.000000           0         4           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         3           getpid
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         3         1 fcntl
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         1           getegid
  0.00    0.000000           0         3           getppid
  0.00    0.000000           0         1           getpgrp
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         7           openat
  0.00    0.000000           0        18           newfstatat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         3           prlimit64
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0       121         5 total

I know what you are thinking: "You didn't compile the bash!"

Okay then, let's try that with shc, shall we?

shc -f hellothere.sh
mv hellothere.sh.x hellothere_bash

And benchmarking!

strace -o trace_bash_ex -c -Ttt ./hellothere_bash

Let's take a look!

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 37.69    0.000652         163         4           execve
 15.84    0.000274           6        44           mmap
  7.86    0.000136           3        35           newfstatat
  5.49    0.000095           5        16           openat
  4.34    0.000075           2        31           rt_sigaction
  3.87    0.000067           4        14           mprotect
  3.12    0.000054           7         7           read
  2.83    0.000049           3        16           close
  2.49    0.000043          10         4           munmap
  1.85    0.000032           2        16           pread64
  1.79    0.000031           3        10           rt_sigprocmask
  1.73    0.000030          15         2           write
  1.62    0.000028           2        12           brk
  0.98    0.000017           2         8           getpid
  0.92    0.000016           2         6           getppid
  0.81    0.000014           2         5         2 ioctl
  0.81    0.000014           3         4         4 access
  0.75    0.000013           1         8         4 arch_prctl
  0.58    0.000010           5         2           sysinfo
  0.58    0.000010           1         6           prlimit64
  0.52    0.000009           2         4           getrandom
  0.46    0.000008           2         4           rseq
  0.40    0.000007           3         2           futex
  0.40    0.000007           1         4           set_tid_address
  0.40    0.000007           1         4           set_robust_list
  0.29    0.000005           2         2         2 getpeername
  0.29    0.000005           2         2           uname
  0.29    0.000005           2         2           getuid
  0.29    0.000005           2         2           getpgrp
  0.23    0.000004           2         2           getgid
  0.23    0.000004           2         2           geteuid
  0.23    0.000004           2         2           getegid
------ ----------- ----------- --------- --------- ----------------
100.00    0.001730           6       282        12 total

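Wow! 282 calls, more than twice the 121 of the plain script. That's because shc doesn't really translate the script into machine code: it encrypts the script inside a C wrapper that, at runtime, decrypts it and hands it over to the shell (hence the extra execve calls). Real compilers, on the other hand, let you choose an optimization level: the compiler spends a bit more time analyzing your code so that the resulting machine code is faster or smaller. Pretty much assembly magic, or the non-lazy way. Did you know?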

Let me show you using a C version of this program:

#include<stdio.h>
#include<string.h>

char *mygets(char *buf, size_t size) {
    if (buf != NULL && size > 0) {
        if (fgets(buf, size, stdin)) {
            buf[strcspn(buf, "\n")] = '\0';
            return buf;
        }
        *buf = '\0';  /* clear buffer at end of file */
    }
    return NULL;
}

int string_compare(char str1[], char str2[])
{
    int ctr=0;

    while(str1[ctr]==str2[ctr])
    {
        if(str1[ctr]=='\0'||str2[ctr]=='\0')
            break;
        ctr++;
    }
    if(str1[ctr]=='\0' && str2[ctr]=='\0')
        return 0;
    else
        return -1;
}


int main(void)
{
    char a[100];
    char b[] = "hola";
    printf("Enter command\n");
    mygets(a, sizeof a);

    if (string_compare(a, b) == 0)
        printf("hola que tal\n");
    else
        printf("ERROR.\n");

    return 0;  /* runs on both paths, not only after the else */
}

Before going forward, some clarifications:

I used a "mygets" function to replace "gets", which cannot limit how many characters it writes into the buffer and was removed from the language in C11; the reason is over here.
I used my own comparison function instead of strcmp (inspired by this one) to stay in control and keep it as low-level as possible.

Let's compile it!

gcc -o hello_in_c hello.c

Now benchmarking:

strace -o trace_forc -c -Ttt ./hello_in_c

And this is the result with gcc's default optimization level (-O0):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         2           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         2           close
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         4           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         2           openat
  0.00    0.000000           0         4           newfstatat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0        40         2 total

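Optimizations -O2, -O3 and -Ofast give me the same strace results. That makes sense: optimization changes the instructions the CPU runs inside the program, while the syscalls come from the C library and the dynamic loader (most of those mmap, openat and mprotect calls are the loader bringing in libc), so there's nothing there for the optimizer to remove.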
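One way to check a claim like that without eyeballing tables is to parse both summaries and compare the call counts per syscall (timings always vary between runs, so only counts are compared). Below is a minimal sketch, assuming the plain-text layout `strace -c` produced in this post, with an optional "errors" column before the syscall name. The helpers `parse_strace_summary` and `diff_summaries` are hypothetical, not part of strace.

```python
from collections import Counter


def parse_strace_summary(text: str) -> Counter:
    """Return {syscall: calls} from `strace -c` output, summing every section."""
    calls = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows: % time, seconds, usecs/call, calls, [errors], syscall
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            float(fields[0])
            count = int(fields[3])
        except ValueError:
            continue  # header, "------" separator or section title
        calls[fields[-1]] += count
    return calls


def diff_summaries(a: str, b: str) -> dict:
    """Syscalls whose call counts differ between two summaries: {name: (a, b)}."""
    ca, cb = parse_strace_summary(a), parse_strace_summary(b)
    return {name: (ca[name], cb[name])
            for name in sorted(set(ca) | set(cb))
            if ca[name] != cb[name]}
```

For example, tracing an -O2 build into a hypothetical `trace_forc_O2` file and getting an empty dict back from `diff_summaries(open("trace_forc").read(), open("trace_forc_O2").read())` would confirm that both builds make exactly the same syscalls.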

Let's also check it in Python? This is the last language, I promise!

import re

def compare_strings(expected, user_input):
    # Escape the expected word and require a full match,
    # so inputs like "ho", "." or "(" no longer count as (or crash on) "hola"
    if re.fullmatch(re.escape(expected), user_input):
        print("hola, que tal")
    else:
        print("ERROR")

string1 = "hola"
string2 = input("Enter command: ")

compare_strings(string1, string2)

First let's strace the script itself:

$ strace -o trace_python1 -c -Ttt python3 hello.py
$ cat trace_python1

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.95    0.001331           4       328        58 newfstatat
 22.28    0.000928          42        22           getdents64
 10.95    0.000456           4        97           read
 10.35    0.000431           6        66         3 openat
  5.74    0.000239           3        66           close
  5.52    0.000230           2        91         3 lseek
  4.30    0.000179           2        63        52 ioctl
  3.84    0.000160           2        66           rt_sigaction
  3.02    0.000126          10        12           brk
  0.96    0.000040           1        28           mmap
  0.41    0.000017           4         4         3 readlink
  0.17    0.000007           2         3           dup
  0.17    0.000007           7         1           sysinfo
  0.10    0.000004           2         2           getcwd
  0.07    0.000003           3         1           getuid
  0.05    0.000002           2         1           fcntl
  0.05    0.000002           2         1           getgid
  0.05    0.000002           2         1           geteuid
  0.05    0.000002           2         1           getegid
  0.00    0.000000           0         2           write
  0.00    0.000000           0         8           mprotect
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         4           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         2           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.004166           4       881       121 total

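Now let's "compile" it! For Python, a popular tool for this is PyInstaller, which doesn't translate the code into machine code either: it bundles the script together with a Python interpreter and its dependencies into a folder (dist/hello by default) with an executable inside.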

$ pyinstaller hello.py
$ cd dist/hello
$ strace -o trace_python2 -c -Ttt ./hello
$ cat trace_python2

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 29.59    0.000377           1       209           read
 22.61    0.000288           2       122        15 openat
 12.24    0.000156           0       176         3 lseek
 10.83    0.000138           0       162        15 newfstatat
  8.79    0.000112           1       110           close
  8.16    0.000104           4        21           brk
  6.59    0.000084           0        86        76 ioctl
  0.71    0.000009           0        69           mmap
  0.47    0.000006           1         4           getcwd
  0.00    0.000000           0         2           write
  0.00    0.000000           0         2           lstat
  0.00    0.000000           0        20           mprotect
  0.00    0.000000           0         6           munmap
  0.00    0.000000           0        66           rt_sigaction
  0.00    0.000000           0         8           pread64
  0.00    0.000000           0         2         2 access
  0.00    0.000000           0         3           dup
  0.00    0.000000           0         2           execve
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         3         1 readlink
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         4         2 arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         4           getdents64
  0.00    0.000000           0         2           set_tid_address
  0.00    0.000000           0         2           set_robust_list
  0.00    0.000000           0         2           prlimit64
  0.00    0.000000           0         3           getrandom
  0.00    0.000000           0         2           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.001274           1      1095       114 total
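The totals are scattered across six tables, so here is a small sketch that pulls the "total" row(s) out of each summary file and sorts the runs by number of syscalls. It assumes the `strace -c` layout shown in this post; `total_calls` and `compare_runs` are hypothetical helpers, not part of strace. The assembly trace has two tables (64-bit execve plus the 32-bit calls), which is why every "total" row is summed.

```python
import re
from pathlib import Path

# "total" row: % time, seconds, usecs/call, calls, [errors], "total"
TOTAL_ROW = re.compile(r"^\s*[\d.]+\s+[\d.]+\s+\d+\s+(\d+)\s+(?:(\d+)\s+)?total\s*$")


def total_calls(text: str) -> tuple[int, int]:
    """Sum calls and errors over every 'total' row of a strace -c summary."""
    calls = errors = 0
    for line in text.splitlines():
        m = TOTAL_ROW.match(line)
        if m:
            calls += int(m.group(1))
            errors += int(m.group(2) or 0)
    return calls, errors


def compare_runs(paths: list[str]) -> list[tuple[str, int, int]]:
    """Return (file, calls, errors) for each trace file, fewest calls first."""
    rows = [(p, *total_calls(Path(p).read_text())) for p in paths]
    return sorted(rows, key=lambda row: row[1])
```

Passing it the six files from this post (the PyInstaller trace lives inside the dist/hello folder, so adjust that path) should, going by the tables above, order them as trace_assembly (5 calls), trace_forc (40), trace_bash (121), trace_bash_ex (282), trace_python1 (881) and trace_python2 (1095).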

Now, what conclusions can we briefly draw from these tests?

Well, if we are using a high-level language, we have to trust the compiler (or interpreter). If we are using something low-level like assembly, we are in control but, as humans, we might not be able to optimize as well as a good compiler would, or fully understand how the low-level details behind our compiling choices work.

You might be asking yourself: does this really matter, in a world where even the smallest processing units can run heavy stuff? At a time when we are at the peak of Moore's law?

There are several answers to this. As it's amazingly put in this article, there's an inherent responsibility, as technologists, in caring about even the smallest things (I like the word "technologist" waaay more than "engineer", since tech and low-tech are made by all sorts of people). The hurry-up ways we've been procuring things as a society for years are making the climate crisis worse, so it does make sense to think about this. Don't get me wrong, I'm not telling you "stop using random compilers and languages!" (I use Python a lot, as well as other languages). What I'm saying is that it's worth considering what we are doing and what we are using to make our tech possible. Maybe explore different options and take new ideas in different directions.

There's something about exploring and trying new tech things that excites people, and some tech developers take advantage of this wonderful feeling to create more harmful tech (for people and for the environment). What I'm saying is: take that excitement and use it to explore new ideas, or old ideas from new perspectives, to make everything better! So many communities are in need of this sort of exploration.

So, the easy answer is no, it doesn't really matter for a single binary (or a bunch of them) making calls on a perfectly functional computer. All of the binaries and scripts above worked perfectly and, as a human, I didn't notice any difference in performance among them. There were some little differences under the hood, though (from 5 syscalls for the assembly binary to 1095 for the PyInstaller bundle), which makes me think: "how many other things are happening out there in my daily programs that I'm not noticing?" It's not really about efficiency here. It's about understanding a little bit more about what we are doing and how it works.

Paula
