AWK an old-school tool today

sergiomarcial

Sergio Marcial

Posted on October 17, 2021


What is AWK?

AWK is a command-line programming language (some might call it a tool) oriented primarily toward text and file processing. A few simple yet elegant lines of AWK can replace many lines of a more heavyweight language like Java or Node.js without losing their intent.

In essence, AWK code is so simple that you can just throw it away once it has finished its work.

% awk 'BEGIN { print "Hello World" }'
Hello World

But there is much more to it than that. Considering the constant need to process data files, once you have started with AWK you will stop building complete programs to process CSV or log files and reach instead for a faster, more straightforward solution with a couple of instructions:

% awk '{ print $0 }' example.txt
This is an AWK example

% awk '{ print $4, $1, $5, $3, $2 }' example.txt
AWK This example an is

% awk '{ print $1, "could be your", $4, $5 }' example.txt
This could be your AWK example

Calculations become almost ridiculously simple:

% awk '{ print $0 }' example_numbers.txt
1 2 3 testing

% awk '{ print $1 + $2 + $3, $4 }' example_numbers.txt
6 testing

% awk '{ print $2 * $3, $4 }' example_numbers.txt
6 testing

% awk '{ print $2 / $3, $4 }' example_numbers.txt
0.666667 testing

But the real potential of AWK goes beyond simple operations. With control statements, loops, and functions, this command-line tool is a real programming language that goes hand in hand with file processing operations to make our lives even simpler.

For loop example:

% cat loop.awk
#!/bin/awk -f

BEGIN {
    for (i = 1; i <= 3; i++)
        print i
} 

% awk -f loop.awk
1
2
3
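Loops combine naturally with other control statements. Here is a minimal sketch, assuming a made-up input line of four numbers, that uses while and if together to count the even fields:

```shell
# The input line "1 2 3 4" is invented for this illustration.
parity=$(printf '1 2 3 4\n' | awk '{
    i = 1
    evens = 0
    while (i <= NF) {      # NF holds the number of fields on the line
        if ($i % 2 == 0)   # $i is the value of field number i
            evens++
        i++
    }
    print evens " even, " (NF - evens) " odd"
}')
echo "$parity"
```

The same loop could be written with for; while is used here only to show a second loop form.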

Why is it relevant today?

In a generation of powerful and versatile programming languages, we engineers sometimes tend to overcomplicate problems, most commonly because we don't know the other options. Think about how many times you have developed a small Python, Node.js, or Go script to read a huge CSV file, or even built a small utility in the JVM language of your choice, and without even realizing it ended up writing multiple lines of boilerplate (useless) code.

Python script to read a file line by line and print the result:

import sys

def main():
    filepath = sys.argv[1]

    with open(filepath) as f:
        for index, line in enumerate(f):
            print("Line {}: {}".format(index, line.strip()))


if __name__ == '__main__':
    main()

The same but with AWK, where NR is the current line number (minus 1 to match Python's zero-based index) and $0 is the whole line:

awk '{ print "Line " (NR - 1) ": " $0 }' example.txt

You could come up with many more examples of the difference between writing scripts in AWK and in any other language, but AWK is also pretty performant in comparison with other languages:

[Figure: AWK and its variations' performance measurements] 1

As you can see, this old-school language (AWK was initially created in 1977) can outshine some of these more robust and modern languages at some tasks, and learning it might give you a new tool you didn't even know you wanted.

First steps in AWK

Let's start by mentioning that AWK ships with macOS and with virtually every Linux distribution (how cool is that?); for Windows, you have to install it (but I am pretty sure it cannot be that hard, right?).

How do you know which version of AWK you currently have installed?

% awk --version
awk version 20200816

And now let's start with the basics. The structure of an AWK command is pretty simple, although there are some tricks to it, especially if you want to use it for actual text processing. The basic command can be described as <condition> { action }, where the condition is optional, as we saw in the earlier example awk '{ print $0 }' example.txt, and the action is the operation you need to execute.

Among the possible conditions, two are special: BEGIN and END, and each can have its own actions. Think of BEGIN as the entry point that runs before any input is read, where you can enable, disable, or configure variables for the rest of the script. For example, to change the field delimiter from the default whitespace to a semicolon (;), add BEGIN { FS = ";" } at the beginning of the script.
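The delimiter change described above can be sketched as follows. The semicolon-separated record is invented for illustration, and the second command shows the -F command-line option, which should give the same result:

```shell
# Hypothetical semicolon-separated record, used only for illustration.
record='alice;42;admin'

# Set the field separator inside the script, in a BEGIN block.
with_begin=$(printf '%s\n' "$record" | awk 'BEGIN { FS = ";" } { print $1, $3 }')

# Set the field separator from the command line with -F.
with_flag=$(printf '%s\n' "$record" | awk -F ';' '{ print $1, $3 }')

echo "$with_begin"
echo "$with_flag"
```

Both commands print the first and third fields separated by a space, because print joins its comma-separated arguments with the output field separator.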

AWK provides several built-in variables; eight of the most commonly used are:

  • FILENAME - Name of the current input file
  • FS - Input field separator
  • FNR - Number of the current record, relative to the current input file
  • NF - Number of fields in the current record
  • NR - Total number of records read so far
  • OFS - Output field separator
  • ORS - Output record separator
  • RS - Input record separator
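To get a feel for a few of these variables, here is a small sketch (the two input lines are made up) that prints the record number, the number of fields, and the first field of each line, joined by a custom output separator:

```shell
# Two invented input lines with a different number of fields each.
vars_out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $1 }')
echo "$vars_out"
```

The commas in print are what insert OFS between the values; without them the values would simply be glued together.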

END, on the other hand, always runs last, after all input has been processed, and can be used to execute any finishing commands, for example printing the final values of variables:

# Main block: runs once for every input line
{
    for (i = 1; i <= 3; i++)
        s += $i    # add up the first three fields
}
# END block: runs once, after the last line
END { print s }
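As a worked sketch of accumulating values while reading and reporting them in END, the following feeds two invented lines (in the style of example_numbers.txt) through a pipe:

```shell
# Two made-up input lines; only the first three fields are summed.
totals=$(printf '1 2 3 testing\n4 5 6 testing\n' | awk '
    { for (i = 1; i <= 3; i++) s += $i }    # runs once per input line
    END { print "total:", s, "lines:", NR } # runs once, after all input
')
echo "$totals"
```

Fields are only available while a line is being processed, which is why the sum lives in the main block and only the final print lives in END.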

Something else worth mentioning is that AWK ships with built-in functions, such as sqrt in the example below, and also supports the creation of custom functions for when you need to do more complex operations and the script starts to become hard to manage 2

awk '{ print "The square root of", $1, "is", sqrt($1) }'
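For a custom function, a minimal sketch could look like this. The function name square and the input numbers are my own choices for illustration:

```shell
# "square" is a hypothetical helper defined with the function keyword.
squares=$(printf '3\n4\n' | awk '
    function square(x) { return x * x }
    { print $1, "squared is", square($1) }
')
echo "$squares"
```

Functions are defined at the top level of the script, next to the condition and action blocks, and can then be called from any action.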

AWK also lets you create arrays (with built-in operations to manage them) and offers several other features that we won't discuss in this post because it would take a couple of hundred lines. Still, you can find a good description of them here, so please take a look if you are curious to learn more.

Example of array operations in AWK:

Array addition

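# Main block: store the first three fields of the current line
# (later lines overwrite earlier ones)
{
    for (i = 1; i <= 3; i++)
        array[i] = $i
}
# END block: print every position with its stored value
END {
    for (position in array)
        print position ": " array[position]
}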

Array deleting


# Main block: store the first three fields of the current line
{
    for (i = 1; i <= 3; i++)
        array[i] = $i
}
# END block: remove every element from the array
END {
    for (position in array)
        delete array[position]
}
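AWK arrays are associative, so the index does not have to be a number. As a sketch under that assumption, here is a word counter over two invented lines that uses the words themselves as indexes:

```shell
# Made-up input; each distinct word becomes an index in the count array.
word_counts=$(printf 'awk sed awk\ngrep awk sed\n' | awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (word in count) print word, count[word] }
' | sort)
echo "$word_counts"
```

The output goes through sort because the order in which for (word in count) visits the elements is not specified.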

And in case you are thinking about how powerful this is and, like me, want to take it further and create small AWK-powered "apps" for the monotonous tasks, you may wonder how to verify that what you are coding is valid. You can run any number of unit tests for shell scripts, and therefore for AWK scripts, using shunit2.

Data processing with AWK

As mentioned a couple of times in this post, AWK's main objective is to process data, which could mean data in files, lines produced by a command's output, or any other form of input data, but let's start simple.

Opening a file and reading the data

% cat example.txt
> This is an AWK example

% awk '{ print $0 }' example.txt
This is an AWK example

From the previous example, we can notice a few things, such as how AWK splits each line of the file into fields that you refer to by index. The fields are created using the delimiter, which by default is whitespace (see the example earlier in this post on how to define a new delimiter).

Using $0 prints the whole line, while $1, $2, and so on (one per column) give you control over the individual pieces of data.

% cat example.txt
> This is an AWK example

% awk '{ print $4, $1, $5, $3, $2 }' example.txt
AWK This example an is

You can also easily combine fields with your own strings:

% cat example.txt
> This is an AWK example

% awk '{ print $1, "could be your", $4, $5 }' example.txt
This could be your AWK example
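The input does not have to come from a file. As a sketch of processing the output of another command, here printf stands in for any command that writes lines to standard output:

```shell
# printf is only a stand-in for a real command producing output.
piped=$(printf 'This is an AWK example\n' | awk '{ print $NF, "has", length($NF), "characters" }')
echo "$piped"
```

$NF is the last field of the line, since NF holds the number of fields, and length is a built-in function that returns the number of characters in a string.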

Searching a value

AWK can search for information within the provided input; one way is to use a regular expression (regexp).

% cat example.txt
> This is an AWK example

% awk '/This/ { print $0 }' example.txt
This is an AWK example

% awk '/test/ { print $0 }' example.txt
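A regexp can also be applied to a single field instead of the whole line, using the ~ (matches) and !~ (does not match) operators. A small sketch, with a second input line that I made up for contrast:

```shell
# Invented two-line input to contrast a matching and a non-matching line.
input='This is an AWK example
That is a sed example'

# Lines whose fourth field starts with "A".
field_match=$(printf '%s\n' "$input" | awk '$4 ~ /^A/ { print $0 }')

# Lines whose first field is not exactly "This".
field_nomatch=$(printf '%s\n' "$input" | awk '$1 !~ /^This$/ { print $0 }')

echo "$field_match"
echo "$field_nomatch"
```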

Another search mechanism is to use a control statement like if inside the action, for example:

% cat example.txt
> This is an AWK example

% awk '{ if ($1 == "This") print $0 }' example.txt
This is an AWK example
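A comparison can also be used directly as the condition in front of the action, without writing an if. When the action is left out, AWK prints the matching line. A sketch with invented input:

```shell
# String comparison as the condition; no action means "print the line".
by_string=$(printf 'This is an AWK example\nThat is a sed example\n' | awk '$1 == "This"')

# Numeric comparison on the second field of hypothetical name/value pairs.
by_number=$(printf 'a 5\nb 20\n' | awk '$2 > 10 { print $1 }')

echo "$by_string"
echo "$by_number"
```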

AWK, GAWK, NAWK or MAWK

Finally, as usual with any programming language, variants tend to appear over time, and AWK is no exception. The ones I consider the most important are the following:

  • GAWK - GNU AWK, the GNU project's open-source implementation, which is still actively maintained.
  • NAWK - "New AWK", the updated version of the original AWK released by its authors in the 1980s 3
  • MAWK - A fast AWK implementation whose codebase is built around a bytecode interpreter

Of course, there are many other variants out there, and you won't have any trouble finding them.

As you can see, AWK is an excellent, flexible, and robust command-line tool. It takes a while to ramp up, but once you get the basics it is pretty simple to use and to exploit its potential.

In the next post, I will go deeper into different and more complex scenarios and examples; let me know if you have any questions or comments or want more specific related content.



  1. https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/ 

  2. https://www.gnu.org/software/gawk/manual/html_node/Function-Calls.html 

  3. Robbins, Arnold (March 2014). "The GNU Project and Me: 27 Years with GNU AWK" (PDF). skeeve.com. Retrieved October 4, 2014. 
