Dmitry Romanoff
Posted on September 26, 2022
AWK is a text-processing utility on GNU/Linux.
It is very powerful and uses a simple programming language.
It can solve complex text processing tasks with a few lines of code.
Examples of tasks that can be done with AWK:
Text processing,
Producing formatted text reports,
Performing arithmetic operations,
Performing string operations,
Parsing log files, including log files of DBs,
Constructing queries to populate data into DBs
and many more.
AWK follows a simple workflow − Read, Execute, and Repeat.
Read
AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory.
Execute
All AWK commands are applied sequentially on the input. By default AWK executes commands
on every line. We can restrict this by providing patterns.
Repeat
This process repeats until the end of the input is reached.
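The read, execute, repeat cycle can be observed with a small sketch like the one below. The three input lines are made up for illustration, and NR is AWK's built-in counter of the records read so far.

```shell
# Feed three made-up lines to awk; the body block runs once per line.
numbered=$(printf 'first\nsecond\nthird\n' | awk '{ print NR ": " $0 }')
echo "$numbered"
```

Each line is read, the print command is executed on it, and the cycle repeats until the input ends.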
BEGIN block
The syntax of the BEGIN block is as follows −
Syntax
BEGIN {awk-commands}
The BEGIN block gets executed at program start-up. It executes only once. This is a good place
to initialize variables. BEGIN is an AWK keyword and hence it must be in upper-case. Please
note that this block is optional.
Body Block
The syntax of the body block is as follows −
Syntax
/pattern/ {awk-commands}
The body block applies AWK commands on every input line. By default, AWK executes
commands on every line. We can restrict this by providing patterns. Note that there are no
keywords for the Body block.
END Block
The syntax of the END block is as follows −
Syntax
END {awk-commands}
The END block executes once, after all input has been processed. END is an AWK keyword and hence it must
be in upper-case. Please note that this block is optional.
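As a rough sketch of how the three blocks cooperate, the program below initializes a counter in BEGIN, updates it in the body block, and reports in END. The sample data mirrors the marks.txt file used in this post; the variable names total, marks and summary are only illustrative.

```shell
# Sample data matching marks.txt (kept inline so the sketch is self-contained).
marks='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89'

summary=$(printf '%s\n' "$marks" | awk '
BEGIN { total = 0 }       # runs once, before any input is read
      { total += $4 }     # runs for every input line
END   { printf "Total = %d, Average = %.1f\n", total, total / NR }  # runs once, after the last line
')
echo "$summary"
```

In the END block, NR still holds the number of records that were read, which is why it can serve as the divisor.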
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"}'
Sr No Name Sub Marks
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt
Sr No Name Sub Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '{print}' marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
AWK commands can also be stored in a file and run with the -f option.
dmi@dmi-laptop:~/my_awk$ cat command.awk
{print}
dmi@dmi-laptop:~/my_awk$ awk -f command.awk marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$
The -v option assigns a value to a variable before the program starts running.
dmi@dmi-laptop:~/my_awk$ awk -v name=Linda 'BEGIN{printf "Name = %s\n", name}'
Name = Linda
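The same option is handy for passing a shell variable into an AWK program. The sketch below assumes a threshold named min (a made-up name) and prints the students whose mark reaches it, using inline data that mirrors marks.txt.

```shell
marks='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89'

min=87   # shell variable, handed to awk through -v
top=$(printf '%s\n' "$marks" | awk -v min="$min" '$4 >= min { print $2 }')
echo "$top"
```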
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '{print $3 "\t" $4}' marks.txt
Physics 80
Maths 90
Biology 87
English 85
History 89
dmi@dmi-laptop:~/my_awk$
In the following example we search for the pattern a.
When a pattern match succeeds, AWK executes the command from the body block.
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '/a/ {print $0}' marks.txt
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$
In the absence of a body block, the default action is taken, which is to print the record.
dmi@dmi-laptop:~/my_awk$ awk '/a/' marks.txt
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$
We can print columns in any order.
dmi@dmi-laptop:~/my_awk$ awk '/a/ {print $4 "\t" $3}' marks.txt
90 Maths
87 Biology
85 English
89 History
dmi@dmi-laptop:~/my_awk$
We can count and print the number of lines for which a pattern match succeeded.
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '/a/{++cnt} END {print "Count =", cnt}' marks.txt
Count = 4
dmi@dmi-laptop:~/my_awk$
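One detail worth knowing: if no line matches, cnt is never assigned and prints as an empty string. A common workaround, sketched here with inline data that mirrors marks.txt, is to add 0 so the value is forced to a number.

```shell
marks='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89'

# No line contains the letter z, so cnt stays unset; cnt + 0 prints 0 instead of nothing.
count_line=$(printf '%s\n' "$marks" | awk '/z/ { ++cnt } END { print "Count =", cnt + 0 }')
echo "$count_line"
```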
dmi@dmi-laptop:~/my_awk$ cat my_example.txt
aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 3' my_example.txt
aaa bbb
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 5' my_example.txt
aaa bbb
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 8' my_example.txt
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
dmi@dmi-laptop:~/my_awk$
The $0 variable stores the entire input line.
In the absence of a body block, the default action is taken, i.e., the print action.
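The length function works on a single field as well as on $0. As a small sketch, using inline data that mirrors my_example.txt, the pattern below keeps only the lines whose first field is longer than three characters.

```shell
sample='aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll'

# $1 is the first whitespace-separated field of each line.
long_first=$(printf '%s\n' "$sample" | awk 'length($1) > 3')
echo "$long_first"
```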
ARGC is a standard AWK variable.
It holds the number of arguments provided on the command line, including the program name awk itself.
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN {print "Arguments =", ARGC}' One Two Three Four
Arguments = 5
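To make the counting rule concrete, here is a short sketch. It relies on the fact that options such as -v and the program text itself are not counted; a and b are placeholder arguments, and because the program has only a BEGIN block, AWK never tries to open them as files.

```shell
# With no extra arguments only the program name is counted.
argc_none=$(awk 'BEGIN { print ARGC }')

# The -v option and the program text are not counted; a and b are.
argc_opts=$(awk -v x=1 'BEGIN { print ARGC }' a b)

echo "$argc_none $argc_opts"
```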
ARGV is a standard AWK variable.
It is an array that stores the command-line arguments.
The array's valid index ranges from 0 to ARGC-1.
dmi@dmi-laptop:~/my_awk$ cat command.awk
BEGIN {
    for (i = 0; i < ARGC; ++i) {
        printf "ARGV[%d] = %s\n", i, ARGV[i]
    }
}
dmi@dmi-laptop:~/my_awk$ awk -f command.awk one two three four five six seven eight
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
ARGV[5] = five
ARGV[6] = six
ARGV[7] = seven
ARGV[8] = eight
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN {
    for (i = 0; i < ARGC; ++i) {
        printf "ARGV[%d] = %s\n", i, ARGV[i]
    }
} ' one two three four five six seven eight
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
ARGV[5] = five
ARGV[6] = six
ARGV[7] = seven
ARGV[8] = eight
dmi@dmi-laptop:~/my_awk$
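Since the valid indexes run from 0 to ARGC-1, the last command-line argument is always ARGV[ARGC-1]. A minimal sketch, with placeholder arguments:

```shell
# ARGC is 4 here (awk, one, two, three), so ARGV[3] is the last argument.
last=$(awk 'BEGIN { print ARGV[ARGC - 1] }' one two three)
echo "$last"
```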
Regular expression .
It matches any single character except the end of line character.
dmi@dmi-laptop:~/my_awk$ echo -e "cat\nbat\nfun\nfin\nfan"
cat
bat
fun
fin
fan
The -e option of echo enables interpretation of backslash escapes such as \n.
dmi@dmi-laptop:~/my_awk$ echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'
fun
fin
fan
dmi@dmi-laptop:~/my_awk$
Regular expression ^ .
It matches the start of the line.
dmi@dmi-laptop:~/my_awk$ echo -e "This\nThat\nThere\nTheir\nthese"
This
That
There
Their
these
dmi@dmi-laptop:~/my_awk$ echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'
There
Their
dmi@dmi-laptop:~/my_awk$
Regular expression $.
It matches the end of line.
dmi@dmi-laptop:~/my_awk$ echo -e "knife\nknow\nfun\nfin\nfan\nnine"
knife
know
fun
fin
fan
nine
dmi@dmi-laptop:~/my_awk$ echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'
fun
fin
fan
dmi@dmi-laptop:~/my_awk$
Regular expression [ ] Match character set
It is used to match only one out of several characters.
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall"
Call
Tall
Ball
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall" | awk '/[CT]all/'
Call
Tall
dmi@dmi-laptop:~/my_awk$
Regular expression [^ ] Exclusive set
In the exclusive set, the ^ negates the set of characters in the square brackets.
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall"
Call
Tall
Ball
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall" | awk '/[^CT]all/'
Ball
dmi@dmi-laptop:~/my_awk$
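These building blocks can be combined. The sketch below anchors a character set at both ends of the line, so only whole-line matches survive; the extra words Taller and Recall are made up to show what gets rejected. It uses printf instead of echo -e, which behaves the same way across shells.

```shell
# ^ and $ anchor the match to the whole line; [CT] allows C or T as the first letter.
matches=$(printf 'Call\nTall\nBall\nTaller\nRecall\n' | awk '/^[CT]all$/')
echo "$matches"
```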
How to find the length of each record in a file?
dmi@dmi-laptop:~/my_awk$ cat my_example.txt
aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk '{print $0, ".....", length($0)}' my_example.txt
aaa bbb ..... 7
cccccc dd ..... 9
eee ..... 3
fffff fff ffff ..... 14
ggg hh hhh hhhh ..... 15
kkk ll ..... 6
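Building on this, one possible way to find the longest record is to remember the largest length seen so far and report it in the END block. The variable names max and line are arbitrary, and the inline data mirrors my_example.txt.

```shell
sample='aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll'

# max starts out unset, which compares as 0, so the first line always wins initially.
longest=$(printf '%s\n' "$sample" |
  awk 'length($0) > max { max = length($0); line = $0 } END { print max ": " line }')
echo "$longest"
```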
Delimiter
The -F option sets the field separator; here it is a comma.
dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
dmi@dmi-laptop:~/my_awk$ awk -F, ' { print $2 } ' some_file_with_commas.txt
bbb
gggg
pppp
uuu
dmi@dmi-laptop:~/my_awk$
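Lines that contain no comma have an empty second field. The condition length($2)>0 skips them, so no empty lines are printed.
dmi@dmi-laptop:~/my_awk$ awk -F, ' length($2)>0 { print $2 } ' some_file_with_commas.txt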
bbb
gggg
pppp
uuu
dmi@dmi-laptop:~/my_awk$
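When the separator passed to -F is longer than one character, AWK treats it as a regular expression. As a sketch, the separator ', *' (a comma followed by any number of spaces) removes the leading blanks from the fields, and NF > 1 keeps only the lines that actually have a second field. The inline data mirrors some_file_with_commas.txt.

```shell
data='aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv'

# NF is the number of fields in the current record.
second=$(printf '%s\n' "$data" | awk -F', *' 'NF > 1 { print $2 }')
echo "$second"
```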
Sum of file sizes with AWK on a list of files
dmi@dmi-laptop:~/my_awk$ ls -l
total 16
-rw-rw-r-- 1 dmi dmi 99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi 60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '{sum += $5} END {print sum}'
379
dmi@dmi-laptop:~/my_awk$
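A slightly more defensive variant, sketched below, skips the "total" line explicitly with NR > 1 and counts the files as well. The listing is pasted inline from the output above so the example does not depend on the contents of the current directory; the variable names n and sum are arbitrary.

```shell
listing='total 16
-rw-rw-r-- 1 dmi dmi 99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi 60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt'

# Field 5 of each ls -l line is the file size in bytes.
report=$(printf '%s\n' "$listing" |
  awk 'NR > 1 { sum += $5; n++ } END { printf "%d files, %d bytes\n", n, sum }')
echo "$report"
```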
dmi@dmi-laptop:~/my_awk$ ls -l
total 16
-rw-rw-r-- 1 dmi dmi 99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi 60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '$5 < 100 {print $0} '
total 16
-rw-rw-r-- 1 dmi dmi 99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 60 Dec 11 08:37 my_example.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '$5 < 100 {print $9} '
command.awk
my_example.txt
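The "total" line has no fifth field, so $5 is empty and is treated as 0 in the comparison. The extra condition length($5)>0 excludes that line.
dmi@dmi-laptop:~/my_awk$ ls -l | awk 'length($5)>0 && $5 < 100 {print $9} '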
command.awk
my_example.txt
dmi@dmi-laptop:~/my_awk$
Skip first line of file
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk '(NR>1)' some_data_to_populate.data
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$
AWK's NR variable holds the number of the current record, that is, how many records have been read so far.
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk -F, '(NR>1) { printf("%s", $2) } ' some_data_to_populate.data
Green street Apple street Orange streetdmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk -F, '(NR>1) { printf("%s\n", $2) } ' some_data_to_populate.data
Green street
Apple street
Orange street
The pattern (NR>1) skips the first record in the file.
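Because NR keeps counting after the header has been skipped, NR - 1 gives the position of each data row. A small sketch, with inline data that mirrors some_data_to_populate.data:

```shell
rows='Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97'

# The header is record 1, so the first data row is record 2 and gets number 1.
names=$(printf '%s\n' "$rows" | awk -F, 'NR > 1 { print (NR - 1) ". " $1 }')
echo "$names"
```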
dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
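The escape \x27 stands for the single-quote character ('). It is a hex escape supported by gawk and many other AWK implementations, and it lets us print a quote from inside a single-quoted shell command.
dmi@dmi-laptop:~/my_awk$ awk ' { printf("\x27") } ' some_file_with_commas.txt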
''''''''dmi@dmi-laptop:~/
dmi@dmi-laptop:~/my_awk$ awk ' { printf("\x27\n") } ' some_file_with_commas.txt
'
'
'
'
'
'
'
'
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
dmi@dmi-laptop:~/my_awk$ awk -F, ' { printf("\x27%s\x27\n", $1) } ' some_file_with_commas.txt
'aaa'
'eee'
'ff'
'ooooo'
'rrr'
'sss'
'ttt'
'vvv'
dmi@dmi-laptop:~/my_awk$
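An alternative that avoids the hex escape is to pass the quote character in through -v. This is only a sketch; the variable name q is arbitrary and the two input lines are a shortened sample.

```shell
# q holds a single quote; concatenating it around $1 wraps the field in quotes.
quoted=$(printf 'aaa, bbb\neee\n' | awk -F, -v q="'" '{ print q $1 q }')
echo "$quoted"
```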
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("insert into some_table values(trim(\x27%s\x27), trim(\x27%s\x27), trim(\x27%s\x27), %s);\n", $1, $2, $3, $4); } '
insert into some_table values(trim('John'), trim(' Green street'), trim(' 2000-01-01'), 100);
insert into some_table values(trim('Ann'), trim(' Apple street'), trim(' 1980-05-22'), 99);
insert into some_table values(trim('Miki'), trim(' Orange street'), trim(' 1985-01-01'), 97);
dmi@dmi-laptop:~/my_awk$
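The statements above leave the trimming to the database. If we would rather strip the blanks in AWK itself, one possible sketch is shown below. The trim function is a hypothetical helper defined here, not an AWK built-in, and q is an assumed variable holding the quote character. Note that the sketch does not escape quotes inside the data, so it is only suitable for trusted input.

```shell
rows='Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97'

sql=$(printf '%s\n' "$rows" | awk -F, -v q="'" '
# Hypothetical helper: strip leading and trailing blanks from a field.
function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
NR > 1 {
    printf "insert into some_table values(%s, %s, %s, %s);\n",
           q trim($1) q, q trim($2) q, q trim($3) q, trim($4)
}')
echo "$sql"
```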
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("update some_table set the_address=trim(\x27%s\x27), the_birthday=trim(\x27%s\x27), the_mark=%s where the_name=\x27%s\x27;\n", $2, $3, $4, $1); } '
update some_table set the_address=trim(' Green street'), the_birthday=trim(' 2000-01-01'), the_mark= 100 where the_name='John';
update some_table set the_address=trim(' Apple street'), the_birthday=trim(' 1980-05-22'), the_mark= 99 where the_name='Ann';
update some_table set the_address=trim(' Orange street'), the_birthday=trim(' 1985-01-01'), the_mark= 97 where the_name='Miki';
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("insert into some_table values(trim(\x27%s\x27), trim(\x27%s\x27), trim(\x27%s\x27), %s);\n", $1, $2, $3, $4); } ' > RunMe.sql
dmi@dmi-laptop:~/my_awk$ cat RunMe.sql
insert into some_table values(trim('John'), trim(' Green street'), trim(' 2000-01-01'), 100);
insert into some_table values(trim('Ann'), trim(' Apple street'), trim(' 1980-05-22'), 99);
insert into some_table values(trim('Miki'), trim(' Orange street'), trim(' 1985-01-01'), 97);
dmi@dmi-laptop:~/my_awk$