Mastering Bash 0 — Internal Field Separators

deathroll

Posted on April 23, 2023

Preamble

Are you a developer, sysadmin, or DevOps engineer? Then you probably spend lots of time in your terminal. I bet you use the GNU Bash shell or something very similar, such as Zsh.

It's essential to know your tools well to work more efficiently. As a Linux geek, I use Bash to automate my daily routine even outside of my job, but I got tired of constantly searching the Web to learn how to achieve certain things in Bash. So I recently read through the whole manual. It's relatively small, a little more than 190 pages in total.

This article is the first in a series on mastering GNU Bash.

A Bit of Theory

Throwing a bunch of commands at the reader without somewhat detailed explanations is as helpful as teaching a five-year-old how to swim by throwing them in the sea. That's why tutorials should start at least with some general knowledge IMHO.

Shell Expansions

Bash is very flexible when it comes to writing commands. Thanks to the different types of expansions Bash performs on the commands it receives from a user, you can often write something a lot simpler and shorter than you could without them.

In essence, shell expansions enable users to conveniently manipulate data on the command line, such as regular text or variables, using special syntactic constructs. Think of them as shortcuts.


Currently, there are seven types of shell expansions supported in Bash:

  • brace expansion
  • tilde expansion
  • parameter and variable expansion
  • command substitution
  • arithmetic expansion
  • word splitting
  • filename expansion

We won't cover all of them since that would take too much time. Besides, there's no point in duplicating the manual.
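For a rough feel of a few of the other expansions, here is a minimal sketch. The variable names and sample values are made up for illustration only.

```shell
#!/usr/bin/env bash
# Brace expansion: generates one word per alternative inside the braces.
braces=$(echo file_{a,b,c}.txt)

# Arithmetic expansion: evaluates an integer expression.
sum=$(( 2 + 3 * 4 ))

# Command substitution: replaced by the command's standard output.
upper=$(printf 'bash' | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$braces" "$sum" "$upper"
# prints:
# file_a.txt file_b.txt file_c.txt
# 14
# BASH
```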


Here's a short example of a parameter expansion:

[deathroll@fedora ~]$ EXAMPLE_TEXT='sOmE tExT Owo'; echo ${EXAMPLE_TEXT,,}

Parameter expansion, like command substitution and arithmetic expansion, is introduced by the "$" symbol (command substitution can also be written with backticks).

The command above prints the text below because the ",," operator in the parameter expansion converts every alphabetic character in the variable's value to lowercase.

some text owo

You can also provide a pattern so that only matching characters are converted. The pattern is matched against one character at a time, though, so it cannot match whole words or character sequences.

[deathroll@fedora ~]$ EXAMPLE_TEXT='sOmE tExT Owo'; echo ${EXAMPLE_TEXT,,O}
somE tExT owo
[deathroll@fedora ~]$ echo ${EXAMPLE_TEXT,,[OT]}
somE tExt owo
[deathroll@fedora ~]$
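The same family of operators also works in the other direction. The sketch below assumes Bash 4.0 or newer (older versions, such as the Bash 3.2 shipped with macOS, reject these operators), and the variable names are illustrative.

```shell
#!/usr/bin/env bash
text='sOmE tExT Owo'

lower=${text,,}          # every character to lowercase
upper=${text^^}          # every character to uppercase
first=${text^}           # only the first character to uppercase
vowels=${text^^[aeiou]}  # only characters matching the pattern to uppercase

printf '%s\n' "$lower" "$upper" "$first" "$vowels"
# prints:
# some text owo
# SOME TEXT OWO
# SOmE tExT Owo
# sOmE tExT OwO
```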

Builtins

You may already know this, but not all commands you type on the command line are actual programs found somewhere on a filesystem. Bash has built-in commands called "builtins." And you use them all the time. Probably the most commonly used are echo and cd.

My personal favorite is the declare builtin. It enables you to declare a variable of a specific type (e.g., array or integer).
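Here is a small sketch of what declare can do, plus a way to check whether a command is a builtin. The variable names are hypothetical.

```shell
#!/usr/bin/env bash
# An integer variable: += performs arithmetic instead of string concatenation.
declare -i count=5
count+=10

# An indexed array: += with parentheses appends elements.
declare -a dirs=(bin src)
dirs+=(docs)

# type -t reports how Bash would interpret a command name.
kind_cd=$(type -t cd)
kind_echo=$(type -t echo)

echo "$count ${#dirs[@]} $kind_cd $kind_echo"
# prints: 15 3 builtin builtin
```

Without the -i attribute, count+=10 would produce the string 510 instead.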

Fields

Since the topic is internal field separators, it's also a good idea to touch a bit on what fields are.

Here's how the manual defines a field:

A unit of text that is the result of one of the shell expansions. After expansion, when executing a command, the resulting fields are used as the command name and arguments.

Pretty self-explanatory.

A Word on Internal Field Separators

Internal field separators tell Bash where to split the result of an expansion into words, which are just sequences of characters treated as single units.

The separators are listed in the IFS variable, which contains three characters by default: space, tab, and newline.
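You can look at the current value yourself. The sketch below assumes a shell where IFS has not been modified; the sample string is made up.

```shell
#!/usr/bin/env bash
# %q prints the value in a shell-quoted form, so the whitespace becomes visible.
printf '%q\n' "$IFS"
# prints: $' \t\n'

# An unquoted expansion is split at every run of IFS whitespace.
line=$'one two\tthree\nfour'
words=($line)
echo "${#words[@]}"
# prints: 4
```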

— But what does it mean for me as a user?
— Things will get clear with a couple of simple examples.


Suppose you want to get a list of directories and perform some actions on their names. If you search for directories with the find command, the output will contain multiple lines (i.e., separated by newline characters).

[deathroll@fedora ~]$ find . -mindepth 1 -maxdepth 1 -type d -not -name '.*'
./Pictures
./bin
./Videos
./Public
./Music
./Templates
./git
./nvim
./Downloads
./projects
./NAS
./Desktop
./Documents
./snap
./learn
[deathroll@fedora ~]$

When you use command substitution without quotes, for example to fill an array or to pass the output as arguments to another command, the shell splits the output into words wherever it finds a newline or any other character listed in the IFS variable. (A plain assignment to a regular variable is not split; the variable receives the whole output as one string.)

[deathroll@fedora ~]$ declare -a EXAMPLE_ARR=(`find . -mindepth 1 -maxdepth 1 -type d -not -name '.*'`)
[deathroll@fedora ~]$ echo ${EXAMPLE_ARR[@]}
./Pictures ./bin ./Videos ./Public ./Music ./Templates ./git ./nvim ./Downloads ./projects ./NAS ./Desktop ./Documents ./snap ./learn
[deathroll@fedora ~]$ echo ${EXAMPLE_ARR[0]}
./Pictures
[deathroll@fedora ~]$
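To see where the splitting actually happens, compare a plain assignment with an array assignment. This is a sketch in which printf stands in for find, so it runs anywhere; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Stand-in for the find output: three lines of text.
output=$(printf '%s\n' ./Pictures ./bin ./Videos)

scalar=$output    # plain assignment: no word splitting, one multi-line string
array=($output)   # array assignment: split at the IFS characters

echo "${#array[@]}"
# prints: 3
echo "${array[1]}"
# prints: ./bin
```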

OK, that seems pretty reasonable. Now look at the commands below and think about the data stored in the array. What do the array elements look like? More specifically, the first element. Does it look like "2023-02-26 22-10-40.mp4"?

[deathroll@fedora Videos]$ ls *' '*.mp4
'2023-02-26 22-10-40.mp4'  '2023-03-14 00-22-19.mp4'  '2023-03-31 21-12-39.mp4'  '2023-04-08 13-45-36.mp4'
'2023-02-26 22-16-21.mp4'  '2023-03-24 11-54-46.mp4'  '2023-04-04 10-05-41.mp4'  '2023-04-11 11-53-24.mp4'
'2023-03-12 18-59-36.mp4'  '2023-03-24 11-54-57.mp4'  '2023-04-04 11-37-05.mp4'  '2023-04-11 12-00-41.mp4'
'2023-03-14 00-10-02.mp4'  '2023-03-24 11-55-14.mp4'  '2023-04-04 12-06-14.mp4'  '2023-04-11 12-00-51.mp4'
'2023-03-14 00-11-13.mp4'  '2023-03-30 16-16-21.mp4'  '2023-04-04 12-19-03.mp4'  '2023-04-11 13-55-54.mp4'
'2023-03-14 00-17-25.mp4'  '2023-03-31 21-07-22.mp4'  '2023-04-04 12-19-23.mp4'  '2023-04-20 19-41-44.mp4'
[deathroll@fedora Videos]$ EXAMPLE_ARR=(`ls *' '*.mp4`)
[deathroll@fedora Videos]$

The moment of truth... 😰
As you may have already noticed, the filenames contain the space character. Now, recall what is written about IFS at the start of this section of the article.

Let's list the array elements, each on a separate line, to make the output more readable, and pick just the first ten lines to make it short...

[deathroll@fedora Videos]$ head -10 <(for F in ${EXAMPLE_ARR[@]}; do printf '%b\n' $F; done)
2023-02-26
22-10-40.mp4
2023-02-26
22-16-21.mp4
2023-03-12
18-59-36.mp4
2023-03-14
00-10-02.mp4
2023-03-14
00-11-13.mp4
[deathroll@fedora Videos]$ # Oh no! My filenames are broken! 😱
[deathroll@fedora Videos]$

Bash splits filenames into words wherever it finds any of the characters listed in the IFS variable. Sometimes that matters a lot, and you want the shell to split only at the characters you specify.

Congrats if your guess was right! 🎉
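If you don't have a folder full of such videos, the effect can be reproduced in a throwaway directory. This is a sketch that assumes mktemp is available; the two filenames are borrowed from the listing above and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Create two empty files whose names contain a space.
tmp=$(mktemp -d)
touch "$tmp/2023-02-26 22-10-40.mp4" "$tmp/2023-02-26 22-16-21.mp4"

# With the default IFS, the space inside each name is a separator too.
broken=($(ls "$tmp"))

echo "${#broken[@]}"
# prints: 4
printf '<%s>\n' "${broken[@]}"
# prints:
# <2023-02-26>
# <22-10-40.mp4>
# <2023-02-26>
# <22-16-21.mp4>

rm -rf "$tmp"
```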


Time for Tuning | Use IFS to Your Advantage

Since IFS is a variable, it can be altered by a user. All you need to do is assign a new value to it. But make sure to use ANSI-C quoting ($'...') when the value contains escape sequences such as \n or \t. Inside double quotes ("..."), Bash does not translate these sequences, so "\n" is a backslash followed by the letter n, not a newline.
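The difference between the two quoting styles is easy to check by measuring the length of the resulting strings. A minimal sketch with illustrative variable names:

```shell
#!/usr/bin/env bash
dq="\n"      # double quotes: a backslash followed by the letter n
ansi=$'\n'   # ANSI-C quoting: a single newline character

echo "${#dq} ${#ansi}"
# prints: 2 1
```

With the two-character value assigned to IFS, Bash would split at backslashes and at the letter n, and newlines would no longer act as separators.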

Changing the IFS value for the rest of the shell session is not recommended since it can cause more trouble than it's worth.

What I prefer is to either change the IFS value, execute the commands that need custom field separators, and then unset IFS, or execute the commands, including the IFS assignment, in a subshell to avoid mutating the current shell environment.

Don't worry. The shell can't be broken by unsetting the IFS variable. When IFS is unset, Bash splits words as if IFS still held its default value of space, tab, and newline.
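A quick way to convince yourself is to split the same string before and after unsetting IFS. The sample data and variable names below are made up for this sketch.

```shell
#!/usr/bin/env bash
data=$'a b\nc d'

IFS=$'\n'       # split at newlines only
lines=($data)   # two elements: "a b" and "c d"

unset IFS       # splitting falls back to space, tab, and newline
words=($data)   # four elements: a, b, c, d

echo "${#lines[@]} ${#words[@]}"
# prints: 2 4
```

Keep in mind that unset gives you the default behavior back, not a custom value that may have been set earlier. If you need the earlier value, save it in a variable first and assign it back afterwards.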

Example 1 — Unsetting
[deathroll@fedora Videos]$ IFS=$'\n'
[deathroll@fedora Videos]$ head -10 <(for F in `ls *' '*.mp4`; do printf '%b\n' $F; done)
2023-02-26 22-10-40.mp4
2023-02-26 22-16-21.mp4
2023-03-12 18-59-36.mp4
2023-03-14 00-10-02.mp4
2023-03-14 00-11-13.mp4
2023-03-14 00-17-25.mp4
2023-03-14 00-22-19.mp4
2023-03-24 11-54-46.mp4
2023-03-24 11-54-57.mp4
2023-03-24 11-55-14.mp4
[deathroll@fedora Videos]$ unset IFS
[deathroll@fedora Videos]$

Example 2 — Subshell
[deathroll@fedora Videos]$ (IFS=$'\n'; head -10 <(for F in `ls *' '*.mp4`; do printf '%b\n' $F; done))
2023-02-26 22-10-40.mp4
2023-02-26 22-16-21.mp4
2023-03-12 18-59-36.mp4
2023-03-14 00-10-02.mp4
2023-03-14 00-11-13.mp4
2023-03-14 00-17-25.mp4
2023-03-14 00-22-19.mp4
2023-03-24 11-54-46.mp4
2023-03-24 11-54-57.mp4
2023-03-24 11-55-14.mp4
[deathroll@fedora Videos]$ # Since the command was executed in a subshell, the current environment is left untouched.
[deathroll@fedora Videos]$ head -10 <(for F in `ls *' '*.mp4`; do printf '%b\n' $F; done)
2023-02-26
22-10-40.mp4
2023-02-26
22-16-21.mp4
2023-03-12
18-59-36.mp4
2023-03-14
00-10-02.mp4
2023-03-14
00-11-13.mp4
[deathroll@fedora Videos]$
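Another way to keep a custom separator from leaking, offered here as an alternative sketch rather than something the two examples rely on, is to put the IFS assignment directly in front of a single command such as the read builtin. The assignment then applies to that command only. The record string and the fields array are hypothetical.

```shell
#!/usr/bin/env bash
record='alice:x:1000:/home/alice'   # hypothetical colon-separated record

# IFS is set to a colon for this one read command only.
# -r keeps backslashes literal, -a stores the words in an array.
IFS=: read -r -a fields <<< "$record"

echo "${#fields[@]}"
# prints: 4
echo "${fields[3]}"
# prints: /home/alice

# IFS in the current shell is still whatever it was before the read.
```

Note that this prefix form works for commands like read, but not for a for loop, which is a shell keyword rather than a command.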


Now that you hopefully know a bit more about Bash, it's time to experiment on your own! Good luck and have fun. See ya in the next article 🤟!
