Linux terminals, tty, pty and shell - part 2

napicella

Nicola Apicella

Posted on March 2, 2020

This is the second post of the series on Linux terminals, tty, pty and shell.

In the first post we talked about the difference between tty, pty and shell, and about what happens when we press a key in a terminal emulator (like xterm). If you haven't had the chance to read it yet, this is the link to the first part of the article. Without reading the first part, some of the things discussed here might be harder to understand.

In this article we will:

  • define what a line discipline is and see how programs can control it
  • build a simple remote terminal application in golang

Let's get to it.

Line discipline

In the previous article we introduced the line discipline as an essential part of the terminal device.

But what is it?

From Wiki:
In the Linux terminal subsystem, the line discipline is a kernel module which serves as a protocol handler between the low-level device driver and the generic program interface routines (such as read(2), write(2) and ioctl(2)) offered to the programs.

This definition is a bit dry, but fortunately it contains a few keywords we can use to dive deeper.

In a Unix-like system everything is a file: we have all heard this before. A program managing a pty will essentially perform read and write operations on a pair of files, the pty master and the pty slave.

A program writing data to disk, sending a document to the printer or getting data from a USB stick will use the same read and write operations, although the work required to perform the tasks depends on the type of the device and the characteristics of the device itself.
Our program is completely unaware of those details - the kernel provides a programming interface and takes care of all these differences for us.

When a program calls the read or write operations, behind the scenes, the kernel will use the right implementation for us.
In the case of the pty, the kernel will use the tty driver to handle the communication between the terminal and the program. The line discipline is a logical component of the tty driver.
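To make the "same read and write operations" point concrete, here is a small sketch that pushes the same bytes through two very different kernel objects, an anonymous pipe and a regular file, using one helper. The names roundTrip, viaPipe and viaFile are made up for this illustration; a pty is not involved, but a program would call Read and Write on a pty master or slave in exactly the same way.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// roundTrip writes msg to w and reads the same number of bytes back from r.
// It relies only on the generic Write and Read operations.
func roundTrip(w io.Writer, r io.Reader, msg string) (string, error) {
	if _, err := io.WriteString(w, msg); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// viaPipe sends msg through an anonymous kernel pipe.
func viaPipe(msg string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()
	defer w.Close()
	return roundTrip(w, r, msg)
}

// viaFile sends msg through a regular file on disk, using one handle
// for writing and a second handle for reading.
func viaFile(msg string) (string, error) {
	w, err := os.CreateTemp("", "rw-demo-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(w.Name())
	defer w.Close()

	r, err := os.Open(w.Name())
	if err != nil {
		return "", err
	}
	defer r.Close()
	return roundTrip(w, r, msg)
}

func main() {
	for name, send := range map[string]func(string) (string, error){
		"pipe": viaPipe,
		"file": viaFile,
	} {
		got, err := send("hello")
		fmt.Println(name, got, err)
	}
}
```

The helper never learns which kind of device it is talking to. That knowledge lives in the kernel, which picks the right implementation behind each file descriptor.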

What does it do?

The following is a (non-comprehensive) list of the line discipline functionalities.

  • when we type, it echoes the characters back to the pty master (terminals are dumb)
  • it buffers the characters in memory and sends them to the pty slave when we press Enter
  • when we type CTRL + W, it deletes the last word we typed
  • when we type CTRL + C, it sends SIGINT (the signal sent by kill -2) to the program attached to the pty slave
  • when we press CTRL + Z, it sends SIGTSTP (the signal sent by kill -TSTP), which suspends the program
  • when we press CTRL + S, it treats the character as XOFF and pauses the output; a process that keeps writing is eventually put to sleep until CTRL + Q (XON) is pressed
  • on output, it replaces each New Line character with a Carriage Return and New Line sequence
  • when we press backspace, it deletes the last character from the buffer. It then sends to the pty master the instructions to erase that character from the screen

Historical note: XON/XOFF is a flow control feature that traces back to the time of hardware teletypes connected to the computer via a UART line. When one end of the data link could not receive any more data (because the buffer was full) it would send an "XOFF" signal to the sending end of the data link to pause until the "XON" signal was received.

Today, with computers featuring gigabytes of RAM, "XOFF" is used just as a means to suspend processes.

The Linux system is made of tons of abstractions which make our life easier when we need to program. As with all abstractions, they also make it hard to understand what's going on.

Fortunately, we have an ace in the hole: the power of trying out stuff. This is what we are going to do next.

Managing the line discipline with stty

stty is a utility to query and change the line discipline rules for the device connected to its standard input.

Run stty -a in a terminal:



$ stty -a
speed 38400 baud; rows 40; columns 80; line = 0;
intr = ^C; quit = ^\; erase = ^H; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc -ixany -imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke



The command prints the terminal characteristics and the line discipline rules.

The first line contains the baud rate, the number of rows and columns of the terminal.

Historical note: When the terminal and the computer were connected via a line, the baud rate provided the symbol rate of the channel. The baud rate is meaningless for a pty. You can read more about it in its wiki page.

The next lines contain the key bindings. For example, intr = ^C maps CTRL + C to SIGINT (the signal sent by kill -2).
Scrolling to the end of the output, we find the line discipline rules that do not require a key binding.
Do you see the echo ?

echo is the rule which instructs the line discipline to echo characters back.
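In the stty -a output, a bare name such as echo means the rule is enabled, while a leading dash, as in -echonl, means it is disabled. As a small sketch of that convention, the hypothetical helper flagState below scans the output for a boolean rule. It is meant only for the on/off rules, not for the key bindings, and the sample text is copied from the last lines of the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// flagState looks for a boolean rule such as "echo" in the output of
// `stty -a`. A bare name means enabled, a leading '-' means disabled.
// found is false when the rule does not appear at all.
func flagState(sttyOutput, name string) (enabled, found bool) {
	for _, tok := range strings.Fields(sttyOutput) {
		tok = strings.TrimSuffix(tok, ";")
		switch tok {
		case name:
			return true, true
		case "-" + name:
			return false, true
		}
	}
	return false, false
}

func main() {
	const sample = `isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke`

	for _, rule := range []string{"echo", "echonl", "icanon"} {
		enabled, found := flagState(sample, rule)
		fmt.Printf("%-7s found=%v enabled=%v\n", rule, found, enabled)
	}
}
```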

You can imagine what happens if we disable it.
Open a terminal and run:



$ stty -echo



Then type something...nothing will appear on the screen.
The line discipline does not echo the characters back to the pty master, thus the terminal does not show what we type anymore!

Everything else works as usual. For example, type ls followed by enter. You will see the output of ls, although you haven't seen the characters ls when you typed them.

We can restore it by typing:



stty echo



We can disable all the rules of the line discipline by typing stty raw. Such a terminal is called a raw terminal.
A cooked terminal is the opposite of a raw terminal - it's a terminal connected to a line discipline with all the rules enabled.

Why would someone want a raw terminal? No echo, no line editing, no suspending or killing, etc. It looks like a nightmare!
Well, it depends on which program receives the input of the terminal. For example, programs like Vim set the terminal to raw because they need to process the characters themselves. Any external intervention that transforms or eats up characters would be a problem for an editor.

As we will see, our remote terminal would need a raw terminal as well.

Build a remote terminal in golang

A remote terminal program lets us access a terminal on a remote host.
Connecting through SSH to a remote machine does just that. What we want to do is similar to the result of running ssh, minus the encryption bits.

I think we know enough to start hacking on some code.
We would need a client-server application. The client runs on our machine and the server sits on some remote host.

The client and the server will communicate via tcp.

I have simplified the code to highlight the interesting bit for this article. You can find the code for the example and how to build it on git.

Let's start from the server.

Remote terminal server

The server performs the following operations:

  • listen on a tcp port for incoming connections
  • create a pty when it receives a request
  • run the bash process
  • assign the standard input, output and error of bash to the pty slave
  • send data received from the connection down to the pty master

Our server does exactly what the terminal emulator does, but in this case instead of drawing stuff to the screen, it performs the following:

  • read from the master and send the content down to the tcp connection
  • read from the tcp connection and write the content to the master

The code for the server follows:



// Imports assumed by this simplified snippet: fmt, io, net, os, os/exec,
// plus a pty package whose Start(cmd) returns the pty master as an
// *os.File (for example github.com/creack/pty).

func server() error {
	// Create command
	c := exec.Command("bash")

	// Start the command with a pty.
	// It also assigns the standard input, output and error of bash to the pty slave
	ptmx, e := pty.Start(c)
	if e != nil {
		return e
	}
	// Make sure to close the pty at the end.
	defer func() { _ = ptmx.Close() }() // Best effort.

	return listen(ptmx)
}

func listen(ptmx *os.File) error {
	fmt.Println("Launching server...")

	// listen on all interfaces
	ln, e := net.Listen("tcp", ":8081")
	if e != nil {
		return e
	}
	// accept connection on port
	conn, e := ln.Accept()
	if e != nil {
		return e
	}

	// connection -> pty master (what the remote user types)
	go func() { _, _ = io.Copy(ptmx, conn) }()
	// pty master -> connection (what bash prints, including the echo)
	_, e = io.Copy(conn, ptmx)
	return e
}




Remote terminal client

It would appear that our client just needs to open a tcp connection with the server, send the standard input to the tcp connection and write the data from the connection to the standard output.

And indeed, there isn't much more to it.

There is only one caveat: the client should send all the characters to the server. We do not want the line discipline on the client to interfere with the characters we type. Setting the terminal to raw mode does just that.

The client performs the following operations:

  • set the terminal to raw mode
  • open a tcp connection with the remote host
  • send the standard input to the tcp connection
  • send the data from the tcp connection to the standard output

Finally let's see the client code:



// Imports assumed by this simplified snippet: fmt, io, net, os, plus a
// terminal package providing MakeRaw and Restore
// (for example golang.org/x/crypto/ssh/terminal).

func client() error {
	// MakeRaw puts the terminal connected to the given file
	// descriptor into raw mode and returns the previous state
	// of the terminal so that it can be restored.
	oldState, e := terminal.MakeRaw(int(os.Stdin.Fd()))
	if e != nil {
		return e
	}
	defer func() { _ = terminal.Restore(int(os.Stdin.Fd()), oldState) }()

	// Connect to this socket.
	// If client and server run on different machines,
	// replace the loopback address with the address of
	// the remote host
	conn, e := net.Dial("tcp", "127.0.0.1:8081")
	if e != nil {
		return e
	}

	// connection -> standard output (everything the server sends back)
	go func() { _, _ = io.Copy(os.Stdout, conn) }()
	// standard input -> connection (every key, unprocessed)
	_, e = io.Copy(conn, os.Stdin)
	fmt.Println("Bye!")

	return e
}




What happens when we run the program?

Now that we have the client and the server, we can see the whole workflow from client to server.

[Figure: the workflow from client to server]

In the following we assume the golang program has been compiled into a binary called remote. We will also assume the program has already been started on the server machine.



go build -o remote main.go
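The walkthrough that follows starts the client with ./remote -client. The main function is not shown in this article, so the following is only a guess at how a single binary could pick its role. The -client flag name comes from the walkthrough; parseMode and the rest are assumptions, and the calls to client() and server() are replaced by prints to keep the sketch self-contained.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseMode reports whether the arguments select client mode.
// Without -client the program is assumed to act as the server.
func parseMode(args []string) (client bool, err error) {
	fs := flag.NewFlagSet("remote", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way; main reports errors
	fs.BoolVar(&client, "client", false, "run as client instead of server")
	err = fs.Parse(args)
	return client, err
}

func main() {
	isClient, err := parseMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if isClient {
		fmt.Println("client mode: this is where client() would run")
	} else {
		fmt.Println("server mode: this is where server() would run")
	}
}
```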





Initialization

The client

  1. the user (the stick-man in the picture[*]) opens a terminal emulator, like XTERM. The terminal emulator will:
    1. draw the UI on the screen and request a pty from the OS
    2. launch bash as subprocess
    3. set the std input, output and error of bash to be the pty slave
    4. listen for keyboard events
  2. the user types ./remote -client
    1. the terminal emulator receives the keyboard events
    2. sends the character to the pty master
    3. the line discipline gets the characters and buffers them. It copies them to the slave only when Enter is pressed. It also writes back its input to the master (echoing back).
    4. when the user presses enter, the tty driver takes care of copying the buffered data to the pty slave
  3. the user presses Enter:
    1. bash (which was waiting for input on standard input) finally reads the characters
    2. bash interprets the characters and figures it needs to run a program called remote
    3. bash forks itself and runs the remote program in the fork. The forked process will have the same stdin, stdout and stderr used by bash, which is the pty slave. The remote client starts
      1. set the terminal in raw mode, disabling the line discipline
      2. open a tcp connection with the server

The server

  1. accept the tcp connection
  2. request a pty from the OS
  3. launch bash and set the std input, output and error of bash to be the pty slave
    1. bash starts
    2. bash writes to standard output (pty slave) the bash prompt ~ >
    3. the tty driver copies the characters from the pty slave to pty master
  4. the remote server copies the data from the pty master to the tcp connection

The client

  1. the client receives data from the tcp connection and sends it to the standard output
  2. the tty driver copies the characters from the pty slave to pty master
  3. the terminal emulator gets the characters from the pty master and draws them on the screen

All of this is just to display on the client the bash prompt ~ > coming from the bash process which runs on the remote server!
Now, what happens when the user types a command?

Typing a command

The client

  1. the user types ls -la followed by Enter
  2. the terminal emulator sends the characters to the pty master
  3. the tty driver copies them as they come to the pty slave (remember the remote client has disabled the line discipline)
  4. the remote client reads the data from the pty slave and sends them through the tcp connection
  5. the remote client waits to read the characters from the tcp connection

The server

  1. the remote server writes the bytes received from the tcp connection to the pty master
  2. the tty driver buffers them until the Enter character has been received. It also writes back its input to the master (echoing back).
  3. the remote server reads the characters from the master and sends them back to the tcp connection (these are the characters typed by the client!)
  4. the tty driver writes the data to the pty slave
  5. bash interprets the characters and figures it needs to run a program called ls with the argument -la
  6. bash forks the process. The forked process will have the same stdin, stdout and stderr used by bash, which is the pty slave.
  7. the output of the command is copied to the pty slave
  8. the tty driver copies the output to the pty master
  9. the remote server copies the data from the pty master to the tcp connection

An interesting thing to notice. On the client machine, all the characters we see on the screen come from the remote server. Including what we type!
It's the line discipline on the remote server which echoes back the characters, and from there they find their way back to the client!
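The round trip of a typed character can be imitated without a pty or a network. In the sketch below, net.Pipe provides an in-memory connection, remoteEcho stands in for the server side (it writes back whatever arrives, the way the remote line discipline echoes input), and typeAndWatch plays the raw-mode client. Both function names are invented for this illustration, and nothing here touches the kernel's tty layer.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// remoteEcho stands in for the remote side: every byte that arrives on the
// connection is written straight back to it.
func remoteEcho(conn net.Conn) {
	defer conn.Close()
	_, _ = io.Copy(conn, conn)
}

// typeAndWatch plays the client: it sends keys over the connection without
// showing them locally and returns what came back from the other end.
func typeAndWatch(keys string) (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	go remoteEcho(server)

	// Writing and reading happen concurrently, like the two io.Copy
	// calls in the real client.
	go func() { _, _ = io.WriteString(client, keys) }()

	screen := make([]byte, len(keys))
	if _, err := io.ReadFull(client, screen); err != nil {
		return "", err
	}
	return string(screen), nil
}

func main() {
	screen, err := typeAndWatch("ls -la\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Everything printed here travelled to the "server" and back.
	fmt.Printf("shown on the client screen: %q\n", screen)
}
```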

Look back at our little golang program and compare it with the number of steps in the workflow.
With a little over 50 lines of code we were able to implement the whole workflow. Our program is small thanks to the kernel, which performs the heavy lifting.

That's the power of abstraction.

Conclusions

We have reached the end of the series. The research work to write it was a lot of fun and I hope it was also an interesting read.

Some of the content I write is too short for a post, but still interesting enough to share it as a tweet. Follow me on Twitter to get them in your Twitter feed!

Happy coding :)
-Nicola


* I know, it looks more like the game of hangman than a system diagram
