Tsuyoshi Chujo
Posted on January 24, 2018
HTTP is one of the most widely used technologies today and is indispensable for web services.
We can easily send HTTP requests with whichever HTTP client library we like, and run HTTP servers in whatever language and framework we like, without thinking about what happens inside those libraries or frameworks.
However, I believe that understanding the basic idea of HTTP is useful for many programmers, for example when deciding which HTTP headers to add or when trying to understand what HTTP/2 is.
In this post, I'm going to explain how HTTP works, starting with running TCP socket programs.
Tiny TCP socket programs
Let's start by taking a look at TCP, the protocol underlying HTTP, with a tiny TCP socket program.
What TCP does is quite simple: it sends and receives data as a byte stream. TCP itself defines no data format, no marker for the end of the data, and no rule for when data may be sent. Thus, the client can send whatever data it wants to the server whenever it wants, and vice versa.
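To make the byte-stream point concrete, here is a minimal sketch that is separate from the programs in this post. It assumes a loopback connection on a port picked by the operating system, and the method name tcp_stream_demo is made up for illustration. The sender makes two separate writes, yet the receiver just sees the bytes in order, with nothing marking where one write ended and the next began.

```ruby
require 'socket'

# Sends two separate writes over a loopback TCP connection and returns
# everything the receiving side read, as one string.
def tcp_stream_demo
  server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
  port = server.addr[1]

  sender = Thread.new do
    client = TCPSocket.new('127.0.0.1', port)
    client.write('hello')   # first write
    client.write('world')   # second write
    client.close            # closing is the only "end of data" signal here
  end

  connection = server.accept
  received = connection.read   # read until the peer closes the connection
  sender.join
  received
ensure
  connection&.close
  server&.close
end

p tcp_stream_demo   # prints "helloworld"
```

Any structure on top of that stream, such as "one request per message", has to come from a protocol like HTTP.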
Here are sample Ruby programs for a TCP server and a TCP client. Each one sends whatever is typed on the keyboard to the other side and prints whatever it receives from the other side.
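# server.rb
require 'socket'

port = 20000
server = TCPServer.new(port)
socket = server.accept   # wait for a single client to connect

# Print every line received from the client.
# socket.gets returns nil once the client has closed the connection.
Thread.new do
  loop do
    data = socket.gets
    p data
  end
end

# Send every line typed on the keyboard to the client.
loop do
  data = gets
  socket.print(data)
end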
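# client.rb
require 'socket'

host = '127.0.0.1'
port = 20000
socket = TCPSocket.open(host, port)

# Print every line received from the server.
# socket.gets returns nil once the server has closed the connection.
Thread.new do
  loop do
    data = socket.gets
    p data
  end
end

# Send every line typed on the keyboard to the server.
loop do
  data = gets
  socket.sendmsg data
end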
You can check how they work by opening two terminals and running one of the scripts above in each.
$ ruby server.rb
$ ruby client.rb
* Note that server.rb must be run first, because client.rb expects the server to be running already.
You can type any text you like. Each time you press the Enter key, the line is sent to the other side, and you may notice that both the client and the server can send and receive data at any time.
Communication using HTTP
Now, let's take a look at how HTTP clients send HTTP requests, using the previous client.rb program.
Before going ahead, we need a simple HTTP server as a sandbox.
In theory, we could build an HTTP server by extending the server.rb program above, but that is too challenging to do within this post.
Instead of building an HTTP server ourselves, we can use Ruby's WEBrick::HTTPServer as an HTTP server on localhost. Stop server.rb first if it is still running, since it listens on the same port. Then copy and paste the command below and run it to start an HTTP server. (On Ruby 3.0 and later, WEBrick is no longer bundled with Ruby, so you may need to run gem install webrick first.)
$ ruby -rwebrick -e 'WEBrick::HTTPServer.new(:DocumentRoot => "./", :Port => 20000).start'
The goal of this section is to send an HTTP request to the sandbox server above with the client.rb program and, by sending the appropriate data, receive a 200 response with a proper response body.
Format of HTTP request
As we saw in the previous section, communication between a client and a server over a TCP socket has no rules about how data is sent or received.
HTTP, which is built on top of TCP, is one of the protocols that specify how a server and a client communicate with each other. In other words, HTTP defines the data format, when data is sent, when the connection is closed, and so on.
You can check the specification in detail in RFC 7230. (I also referred to it while writing this post.)
Let's start with the first line of a request. As defined in section 3.1.1 of RFC 7230, the HTTP method, the request target, and the HTTP protocol version appear on the first line, in the format below.
method SP request-target SP HTTP-version CRLF
If we want to send a GET request to the root path with HTTP/1.1, for example, the first line should look like this.
GET / HTTP/1.1
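As a rough illustration of that format, the sketch below splits a request line into its three parts. The method name parse_request_line is hypothetical, and the pattern is deliberately simplified: it treats the method and the request target as any run of non-space characters rather than following the full grammar in the RFC.

```ruby
# Splits an HTTP request line into its three parts.
# Returns [method, request_target, http_version], or nil if the line does
# not have the "method SP request-target SP HTTP-version CRLF" shape.
def parse_request_line(line)
  match = /\A(\S+) (\S+) (HTTP\/\d\.\d)\r\n\z/.match(line)
  match && match.captures
end

p parse_request_line("GET / HTTP/1.1\r\n")   # => ["GET", "/", "HTTP/1.1"]
p parse_request_line("GET /\r\n")            # => nil (the version is missing)
```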
So let's try sending this request line using client.rb.
Run client.rb and type the request line.
$ ruby client.rb
GET / HTTP/1.1
Then press the Enter key twice, and you should get an HTTP response with status code 200.
"HTTP/1.1 200 OK \r\n"
"Content-Type: text/html; charset=\"UTF-8\"\r\n"
"Server: WEBrick/1.3.1 (Ruby/2.4.1/2017-03-22)\r\n"
"Date: Tue, 23 Jan 2018 13:33:51 GMT\r\n"
"Content-Length: 9173\r\n"
"Connection: Keep-Alive\r\n"
Cool! We successfully sent an HTTP request to the server with a simple, tiny, toy TCP socket program.
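The same experiment can also be scripted instead of typed. The following is only a sketch under a few assumptions: the method name http_get_head is made up, it sends proper CRLF line endings rather than what a terminal sends, and it adds a Host header, which HTTP/1.1 requires of clients even though the request typed above did not include one. It returns the status line and the header lines and does not read the body.

```ruby
require 'socket'

# Sends "GET <target> HTTP/1.1" to host:port and returns the response's
# status line and header lines (without line endings). The body is not read.
def http_get_head(host, port, target = '/')
  socket = TCPSocket.new(host, port)
  socket.write("GET #{target} HTTP/1.1\r\n")   # request line
  socket.write("Host: #{host}:#{port}\r\n")    # assumption: not part of the typed request
  socket.write("\r\n")                         # empty line: no more header fields
  lines = []
  while (line = socket.gets)
    line = line.chomp
    break if line.empty?   # an empty line ends the response headers
    lines << line
  end
  lines
ensure
  socket&.close
end
```

With the sandbox server running, calling puts http_get_head('127.0.0.1', 20000) should print a status line and headers similar to the ones shown above.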
Receiving HTTP Response
Now, let's take a closer look at when the server sends the response to the client.
Remember that you pressed the Enter key twice to send the HTTP request, even though the first line ("GET / HTTP/1.1") had already been sent when you pressed Enter the first time.
This is because the server is supposed to return a response only after the client has finished sending the whole request. Since TCP has no notion of where a message ends, the server has to determine whether it has received the entire request by looking at the data itself, according to the HTTP specification.
The format of an entire HTTP message, which covers requests, is defined like this. (See section 3 of RFC 7230.)
HTTP-message = start-line
               *( header-field CRLF )
               CRLF
               [ message-body ]
You may notice that header-field lines are supposed to follow the first line, so the server cannot respond until it is clear that the header fields are over. In other words, the server finds the end of the header fields by receiving two CRLFs in a row: one CRLF ends the last header-field (or the start line, if there are no header fields), and the next one is the empty line.
That is why the HTTP server sent the response only after you pressed the Enter key twice: the second Enter told the server that there are no header fields and that the request message ends here.
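Here is a small sketch of how a server might apply this rule. It is not how WEBrick is implemented; the method name read_request_head is hypothetical, and a StringIO stands in for the socket so that the example runs on its own. It reads lines until it meets the empty line and leaves any message body unread. Because chomp also strips a bare line feed, the same loop accepts input typed in a terminal, which the RFC permits recipients to tolerate.

```ruby
require 'stringio'

# Reads an HTTP request head from io: the start line, then header lines
# up to the empty line that ends the header section.
# Returns [start_line, header_lines]; a message body, if any, is left unread.
def read_request_head(io)
  start_line = io.gets&.chomp
  header_lines = []
  while (line = io.gets)
    line = line.chomp      # strips "\r\n" (or a bare "\n")
    break if line.empty?   # empty line: the header section is over
    header_lines << line
  end
  [start_line, header_lines]
end

request = StringIO.new("GET / HTTP/1.1\r\n\r\n")
p read_request_head(request)   # => ["GET / HTTP/1.1", []]
```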
Unless the connection is kept alive (persistent connections are the default in HTTP/1.1, and some HTTP/1.0 implementations negotiate them with the Keep-Alive header), an HTTP connection is closed once the server has sent its response to the client. Appendix A.1.2 of RFC 7230 describes the HTTP/1.0 behavior as follows:
In HTTP/1.0, each connection is established by the client prior to the request and closed by the server after sending the response.
You can try to send data again after receiving the response, and you may soon notice that the connection has already been closed. (socket.gets returns nil repeatedly, so client.rb keeps printing nil.)
That HTTP connection is over, and you must connect to the server again to send another request.
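A client can use that nil as its signal to stop reading. The sketch below assumes a loopback connection in which one side plays the server and closes right after writing a response; the method name read_lines_until_closed is made up for illustration.

```ruby
require 'socket'

# Collects lines from socket until the peer closes the connection,
# which IO#gets reports by returning nil.
def read_lines_until_closed(socket)
  lines = []
  while (line = socket.gets)
    lines << line
  end
  lines
end

server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
client = TCPSocket.new('127.0.0.1', server.addr[1])
peer = server.accept
peer.write("HTTP/1.1 200 OK\r\n\r\n")    # stand-in for a real response
peer.close                               # the "server" closes the connection

p read_lines_until_closed(client)   # => ["HTTP/1.1 200 OK\r\n", "\r\n"]
client.close
server.close
```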
Summary
In this post, we connected to an HTTP server with a TCP socket program. It should now be clear that HTTP runs on top of a TCP connection and introduces common rules for using TCP, so that web applications and their clients (mobile apps, CLI programs, etc.) can communicate efficiently.
Please note that this post covers only a couple of HTTP's features. To understand HTTP in practice, you need to know much more, such as the roles of request headers, connection management (for example with the Keep-Alive header), and the other things stated in the RFC documents.
I hope this post helps you take the first step toward understanding how HTTP works.