Deep Dive into PandApache3: Understanding Connection Management and Response Generation
Mary 🇪🇺 🇷🇴 🇫🇷
Posted on July 17, 2024
Welcome to the world of PandApache3, your web server of choice. In this article, we will explore the internal workings of this server, detailing how it handles connections, processes HTTP requests, and generates responses. Whether you are a developer or simply curious, you'll discover the key steps that make all this possible.
In the previous article, Deep Dive into PandApache3: Launch Code, we saw how the PandApache3 service starts, from initializing the logger to starting the TcpListener. Now it's time to dive into the heart of the system and see what happens when a connection is received. Ready to discover the inner workings of PandApache3? Let's go!
How PandApache3 Manages Incoming Connections
After the StartServerAsync method, another method is called: RunServerAsync.
private static async Task RunServerAsync(ConnectionManager connectionManager)
{
    TcpListener listener = connectionManager.Listener;
    while (true)
    {
        if (listener.Pending())
        {
            ISocketWrapper client = new SocketWrapper(listener.AcceptSocket());
            await connectionManager.AcceptConnectionsAsync(client);
        }
    }
}
This method is short. Remember that our ConnectionManager has a TcpListener. We check whether a connection is pending on this listener; if so, we accept it and hand it over to the ConnectionManager. Since we need to constantly check for new connections, the method runs in an infinite loop. Note that because the loop calls Pending() without ever waiting, it keeps a CPU core busy even when no client is connecting. Once a connection is established with our server, we have what we call a client, represented by a socket.
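For comparison, here is a minimal sketch of an accept loop that waits for connections instead of polling Pending(). It assumes .NET 6 or later, where TcpListener.AcceptSocketAsync accepts a CancellationToken. The AcceptLoop class and the onClient callback are hypothetical names, not PandApache3 code.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of PandApache3.
public static class AcceptLoop
{
    // Waits for incoming connections without busy-polling and hands each
    // accepted socket to onClient until the token is cancelled.
    public static async Task RunAsync(
        TcpListener listener,
        Func<Socket, Task> onClient,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            Socket socket;
            try
            {
                // Suspends here (no CPU used) until a client connects.
                socket = await listener.AcceptSocketAsync(token);
            }
            catch (OperationCanceledException)
            {
                break; // shutdown requested while waiting
            }
            // onClient should return quickly (for example by starting a task),
            // otherwise it delays the next accept.
            await onClient(socket);
        }
    }
}
```

In PandApache3's terms, onClient would wrap the socket in a SocketWrapper and call AcceptConnectionsAsync, which already returns quickly because it starts a separate task per client.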
Between Us
What is the ISocketWrapper for? Wrapping the socket in ISocketWrapper hides the raw socket implementation details, making connections easier to manage within the ConnectionManager. It also improves testability and maintainability, since tests can substitute a fake implementation for a real network socket.
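To illustrate the idea, here is a minimal sketch of such an abstraction. The member list is an assumption inferred from the calls made later in this article (SendAsync and Dispose), not PandApache3's actual interface. FakeSocketWrapper is a hypothetical test double that records what the server sends instead of writing to the network.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

// Assumed shape of the abstraction (illustrative, not the real interface).
public interface ISocketWrapper : IDisposable
{
    Task<int> SendAsync(ArraySegment<byte> buffer, SocketFlags flags);
}

// Production version: forwards every call to a real socket.
public sealed class SocketWrapper : ISocketWrapper
{
    private readonly Socket _socket;
    public SocketWrapper(Socket socket) => _socket = socket;

    public Task<int> SendAsync(ArraySegment<byte> buffer, SocketFlags flags) =>
        _socket.SendAsync(buffer, flags);

    public void Dispose() => _socket.Dispose();
}

// Test double: captures the sent bytes in memory, no network involved.
public sealed class FakeSocketWrapper : ISocketWrapper
{
    public MemoryStream Sent { get; } = new MemoryStream();
    public bool Disposed { get; private set; }

    public Task<int> SendAsync(ArraySegment<byte> buffer, SocketFlags flags)
    {
        Sent.Write(buffer.AsSpan());
        return Task.FromResult(buffer.Count);
    }

    public void Dispose() => Disposed = true;
}
```

A test can pass a FakeSocketWrapper wherever an ISocketWrapper is expected, then inspect Sent to check the exact bytes of the HTTP response.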
Our connection with the client is then managed in the AcceptConnectionsAsync function.
public async Task AcceptConnectionsAsync(ISocketWrapper client)
{
    if (_clients.Count < ServerConfiguration.Instance.MaxAllowedConnections)
    {
        // Normal case: track the client and serve it on a thread-pool task
        Guid clientId = Guid.NewGuid();
        _clients.TryAdd(clientId, client);
        Logger.LogInfo("Client connected");
        _ = Task.Run(() => HandleClientAsync(client, clientId)); // fire-and-forget
    }
    else if (_clientsRejected.Count < ServerConfiguration.Instance.MaxRejectedConnections)
    {
        // Overloaded: still answer, but with an error response
        Guid clientId = Guid.NewGuid();
        _clientsRejected.TryAdd(clientId, client);
        Logger.LogWarning("Too many connections - rejecting with HTTP 500");
        _ = Task.Run(() => HandleClientRejectAsync(client, clientId));
    }
    else
    {
        // Extreme overload: close the connection without answering
        Logger.LogError("Too many connections");
        client.Dispose();
    }
}
In the normal case, the client's connection is handled in a new task, which Task.Run schedules on the .NET thread pool. That task responds to the request while the current thread goes back to accepting new connections, and so on.
When the server is overloaded, the connection is still accepted, but the client immediately receives an error response (HTTP 500, as the log message indicates), also sent from a separate task. The request gets a response, just not the expected one. Under extreme overload, the server simply closes the connection without returning anything.
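The three-tier decision above can be isolated in a pure function, which makes it easy to test. This is a hypothetical refactoring sketch: the Admission enum and the Decide method are illustrative names, and the two limits correspond to MaxAllowedConnections and MaxRejectedConnections.

```csharp
// Hypothetical outcome of the admission check.
public enum Admission
{
    Serve,  // handle the request normally
    Reject, // accept, but answer with an error response
    Drop    // close the connection without answering
}

public static class AdmissionPolicy
{
    // Mirrors the if / else if / else chain of AcceptConnectionsAsync.
    public static Admission Decide(int activeClients, int rejectedClients,
                                   int maxAllowed, int maxRejected)
    {
        if (activeClients < maxAllowed) return Admission.Serve;
        if (rejectedClients < maxRejected) return Admission.Reject;
        return Admission.Drop;
    }
}
```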
Between Us
How do we decide whether the server is overloaded and whether to respond normally to the client? Imagine each connection to our server, and thus the work done to serve it, takes 2 MB of memory, and my server has 512 MB (numbers chosen for demonstration purposes). My server can then support at most 256 concurrent connections, because the 257th would have no resources left to run, which could endanger the entire service. Limits such as MaxAllowedConnections exist to prevent this.
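The reasoning above boils down to a simple division, where M is the memory available and m the memory needed per connection (using the illustrative figures from the example):

```latex
N_{\max} = \left\lfloor \frac{M}{m} \right\rfloor
         = \left\lfloor \frac{512\ \text{MB}}{2\ \text{MB}} \right\rfloor
         = 256
```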
Request Analysis
Now that we have accepted the request and decided to process it, let's recall what was explained in the previous article. Each connection manager has a pipeline made of several middlewares that process each request. If you remember, every middleware takes an HTTP context (HttpContext) as a parameter. Here is the IMiddleware interface to refresh your memory:
public interface IMiddleware
{
    Task InvokeAsync(HttpContext context);
}
The middlewares present in PandApache3, in execution order, are:
- LoggingMiddleware
- RoutingMiddleware
- TerminalMiddleware
HttpContext is a fairly simple class composed of a Request object and an HttpResponse object:
public class HttpContext
{
    public Request Request { get; set; }
    public HttpResponse Response { get; set; }

    public HttpContext(Request request, HttpResponse response)
    {
        Request = request;
        Response = response;
    }
}
The new task that handles the request starts by executing the HandleClientAsync function:
private async Task HandleClientAsync(ISocketWrapper client, Guid clientId)
{
    try
    {
        Request request = await ConnectionUtils.ParseRequestAsync(client);
        if (request == null)
        {
            return; // unparsable request: nothing to answer
        }
        HttpContext context = new HttpContext(request, null);
        await _pipeline(context);
        await ConnectionUtils.SendResponseAsync(client, context.Response);
    }
    finally
    {
        // Always release the connection, even after an early return or an exception
        _clients.TryRemove(clientId, out _);
        client.Dispose();
        Logger.LogInfo("Client closed");
    }
}
First, the HTTP request received by the server is parsed into a Request object. Parsing means extracting the request line (HTTP verb, path, and protocol version), the headers, the parameters, the body... in short, all the elements that make up an HTTP request. If the request cannot be parsed, the handler stops there.
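ConnectionUtils.ParseRequestAsync is not shown here. As a rough idea of what parsing involves, here is a simplified sketch that works on a request already read into a string. ParsedRequest and RequestParser are hypothetical names, and the sketch deliberately ignores query-string parameters, Content-Length, and chunked bodies.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified representation of a parsed request.
public sealed record ParsedRequest(
    string Verb, string Path, string Version,
    Dictionary<string, string> Headers, string Body);

public static class RequestParser
{
    // Returns null when the text is not a well-formed HTTP/1.x request head.
    public static ParsedRequest? Parse(string raw)
    {
        // The head (request line + headers) ends with an empty line.
        int headEnd = raw.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        if (headEnd < 0) return null;

        string[] lines = raw.Substring(0, headEnd).Split("\r\n");

        // Request line: VERB SP PATH SP VERSION, e.g. "GET /index.html HTTP/1.1"
        string[] parts = lines[0].Split(' ');
        if (parts.Length != 3) return null;

        // Header names are case-insensitive in HTTP.
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) return null;
            headers[lines[i].Substring(0, colon).Trim()] = lines[i].Substring(colon + 1).Trim();
        }

        // Whatever follows the empty line is the body (possibly empty).
        string body = raw.Substring(headEnd + 4);
        return new ParsedRequest(parts[0], parts[1], parts[2], headers, body);
    }
}
```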
Since the response is not known yet, it is null in the HttpContext for now. Then the pipeline runs, starting with its first middleware. In our case, only one middleware really does anything: the RoutingMiddleware. The first one, LoggingMiddleware, simply logs the incoming request, and the third one, TerminalMiddleware, marks the end of the pipeline.
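This article does not show how the _pipeline delegate is built from the middlewares. A common way to chain them, sketched here under that assumption, is to compose the delegate from the last middleware to the first, each step receiving the next one. PipelineBuilder is a hypothetical name, and the sketch is generic over the context type so that it runs without PandApache3's classes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelineBuilder
{
    // Each factory receives the "next" delegate and returns its own step.
    // The first factory in the list becomes the outermost (first-run) step.
    public static Func<T, Task> Build<T>(
        IReadOnlyList<Func<Func<T, Task>, Func<T, Task>>> factories)
    {
        Func<T, Task> next = _ => Task.CompletedTask; // end of the chain
        for (int i = factories.Count - 1; i >= 0; i--)
        {
            next = factories[i](next);
        }
        return next;
    }
}
```

With three factories standing in for LoggingMiddleware, RoutingMiddleware, and TerminalMiddleware, calling the built delegate runs them in that order, each one deciding when (or whether) to call the next.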
The Core of Request Processing
Here is the InvokeAsync method of RoutingMiddleware, executed for each request:
public async Task InvokeAsync(HttpContext context)
{
    Logger.LogDebug("Router Middleware");
    if (string.Equals(context.Request.Verb, "GET", StringComparison.OrdinalIgnoreCase))
    {
        Request request = context.Request;
        string mainDirectory = ServerConfiguration.Instance.RootDirectory;
        string filePath = Path.Combine(mainDirectory, Utils.GetFilePath(request.Path));
        // Declared outside the if so it is still in scope below; stays null if the file is missing
        string mimeType = null;
        if (_FileManager.Exists(filePath))
        {
            // TrimStart instead of Substring(1): a file without extension yields "" instead of throwing
            string fileExtension = Path.GetExtension(filePath).TrimStart('.').ToLowerInvariant();
            mimeType = fileExtension switch
            {
                // Images
                "jpg" or "jpeg" => "image/jpeg",
                "png" => "image/png",
                "gif" => "image/gif",
                "svg" => "image/svg+xml",
                "bmp" => "image/bmp",
                "webp" => "image/webp",
                "ico" => "image/x-icon",
                // Text documents
                "txt" => "text/plain",
                "html" or "htm" => "text/html",
                "css" => "text/css",
                "js" => "application/javascript",
                "json" => "application/json",
                "xml" => "text/xml",
                // Default case if the extension is not recognized
                _ => null
            };
        }
        if (mimeType != null)
        {
            byte[] data = await File.ReadAllBytesAsync(filePath);
            HttpResponse httpResponse = new HttpResponse(200)
            {
                Body = new MemoryStream(data)
            };
            httpResponse.AddHeader("Content-Type", mimeType);
            httpResponse.AddHeader("Content-Length", httpResponse.Body.Length.ToString());
            context.Response = httpResponse;
        }
        else
        {
            // Missing file or unsupported extension
            context.Response = new HttpResponse(404);
        }
    }
    else
    {
        // Only GET is supported for now
        context.Response = new HttpResponse(404);
    }
    await _next(context);
}
Let's start with the first condition. For now, we want our web server to return only static resources such as HTML files, and the only HTTP verb needed for that is GET. Any request using another verb therefore gets a 404 response (strictly speaking, HTTP defines 405 Method Not Allowed for this case, but PandApache3 currently answers 404).
If the request is a GET, however, the client wants to obtain a resource from our server: the one indicated by the path in the request.
For example, if I send a GET request to the URL http://pandapache.com/index.html, the path is /index.html, and index.html is the resource my web server must find and return.
But where exactly should we look for index.html? PandApache3, like most web servers, has what is called a root directory. Just as your computer's file system has a root (C:/ on Windows, / on Linux), a web server defines in its settings a root directory from which it serves files. By default, the root directory of PandApache3 is C:/PandApache3/www/ on Windows and /etc/PandApache3/www on Linux.
We have the root directory and the requested resource. If it exists, we can return it:
string mainDirectory = ServerConfiguration.Instance.RootDirectory;
string filePath = Path.Combine(mainDirectory, Utils.GetFilePath(request.Path));
if (_FileManager.Exists(filePath))
{
    ...
}
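One detail worth checking at this step is path traversal: a request such as GET /../secret.txt must not escape the root directory. The article does not show what Utils.GetFilePath does, so here is a hedged sketch of a defensive check. SafePath.ResolveUnderRoot is a hypothetical helper that returns null when the resolved path falls outside the root; it assumes the request path has already been URL-decoded.

```csharp
#nullable enable
using System;
using System.IO;

public static class SafePath
{
    // Combines rootDirectory with the request path and returns the full path,
    // or null if the result would lie outside rootDirectory.
    public static string? ResolveUnderRoot(string rootDirectory, string requestPath)
    {
        // Normalize the root and make sure it ends with a separator, so that
        // "/srv/www-other" is not mistaken for a child of "/srv/www".
        string root = Path.GetFullPath(rootDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar))
            root += Path.DirectorySeparatorChar;

        // Strip leading slashes: Path.Combine would otherwise treat
        // "/index.html" as rooted and ignore the root directory entirely.
        string relative = requestPath.TrimStart('/', '\\');

        // GetFullPath resolves "." and ".." segments.
        string full = Path.GetFullPath(Path.Combine(root, relative));

        // Ordinal comparison; a case-insensitive file system may call for OrdinalIgnoreCase.
        return full.StartsWith(root, StringComparison.Ordinal) ? full : null;
    }
}
```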
Between Us
PandApache3 is written in C# on .NET Core, which is cross-platform, so the same code runs on both Windows and Linux without any modification. At startup, PandApache3 detects the operating system, and the default paths (configuration, logs, root directory) adapt to the platform.
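A minimal sketch of that platform detection, assuming .NET 5 or later (where OperatingSystem.IsWindows() was introduced). The DefaultPaths class is hypothetical; the two directories are the defaults quoted earlier in this article.

```csharp
using System;

public static class DefaultPaths
{
    // Picks the default root directory for the current platform.
    public static string RootDirectory() =>
        OperatingSystem.IsWindows()
            ? "C:/PandApache3/www/"
            : "/etc/PandApache3/www";
}
```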
Building a Proper Response
Let's now see how a proper response is generated. Besides knowing whether our resource exists, we also need to determine its MIME type. To keep things simple, PandApache3 derives the MIME type from the file extension. Here, we focus only on the MIME types of the most common static web resources: HTML, CSS, JavaScript, and images.
Thus, we need to use the file extension to identify the MIME type:
// TrimStart instead of Substring(1): a file without extension yields "" instead of throwing
string fileExtension = Path.GetExtension(filePath).TrimStart('.').ToLowerInvariant();
string mimeType = fileExtension switch
{
    // Images
    "jpg" or "jpeg" => "image/jpeg",
    "png" => "image/png",
    "gif" => "image/gif",
    "svg" => "image/svg+xml",
    "bmp" => "image/bmp",
    "webp" => "image/webp",
    "ico" => "image/x-icon",
    // Text documents
    "txt" => "text/plain",
    "html" or "htm" => "text/html",
    "css" => "text/css",
    "js" => "application/javascript",
    "json" => "application/json",
    "xml" => "text/xml",
    // Default case if the extension is not recognized
    _ => null
};
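An alternative worth considering, sketched here, is a lookup table instead of a switch: adding a type becomes a one-line change, and the lookup is case-insensitive by construction. MimeTypes is a hypothetical class that reuses the mappings above. It also tolerates files without an extension, for which Path.GetExtension returns an empty string.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

public static class MimeTypes
{
    // Same mappings as the switch expression above, keyed without the dot.
    private static readonly Dictionary<string, string> ByExtension =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["jpg"] = "image/jpeg", ["jpeg"] = "image/jpeg",
            ["png"] = "image/png", ["gif"] = "image/gif",
            ["svg"] = "image/svg+xml", ["bmp"] = "image/bmp",
            ["webp"] = "image/webp", ["ico"] = "image/x-icon",
            ["txt"] = "text/plain", ["html"] = "text/html", ["htm"] = "text/html",
            ["css"] = "text/css", ["js"] = "application/javascript",
            ["json"] = "application/json", ["xml"] = "text/xml",
        };

    // Returns the MIME type for filePath, or null if the extension is unknown or missing.
    public static string? For(string filePath)
    {
        string extension = Path.GetExtension(filePath).TrimStart('.');
        return ByExtension.TryGetValue(extension, out string? mime) ? mime : null;
    }
}
```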
Once the MIME type is known, we can return the appropriate response:
if (mimeType != null)
{
    byte[] data = await File.ReadAllBytesAsync(filePath);
    HttpResponse httpResponse = new HttpResponse(200)
    {
        Body = new MemoryStream(data)
    };
    httpResponse.AddHeader("Content-Type", mimeType);
    httpResponse.AddHeader("Content-Length", httpResponse.Body.Length.ToString());
    context.Response = httpResponse;
}
else
{
    context.Response = new HttpResponse(404);
}
Here are the necessary elements of a correct HTTP response:
- A status code: 200 (OK), indicating that the request succeeded.
- The body: the content of the file.
- The Content-Type header: the MIME type of the body.
- The Content-Length header: the size of the body in bytes.
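A quick illustration of why Content-Length must count bytes rather than characters: as soon as the content contains non-ASCII text, the two differ. The code above gets this right because it measures the MemoryStream built from the file's bytes. The snippet uses only standard .NET APIs; the ContentLength class is just a wrapper for the demonstration.

```csharp
using System.Text;

public static class ContentLength
{
    // Content-Length is the size of the body in bytes, after encoding.
    public static long Of(string text, Encoding encoding) =>
        encoding.GetByteCount(text);
}
```

For example, "h\u00e9llo" ("héllo") is 5 characters long, but its UTF-8 encoding takes 6 bytes, because "é" needs two.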
There you go, our response is ready!
If the MIME type is not recognized, or if the file does not exist, we return a 404 Not Found response.
Finally, we call the next middleware in the pipeline (although in this case, it doesn't do anything significant).
Between us
Why read files in binary mode? Reading as text would work for some file types, those you can open in a text editor, such as HTML or CSS. But have you ever tried to open an image in Notepad? You only get unreadable characters, and decoding those bytes as text and encoding them back would corrupt the file. Since some resources must be sent as raw bytes, we might as well do it for all of them!
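The corruption is easy to demonstrate. Decoding bytes as UTF-8 text and encoding them back is lossless only for valid UTF-8. The first bytes of a JPEG file (0xFF 0xD8 0xFF) are not valid UTF-8, so the decoder replaces the invalid sequences with the U+FFFD replacement character and the round trip changes the data. TextRoundTrip is a hypothetical helper for this demonstration.

```csharp
using System.Linq;
using System.Text;

public static class TextRoundTrip
{
    // Reads bytes "as text" and writes them back, as a text-mode read would.
    public static byte[] ThroughUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));

    // True when the round trip preserved every byte.
    public static bool IsLossless(byte[] data) =>
        ThroughUtf8(data).SequenceEqual(data);
}
```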
Still between us
Here, the request path corresponds to a physical file on disk, but a path can also correspond to what we call an endpoint. In that case, the web server doesn't return a file; instead, it performs an action or builds a dynamic response.
On PandApache3, you can send a GET request to the /echo/ endpoint with a parameter (http://pandapache3/echo/hello). The response will then be "hello", perfect for ensuring your service is running correctly!
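Here is a sketch of how such an endpoint could be checked before falling back to the file lookup. The EchoEndpoint class and its matching rule (everything after /echo/ is echoed back, URL-decoded) are assumptions for illustration; the article does not show how PandApache3 implements /echo/.

```csharp
#nullable enable
using System;

public static class EchoEndpoint
{
    private const string Prefix = "/echo/";

    // Returns the text to echo for "/echo/<text>", or null for any other path.
    public static string? TryHandle(string path)
    {
        if (!path.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            return null;
        return Uri.UnescapeDataString(path.Substring(Prefix.Length));
    }
}
```

A router using it would answer 200 with a text/plain body whenever TryHandle returns a non-null value, and only otherwise look for a file under the root directory.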
Sending the Response
The last step of our journey is sending the generated response back to the client. This is done in the HandleClientAsync method:
await ConnectionUtils.SendResponseAsync(client, context.Response);
_clients.TryRemove(clientId, out _); // discard: overwriting client could leave it null before Dispose
client.Dispose();
Logger.LogInfo("Client closed");
The SendResponseAsync method sends the response to the client through the socket:
byte[] msg = Encoding.UTF8.GetBytes(response.ToString());
await client.SendAsync(msg, SocketFlags.None);
if (response.Body != null)
{
    response.Body.Position = 0; // Ensure the stream is at the beginning
    byte[] buffer = new byte[1024];
    int bytesRead;
    // ReadAsync keeps the whole method non-blocking
    while ((bytesRead = await response.Body.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        await client.SendAsync(new ArraySegment<byte>(buffer, 0, bytesRead), SocketFlags.None);
    }
}
We send the HTTP response to the client in two parts: the first send carries the status line and headers (the output of response.ToString()), and the body follows in chunks of up to 1024 bytes.
Once the response is sent, we remove the client from our connection list and close the connection.
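To make the two-part send concrete, here is a sketch that writes to a Stream instead of a socket, so it can run without a network (a NetworkStream, or a MemoryStream in tests, would both work). The head follows the HTTP/1.1 format: a status line, one "Name: value" line per header, each ending in CRLF, then an empty line. ResponseWriter is a hypothetical class; PandApache3's HttpResponse.ToString() may differ in details such as the reason phrases it supports.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class ResponseWriter
{
    // Minimal reason phrases for the status codes used in this article.
    private static string Reason(int status) => status switch
    {
        200 => "OK",
        404 => "Not Found",
        500 => "Internal Server Error",
        _ => "Unknown"
    };

    // Status line + headers, terminated by an empty line (CRLF CRLF).
    public static string BuildHead(int status, IEnumerable<KeyValuePair<string, string>> headers)
    {
        var sb = new StringBuilder();
        sb.Append("HTTP/1.1 ").Append(status).Append(' ').Append(Reason(status)).Append("\r\n");
        foreach (var header in headers)
            sb.Append(header.Key).Append(": ").Append(header.Value).Append("\r\n");
        sb.Append("\r\n");
        return sb.ToString();
    }

    // First write: the head. Then the body, copied in chunks of up to 1024 bytes.
    public static async Task WriteAsync(Stream output, string head, Stream? body)
    {
        byte[] headBytes = Encoding.ASCII.GetBytes(head); // header text is ASCII
        await output.WriteAsync(headBytes, 0, headBytes.Length);
        if (body != null)
        {
            body.Position = 0; // same rewind as in SendResponseAsync
            await body.CopyToAsync(output, 1024);
        }
    }
}
```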
Between us
This way of handling requests and responses is simple but not optimized. For example, if your index.html page references 10 images, the server first receives a request for the HTML page, followed by 10 additional requests for the images, and since PandApache3 closes the connection after each response, every one of them needs a new TCP connection. This can be very slow for heavy pages. HTTP/1.1 persistent connections (keep-alive) let a client reuse a single TCP connection for several successive requests, and HTTP/2 goes further with multiplexing, interleaving several requests and responses concurrently over one connection. Neither feature is currently available in PandApache3.
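If PandApache3 were to support persistent connections, a first step would be deciding, per request, whether the connection may stay open. The rule sketched below follows HTTP semantics: HTTP/1.1 connections are persistent by default unless the client sends Connection: close, while HTTP/1.0 connections close unless the client sends Connection: keep-alive. KeepAlive.ShouldKeepOpen is a hypothetical helper, and it assumes .NET 5 or later for StringSplitOptions.TrimEntries.

```csharp
#nullable enable
using System;
using System.Linq;

public static class KeepAlive
{
    // Decides whether the connection may be reused after this response.
    // version: e.g. "HTTP/1.1"; connectionHeader: value of "Connection", or null.
    public static bool ShouldKeepOpen(string version, string? connectionHeader)
    {
        // The Connection header is a comma-separated list of case-insensitive tokens.
        string[] tokens = (connectionHeader ?? "")
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
        bool Has(string token) => tokens.Contains(token, StringComparer.OrdinalIgnoreCase);

        if (Has("close")) return false;
        if (version == "HTTP/1.1") return true; // persistent by default
        return Has("keep-alive");               // HTTP/1.0: opt-in only
    }
}
```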
And that's it! We've walked through how PandApache3 manages incoming connections, processes HTTP requests, and generates responses.
In the next article, we will explore more advanced features, including how to handle dynamic content and more complex routing.
Stay tuned!
Thank you so much for exploring the inner workings of PandApache3 with me! Your thoughts and support are crucial in advancing this project. 🚀
Feel free to share your ideas and impressions in the comments below. I look forward to hearing from you!
Follow my adventures on Twitter @pykpyky to stay updated on all the news.
You can also explore the full project on GitHub and join me for live coding sessions on Twitch for exciting and interactive sessions. See you soon behind the screen!