Build Gunicorn from Scratch

Many Python developers use Gunicorn to deploy their applications; it is one of the most widely used Python WSGI servers. To better understand how these servers work under the hood, I decided to build one myself!

The server has two main goals:

Clean code: The project is educational. It aims to show how the WSGI specification works under the hood, so the code is as explicit as possible and is split into three decoupled layers: the network server, WSGI (the WSGI specification) and http2 (the domain entities).
It has to work!: The server must be able to receive incoming requests, load the Python application and return a valid response.

This project is not production ready and may contain bugs. It also doesn't cover every use case or corner case, so it is not safe to deploy your applications with it.

Funicorn

I named the project Funicorn: the not-production-ready version of Gunicorn.

Overall Architecture

The architecture of the project is pretty simple; it has three components:

Network components (Server and Request Handler): handle the low-level interaction with the OS.
WSGI Layer: follows the WSGI specification and talks to the Python application.
Python application: contains the business logic you want your application to perform; it usually uses a framework such as Flask or Django.

Main Application

The main application is the server's entry point. It receives all the configuration needed to run your application, and it also manages the server's lifecycle.

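The Funicorn class receives only the essential information. It passes the application path and object name (app_path and app_obj) to the handler as a class attribute, because socketserver creates a new handler instance for every request, and it wraps the server's run and stop features.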

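# funicorn/main.py
from socketserver import TCPServer


class Funicorn:

    def __init__(self, app_path: str, app_obj: str, host: str, port: int):
        # socketserver instantiates the handler class once per request, so the
        # application location is shared through a class attribute.
        FunicornHandler.app_path = (app_path, app_obj)

        self.server = TCPServer((host, port), FunicornHandler)

    def run(self):
        # Blocks, handling one request at a time, until shutdown() is called.
        self.server.serve_forever()

    def stop(self):
        # shutdown() must be called from a different thread than serve_forever(),
        # otherwise it deadlocks.
        self.server.shutdown()
        self.server.server_close()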
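Because serve_forever() blocks and shutdown() has to come from another thread, one way to exercise run() and stop() (for example in tests) is to serve from a background thread. The sketch below shows that pattern with a plain TCPServer and a hypothetical EchoHandler standing in for FunicornHandler, so it runs on its own without the rest of the project.

```python
import socket
import threading
from socketserver import BaseRequestHandler, TCPServer


class EchoHandler(BaseRequestHandler):
    """Hypothetical stand-in for FunicornHandler: echoes the first chunk back."""
    def handle(self):
        self.request.sendall(self.request.recv(1024))


server = TCPServer(('127.0.0.1', 0), EchoHandler)   # port 0: the OS picks a free port
host, port = server.server_address

# run(): serve_forever() blocks, so it lives in a background thread.
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

with socket.create_connection((host, port)) as client:
    client.sendall(b'ping')
    reply = b''
    while chunk := client.recv(1024):   # the server closes the socket after handle()
        reply += chunk

# stop(): shutdown() is called from the main thread, not the serving one.
server.shutdown()
server.server_close()
thread.join()
```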

Server

The network server handles the low-level networking: it opens the network socket, binds it to the given host and port, accepts incoming connections and closes them when they are done. It also passes the data it receives to the handler.

To avoid building the server and handling the network connections myself, I used TCPServer from Python's socketserver module, which provides a few server classes with the basic features you need.
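For reference, the work TCPServer hides for a single connection looks roughly like this with the plain socket module: open a socket, bind it, listen, accept a connection, read, write and close. This is a simplified sketch (one connection, no error handling), and serve_one is a hypothetical name, not part of socketserver.

```python
import socket
import threading


def serve_one(listener: socket.socket) -> None:
    """Hypothetical: accept one connection, echo one chunk, close it."""
    conn, _addr = listener.accept()         # block until a client connects
    with conn:                              # leaving the block closes the connection
        conn.sendall(conn.recv(1024))


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # open a TCP socket
listener.bind(('127.0.0.1', 0))             # bind to host and port (0 = any free port)
listener.listen(5)                          # 5 is also TCPServer's default request_queue_size
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_one, args=(listener,))
worker.start()

with socket.create_connection(('127.0.0.1', port)) as client:
    client.sendall(b'hello')
    reply = b''
    while chunk := client.recv(1024):       # read until the server closes the connection
        reply += chunk

worker.join()
listener.close()
```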

Request Handler

The FunicornHandler handles every request the server receives: it loads the Python application, processes the data received by the server and returns the appropriate response.

My handler inherits from BaseRequestHandler, a base class provided by Python's socketserver module.

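# funicorn/main.py
from socketserver import BaseRequestHandler
from typing import Optional, Tuple

# load_application, RequestDTO, ResponseDTO, Request and Response are
# project helpers that are not shown here.


class FunicornHandler(BaseRequestHandler):
    # (module path, application object name), set by Funicorn.__init__
    app_path: Optional[Tuple[str, str]] = None

    def handle(self):
        # The application is (re)loaded on every request.
        application = load_application(self.app_path[0], self.app_path[1])

        # self.request is the connected client socket. recv(1024) returns at most
        # 1024 bytes (possibly fewer), so larger requests can arrive incomplete.
        http_request = self.request.recv(1024)

        request: Request = RequestDTO(http_request).request

        response: Response = application.process_request(request)

        response_data = ResponseDTO(response).http_data
        self.request.sendall(response_data)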
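A single recv(1024) call may return only part of a request. One way to read a complete HTTP/1.1 request is to keep reading until the blank line that ends the headers, then read exactly Content-Length more bytes. This is a simplified sketch with a hypothetical read_http_request helper; it ignores chunked transfer encoding and does not cap the request size.

```python
import socket


def read_http_request(sock: socket.socket, chunk_size: int = 1024) -> bytes:
    """Hypothetical helper: read the headers plus a Content-Length body."""
    data = b''
    # 1. Read until the blank line that ends the header section.
    while b'\r\n\r\n' not in data:
        chunk = sock.recv(chunk_size)
        if not chunk:                       # client closed the connection early
            return data
        data += chunk
    head, _, body = data.partition(b'\r\n\r\n')

    # 2. Find Content-Length (header names are case-insensitive).
    length = 0
    for line in head.split(b'\r\n')[1:]:    # skip the request line
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip())

    # 3. Keep reading until the whole body has arrived.
    while len(body) < length:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break
        body += chunk
    return head + b'\r\n\r\n' + body
```

Inside handle(), it could replace the single recv() call: http_request = read_http_request(self.request).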

Request and Response

To work easily with HTTP and network entities, I created Request and Response classes that hold all the necessary information.

These classes only hold the request and response data, such as the headers, the HTTP method and the body.
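Before a Request can be created, the raw bytes have to be split into those parts. The project's RequestDTO is not shown, so the following is not its actual code; it is a simplified sketch with a hypothetical parse_http_request function. It decodes the header section as ISO-8859-1, the encoding PEP 3333 uses for native strings.

```python
from typing import Dict, NamedTuple


class ParsedRequest(NamedTuple):
    method: str
    path: str
    query: str
    protocol: str
    headers: Dict[str, str]
    body: bytes


def parse_http_request(raw: bytes) -> ParsedRequest:
    """Hypothetical parser: split raw HTTP/1.1 bytes into the parts a Request needs."""
    head, _, body = raw.partition(b'\r\n\r\n')
    request_line, *header_lines = head.decode('iso-8859-1').split('\r\n')
    method, target, protocol = request_line.split(' ', 2)
    path, _, query = target.partition('?')      # '/hola?x=1' -> '/hola', 'x=1'
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(':')    # split on the first colon only
        headers[name.strip()] = value.strip()
    return ParsedRequest(method, path, query, protocol, headers, body)
```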

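The body must be a file-like object, because it is passed to the application as wsgi.input. I modeled it with BytesIO and implemented only read() and write(). PEP 3333 also expects wsgi.input to support read(size), readline(), readlines() and iteration, so this minimal version only suits applications that read the whole body with a plain read() call.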

# funicorn/http2.py
# Postpone annotation evaluation so Request can reference Body (defined below)
# and Header (not shown here) without a NameError.
from __future__ import annotations

import io
from typing import Dict, List, Optional


class Request:

    def __init__(self, method: str, uri: str, headers: Dict[str, Header], body: Body, query_param: Optional[str],
                 server_name: str, ssl=False, protocol: str='HTTP/1.1'):
        self.method = method
        self.headers = headers
        self.body = body
        self.uri = uri
        self.server_name = server_name
        self.query_param = query_param or ''
        self.scheme = 'https' if ssl else 'http'
        self.protocol = protocol

class Response:

    def __init__(self, status: str, headers: List[Header], body: Body, protocol: str):
        self.status = status
        self.headers = headers
        self.body = body
        self.protocol = protocol

class Body:

    def __init__(self, data: Optional[bytes] = None):
        self._data = io.BytesIO(data) if data else io.BytesIO()

    def read(self) -> bytes:
        # Returns the whole buffer, regardless of the current stream position.
        return self._data.getvalue()

    def write(self, data: bytes) -> int:
        return self._data.write(data)

    def __len__(self) -> int:
        return len(self._data.getvalue())
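The reverse step turns a Response back into bytes for sendall(). The handler delegates this to ResponseDTO, which is not shown; here is a sketch of the same idea with a hypothetical serialize_response function: the status line, one line per header, a blank line, then the body. It adds Content-Length when the application did not set it, so the client knows where the body ends.

```python
from typing import List, Tuple


def serialize_response(status: str, headers: List[Tuple[str, str]], body: bytes,
                       protocol: str = 'HTTP/1.1') -> bytes:
    """Hypothetical serializer: turn status, headers and body into raw HTTP bytes."""
    lines = [f'{protocol} {status}']                    # e.g. 'HTTP/1.1 200 OK'
    names = {name.lower() for name, _ in headers}
    lines += [f'{name}: {value}' for name, value in headers]
    if 'content-length' not in names:
        lines.append(f'Content-Length: {len(body)}')
    head = '\r\n'.join(lines) + '\r\n\r\n'             # a blank line ends the headers
    return head.encode('iso-8859-1') + body
```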

WSGI

The WSGIApplication entity is the layer that wraps the Python application and the WSGI specification for the rest of the server. It receives a Request entity, passes it to the Python application and returns the processed Response.

The main method is process_request: it converts the request into the WSGI environ dictionary and calls the application with it. The application reports the status and headers through the start_response callable (defined by the specification) and returns the body; process_request then builds the Response and returns it to the handler.

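# funicorn/wsgi.py
from typing import Callable, List, Tuple

# Request and Response come from funicorn/http2.py; ResponseBuilder,
# set_response_body and set_response_metadata are not shown here.


class WSGIApplication:

    def __init__(self, application: Callable):
        self.application = application
        self._request: Request = None

        self._response_builder = ResponseBuilder()

    def process_request(self, request: Request) -> Response:
        self._request = request

        environ = WSGIEnvironment(request).environ

        # The WSGI application returns an iterable of bytes (the response body).
        http_body = self.application(environ, self.start_response)

        self.set_response_body(http_body)
        response: Response = self._response_builder.build()

        return response

    def start_response(self, status: str, response_headers: List[Tuple[str, str]], exc_info=None):
        # Simplification: any exc_info is re-raised. PEP 3333 also expects
        # start_response to return a write(body_data) callable, omitted here.
        if not exc_info:
            self.set_response_metadata(status, response_headers)
        else:
            raise exc_info[1].with_traceback(exc_info[2])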
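To make the start_response contract concrete, here is a minimal, self-contained sketch of the server side of a WSGI call, independent of Funicorn's classes (run_wsgi_app is a hypothetical name). It buffers the whole body, returns a write() callable as PEP 3333 requires, and calls close() on the result when the application provides one.

```python
from typing import Callable, Dict, List, Tuple

Headers = List[Tuple[str, str]]


def run_wsgi_app(app: Callable, environ: Dict) -> Tuple[str, Headers, bytes]:
    """Hypothetical driver: call a WSGI app and collect status, headers and body."""
    captured: dict = {}
    chunks: List[bytes] = []

    def start_response(status: str, headers: Headers, exc_info=None):
        if exc_info is None and captured:
            raise AssertionError('start_response called twice without exc_info')
        # Nothing is sent before the app finishes, so a later call that passes
        # exc_info may simply replace the status and headers.
        captured['status'], captured['headers'] = status, headers
        return chunks.append                # the legacy write(body_data) callable

    result = app(environ, start_response)
    try:
        for chunk in result:                # the body may be any iterable of bytes
            chunks.append(chunk)
    finally:
        if hasattr(result, 'close'):        # PEP 3333: call close() if the result has it
            result.close()
    return captured['status'], captured['headers'], b''.join(chunks)


def hello_app(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'Hello ')                        # rarely used, but allowed by the spec
    return [b'World']


status, headers, body = run_wsgi_app(hello_app, {})
```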

All the information in the request is passed to the application through the environ dictionary, which is built as follows.

The application needs many of these variables to work correctly. They are defined in PEP 3333 (the WSGI specification), and many of them are mandatory.

Many of them come from the CGI specification and describe the HTTP request itself, such as the method, the path and the headers. I have only implemented the mandatory variables.

I won't explain these variables in more detail, since there is already plenty of information about them on the web.

# funicorn/wsgi.py
import sys

# Header comes from funicorn/http2.py (not shown). The constructor is not shown
# either; it keeps the request in self._request and builds self.environ.


class WSGIEnvironment:

    def set_cgi_environ(self):
        self.environ['REQUEST_METHOD'] = self._request.method
        self.environ['SCRIPT_NAME'] = ''
        self.environ['PATH_INFO'] = self._request.uri
        self.environ['QUERY_STRING'] = self._request.query_param
        self.environ['SERVER_NAME'] = self._request.server_name
        self.environ['SERVER_PROTOCOL'] = self._request.protocol
        self.environ['SERVER_PORT'] = '8000'  # hard-coded; should match the port the server listens on
        # CONTENT_TYPE and CONTENT_LENGTH are the only headers without an HTTP_ prefix.
        self.environ['CONTENT_TYPE'] = self._request.headers.pop('CONTENT_TYPE', Header('', '')).value
        self.environ['CONTENT_LENGTH'] = self._request.headers.pop('CONTENT_LENGTH', Header('', '')).value
        self.environ.update({f'HTTP_{header.key}': header.value for header in self._request.headers.values()})

    def set_wsgi_environ(self):
        self.environ['wsgi.input'] = self._request.body
        self.environ['wsgi.errors'] = sys.stderr
        self.environ['wsgi.version'] = (1, 0)
        self.environ['wsgi.multithread'] = False   # TCPServer handles one request at a time
        self.environ['wsgi.multiprocess'] = False  # required by PEP 3333; single process
        self.environ['wsgi.input_terminated'] = True  # extension (not in PEP 3333) read by Werkzeug
        self.environ['wsgi.url_scheme'] = self._request.scheme
        self.environ['wsgi.run_once'] = True
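One way to check an environ against the specification is the standard library's wsgiref.validate.validator, which wraps an application and raises AssertionError when the environ or the application breaks PEP 3333 rules (for example, a missing wsgi.multiprocess key). The sketch below builds an environ from already-parsed request parts with a hypothetical build_environ helper, including the header-name conversion (Content-Type becomes CONTENT_TYPE, other headers get the HTTP_ prefix), and runs a tiny echo app through the validator. It uses a plain io.BytesIO as wsgi.input, since BytesIO already provides the input-stream methods the specification lists.

```python
import io
import sys
from typing import Any, Dict
from wsgiref.validate import validator


def build_environ(method: str, path: str, query: str, headers: Dict[str, str],
                  body: bytes, server_name: str, server_port: int) -> Dict[str, Any]:
    """Hypothetical helper: build a PEP 3333 environ from already-parsed request parts."""
    environ: Dict[str, Any] = {
        'REQUEST_METHOD': method,
        'SCRIPT_NAME': '',
        'PATH_INFO': path,
        'QUERY_STRING': query,
        'SERVER_NAME': server_name,
        'SERVER_PORT': str(server_port),      # CGI values must be native strings
        'SERVER_PROTOCOL': 'HTTP/1.1',
        'wsgi.version': (1, 0),
        'wsgi.url_scheme': 'http',
        'wsgi.input': io.BytesIO(body),       # read, readline, readlines, __iter__
        'wsgi.errors': sys.stderr,
        'wsgi.multithread': False,
        'wsgi.multiprocess': False,
        'wsgi.run_once': False,
    }
    for name, value in headers.items():
        key = name.upper().replace('-', '_')  # 'User-Agent' -> 'USER_AGENT'
        if key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            environ[key] = value              # these two never get the HTTP_ prefix
        else:
            environ[f'HTTP_{key}'] = value
    return environ


def echo_app(environ, start_response):
    """Tiny WSGI app: echoes the request body back."""
    size = int(environ.get('CONTENT_LENGTH') or 0)
    data = environ['wsgi.input'].read(size)
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(data)))])
    return [data]


checked_app = validator(echo_app)             # raises AssertionError on spec violations
environ = build_environ('POST', '/hola', '', {'Host': 'localhost', 'Content-Length': '5'},
                        b'hello', 'localhost', 8000)
result = checked_app(environ, lambda status, headers, exc_info=None: None)
body = b''.join(result)
result.close()                                # the server must close the result
```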

Python Application

The Python application executed by our server is a simple Flask example that handles GET and POST requests on /hola:

GET: The application returns "Hello World"
POST: The application returns the same JSON body it received (an echo application)

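# funicorn/examples/flask_example.py
from flask import Flask, request

app = Flask(__name__)


@app.route("/hola", methods=['GET', 'POST'])
def hello():
    if request.method == 'POST':
        # request.json is the parsed JSON body; Flask 1.1+ turns a returned dict into a JSON response.
        return request.json
    return "Hello World"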
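Since Funicorn is not production ready, it can help to compare its responses against a known-good server, such as Python's reference WSGI server in wsgiref.simple_server. The sketch below assumes a framework-free stand-in for the Flask app (hola_app, hypothetical) so it runs without Flask installed; because a Flask app object is itself a WSGI callable, you could pass app instead. It uses http.client rather than urllib so no proxy settings get in the way.

```python
import http.client
import json
import threading
from typing import Optional
from wsgiref.simple_server import WSGIRequestHandler, make_server


def hola_app(environ, start_response):
    """Hypothetical framework-free stand-in for the Flask /hola example."""
    if environ['REQUEST_METHOD'] == 'POST':
        size = int(environ.get('CONTENT_LENGTH') or 0)
        payload = environ['wsgi.input'].read(size)   # echo the JSON body back
        start_response('200 OK', [('Content-Type', 'application/json')])
        return [payload]
    start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8')])
    return [b'Hello World']


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):            # silence per-request access logs
        pass


def fetch(port: int, method: str, body: Optional[bytes] = None) -> bytes:
    conn = http.client.HTTPConnection('127.0.0.1', port)
    headers = {'Content-Type': 'application/json'} if body else {}
    conn.request(method, '/hola', body=body, headers=headers)
    data = conn.getresponse().read()
    conn.close()
    return data


server = make_server('127.0.0.1', 0, hola_app, handler_class=QuietHandler)  # 0: any free port
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

get_body = fetch(server.server_port, 'GET')
post_body = fetch(server.server_port, 'POST', b'{"name": "funicorn"}')

server.shutdown()
server.server_close()
```

Sending the same requests to Funicorn and comparing the bytes is a quick way to spot differences in status lines, headers or bodies.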

Conclusion

Implementing this project was a great exercise: it forces you to understand many details you might not know about one of the most widely used standards in the Python community, and how it actually works under the hood. It is not magic, it is just code!

I would recommend that anybody take a look at the repository to see the whole code and the many details I could not summarize here. You can clone it and play with it; as I mentioned before, I tried to be as explicit as possible, but if you have any questions let me know :)

Also, let me know in the comments if you liked the article and if it helped you understand how these kinds of projects work.