The Essential Art of Sustainable Software Architecture
James Eastham
Posted on April 27, 2023
Sustainability consists of fulfilling the needs of current generations without compromising the needs of future generations, while ensuring a balance between economic growth, environmental care and social well-being.
Sustainability: a word we hear an awful lot, and one that means different things to different people. What does it mean to be sustainable? The definition above sums it up for me: fulfil the needs of the present without compromising the future.
When we think about sustainability, our minds jump to climate change: re-using our shopping bags and recycling our plastic bottles. These are noble causes, but as technologists we have the potential to make a bigger impact.
Software is taking over the world, and almost every organisation now considers itself, at least in part, a technology company.
This gives us a huge opportunity to have a tremendous impact. Huge portions of the world are using software every single day. Is it not our responsibility to ensure that the software we architect meets the needs of the present, whilst optimising for the future?
What makes up the art of sustainable software architecture, balanced across the environmental, economic and social pillars?
Environmental Software Architecture
The environmental pillar is imperative, and the most obvious one when we think about sustainability: creating a more sustainable planet and aiming to mitigate climate change. There is a single word that springs to mind when I think about the environmental impact of our architectures:
Efficiency
The ability to do more with as few resources as possible. The art of having the right resource, in the right place, at just the right time. With some of the recent innovations in software development, efficiency is a practical thing to implement.
Leveraging the cloud for efficiency
A report by S&P Global Market Intelligence discusses the impact of moving workloads to the cloud.
"Simply moving to using green energy is not going to deliver the desired — or the best — carbon reduction. The best place to go from a carbon reduction perspective is the cloud."
The report estimates that a combination of more up-to-date equipment and a better utilised fleet of hardware amounted to an 85% saving in energy usage. This is before considering the fact that many cloud providers are moving their data centres to use 100% renewable energy.
Architect for Efficiency
Many of you have heard of the cloud, and are likely already running some kind of workload there. But once in the cloud, how can we architect for efficiency? Thankfully, the most efficient architecture patterns are also among the best ways to build modern software.
Serverless-first, event-driven architectures are a prevalent pattern when building cloud native systems. Remember, one principle we are optimising for is doing more with less. A system that scales to zero when it's not in use, and reacts to events as they happen... what could be more optimised than that?
Let's unpack the two terms:
Serverless first: A mindset when building software that defaults to leveraging serverless compute options like AWS Lambda, shifting the operational burden on to the cloud provider. This enables the cloud provider to be efficient with the underlying compute, as they can shift workloads around to meet demand.
Event-driven architecture: A pattern in which business events drive functionality, coupled with an asynchronous-first approach. A system which reacts to events as they happen, and defaults to running things asynchronously, provides ample opportunity to optimise for sustainability. A minimal sketch of these two ideas in practice follows.
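To make those two terms concrete, here is a minimal TypeScript sketch of a serverless, event-driven handler. The event shape and the reserveStock function are hypothetical; the point is that the code only runs when a business event arrives, and nothing is left running in between.

// A hypothetical business event, delivered by the event bus (queue, stream or similar).
interface OrderCreatedEvent {
  orderId: string;
  customerId: string;
  orderValue: number;
}

// A serverless, event-driven handler: it only consumes compute while an event
// is being processed and scales to zero when there is nothing to do.
export const handler = async (event: OrderCreatedEvent): Promise<void> => {
  await reserveStock(event.orderId);
  console.log(`Stock reserved for order ${event.orderId} (customer ${event.customerId})`);
};

// Hypothetical downstream call, stubbed to keep the sketch self-contained.
async function reserveStock(orderId: string): Promise<void> {
  // In a real system this would call a warehouse service or write to a data store.
}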
Of course, a serverless event driven architecture may not always be the most optimal or even the most practical. I'm looking at all you GPU compute heavy folks out there. That said, it's still possible to architect for efficiency.
Defaulting to a single type of compute everywhere in your system is an anti-pattern. Choose the right compute for your use case and optimise accordingly.
If that means running big GPU intensive EC2 instances, then go for it, but consider how you could be more efficient with the compute that you have.
Think asynchronous first, default to using managed services, and take intentional steps away from this pattern only where required.
Code for Efficiency
Code is a fundamental part of any software system. There is so much nuance in an individual business use case that hand-written, custom code is inevitable. Many of us (me included) default to familiarity, debating the merits of a language based on syntax, performance and legibility (I'm looking at you, Python).
Something I rarely hear mentioned when discussing programming languages is their environmental impact. A group of extremely smart people from various universities in Portugal have run this analysis for us.
Their paper analyses 27 well-known programming languages, monitoring performance across 10 different programming problems expressed in each of the languages. An interesting observation from the paper:
"Compiled languages tend to be, as expected, the fastest and most energy efficient ones. On average, compiled languages consumed 120J to execute the solutions, while for virtual machine and interpreted languages this value was 576J and 2365J, respectively. This tendency can also be observed for execution time, since compiled languages took 5103ms, virtual machine languages took 20623ms, and interpreted languages took 87614ms (on average)."
I find this even more interesting considering how prevalent interpreted languages like Python and JavaScript (on Node.js) are in our modern software world.
Unsurprisingly, the top 5 languages for both energy efficiency and execution time are:
- C
- Rust
- C++
- Ada
- Java
And the bottom 5 for energy:
- Perl
- Python
- Ruby
- JRuby
- Lua
And the bottom 5 for execution time:
- Lua
- Python
- Perl
- Ruby
- TypeScript
See, my pointed looks at Python weren't completely off the mark.
Quick side note, Rust has also ranked as the most beloved programming language in the Stack Overflow developer survey for 7 years running (as of the 2022 survey).
Now I'm not suggesting you all go away, learn Rust (although it's a lot of fun) and re-write your entire applications. We will see more on this when we move on to the social pillar of sustainable architecture. But understanding the efficiency of each language is important.
Maybe you don't re-write your entire application, but you re-write the most commonly used pieces of functionality in a more efficient language. Taking Rust as a specific example, combined with AWS serverless compute, this is likely to save you money as well.
The Social Pillar of Software Architecture
When I first looked at the pillars of sustainability, the social one made me stop and think. There is the obvious social impact of software: building software to solve the world's hardest problems. But can we dive a little deeper than that? What about the day-to-day social impact on the people building the software?
I'm sure many of us have worked on applications where the code was an unruly big ball of complicated mud. Equally, I'm sure some of us have worked inside a code base that is a perfectly modularised thing of beauty.
Not all of us can work on software that is 'changing the world', but we can all aim to build software that considers the needs of future generations.
Enjoyable to work on
One of the most important characteristics of socially sustainable software is a code base that is enjoyable to work on. I want to go to work and have fun solving problems, instead of battling complexity, unnecessary abstractions and hard-to-read code.
The boy scout rule is an excellent mantra to use here:
Always leave the code better than you found it.
It's simple, easy to remember, and forces you to think about continuously improving the code. Maybe this is as simple as renaming a variable to be more expressive. Maybe it's adding additional services and functions to make the code more intentional.
A rule of thumb I try to follow (and often fail at) is to write code that reads like a story. If you're using a compiled language, long, easy-to-read variable names cost you nothing at runtime; the compiler discards them anyway.
for (var x = 0; x < counter; x++)
{
var currentIterator = list[x];
}
for (var currentProductIndex = 0; currentProductIndex < maxProducts; currentProductIndex++)
{
var currentProduct = allProducts[currentProductIndex];
}
I know which I prefer.
Reduce cognitive load
Cognitive load is devastating to a software developer. Our brains can only hold so much information at a time; when working on a task, they function more like a stick of RAM than an SSD.
Reducing the cognitive load of a system enables your fellow developers to make more informed decisions about functionality. If I can consume a library with a well-defined API that abstracts away a complex task, that reduces my cognitive load, giving me more mental cycles to focus on the problem at hand.
Defaulting to a modular design, whether that be a modular monolith or microservices, reduces this cognitive burden, especially when coupled with domain-driven design, well-defined bounded contexts and organising teams around features rather than technical domains.
If I'm a developer working on the product microservice, I can become fully embedded in the world of products within the organisation. I need to get some customer information? Well, the customer team has a well-defined API with excellent documentation. They have even produced an SDK for communication. Excellent! Cognitive load removed.
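As a hedged illustration, here is roughly what that well-defined interface might look like from the product team's side. The CustomerApiClient name and its methods are hypothetical; the point is that everything behind the interface is somebody else's cognitive load, not mine.

// A hypothetical SDK published by the customer team. All of the retry logic,
// authentication and endpoint details live behind this interface.
interface Customer {
  customerId: string;
  name: string;
  deliveryAddress: string;
}

interface CustomerApiClient {
  getCustomer(customerId: string): Promise<Customer>;
}

// Inside the product service, I only need to know that one call exists.
async function buildOrderSummary(client: CustomerApiClient, customerId: string): Promise<string> {
  const customer = await client.getCustomer(customerId);
  return `Order for ${customer.name}, shipping to ${customer.deliveryAddress}`;
}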
Fast Feedback
Dopamine: the neurochemical responsible for almost all of our modern addictions. That little burst of pleasure when a new notification arrives... that's dopamine in action. It's not all bad, though. We can leverage dopamine in positive ways as well.
Fast feedback loops within the application, typically built on a well-defined set of test cases, help enable this reinforcement. Imagine working within a code base, continuously running the test cases and seeing all of those green icons light up. You can feel that dopamine hit already, can't you?
These fast feedback loops also help enable mastery of the domain and language. A core component of mastering anything is quickly being able to course correct.
Tests light up green = dopamine hit
Tests go red = an opportunity to increase your mastery
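As a small illustration of that loop, a test like the one below (using Node's built-in test runner; any runner will do) executes in milliseconds, so the green or red signal arrives almost instantly. The calculateOrderTotal function is a hypothetical piece of domain logic.

import { test } from "node:test";
import assert from "node:assert";

// Hypothetical domain logic under test.
function calculateOrderTotal(lineItemPrices: number[]): number {
  return lineItemPrices.reduce((total, price) => total + price, 0);
}

test("order total is the sum of all line items", () => {
  assert.strictEqual(calculateOrderTotal([10.0, 5.5, 4.5]), 20.0);
});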
Extensible & Observable
The last topic I wanted to touch on around the social impact is that of extensibility and observability. I want to work within a system that I can easily extend to add new functionality, and observe and debug when things go wrong.
Some of the most frustrating times in my career have been through trying to debug a problem that has little to no logs or tracing.
Building with extensibility and observability in mind serves the needs of future developers, meeting our commitment to the social pillar.
The extensibility angle ties in nicely with our serverless-first, event-driven mindset. An event-driven system is extensible. If your system behaviour is driven by a set of business events shared between bounded contexts through some kind of central event stream, hooking into that event stream to add new functionality becomes incredibly easy. Couple that central event stream with a schema registry and you are in a fantastically extensible place.
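A sketch of what that extensibility feels like, assuming some central event stream with a subscribe operation (the EventStream interface here is hypothetical): adding new functionality is just adding a new subscriber, with no changes to the producer at all.

// The shared business event, ideally validated against a schema registry.
interface OrderCreated {
  eventType: "orderCreated";
  orderId: string;
  orderValue: number;
}

// Hypothetical central event stream interface.
interface EventStream {
  subscribe(eventType: string, handler: (event: OrderCreated) => Promise<void>): void;
}

// New functionality: a loyalty points feature hooks into the existing stream.
// The ordering service that publishes the event doesn't change at all.
function registerLoyaltySubscriber(eventStream: EventStream): void {
  eventStream.subscribe("orderCreated", async (event) => {
    const points = Math.floor(event.orderValue);
    console.log(`Awarding ${points} loyalty points for order ${event.orderId}`);
  });
}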
The observability angle applies to much more than the social component. Observability will help us environmentally and economically (you can't improve what you can't measure), as well as socially (understanding what the heck is going on in a system).
The mindset of the sustainable architect is to be serverless-first, asynchronous and event-driven by nature, with observability as a fundamental tenet.
The Economic Pillar of Sustainable Architecture
If the social pillar made me pause for thought, this one had me stumped for a little while. The definition of the economic pillar is:
This pillar is based on companies’ ability to contribute to economic development and growth.
As architects and engineers, we have little say in our organisations’ wider policies on contributing to economic development and growth. So what does economic development and growth mean within the context of software?
Within this pillar, I see the practical patterns & steps you can take as an architect to contribute to the development and growth of your software system sustainably. There are many ties with the environmental pillar here, with efficiency being a fundamental component. So let's consider the economics of the different system components.
Compute
We’ve discussed compute already, defaulting to a serverless-first mindset. But there are several additional patterns and practices to help enable sustainable use of compute.
The first is to optimise for stateless compute. Aim for your application layer to hold little to no state; this helps enable the event-driven, reactive compute we are looking for. If your application requires in-memory state, you restrict the type of compute you can use, as well as how you and your cloud provider can shift that compute around.
Holding an element of data in memory may be useful from a performance perspective, caching for example, but you should not rely on it.
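A minimal sketch of that "useful but not relied upon" distinction, with a hypothetical loadProductFromStore standing in for the real data store call: the in-memory map speeds things up when the same compute instance happens to be reused, but the function still works if the instance disappears and the cache starts empty.

interface Product {
  productId: string;
  name: string;
}

// Best-effort, instance-local cache. It may be empty on any given invocation.
const productCache = new Map<string, Product>();

async function getProduct(productId: string): Promise<Product> {
  const cached = productCache.get(productId);
  if (cached) {
    return cached; // Nice to have, purely a performance optimisation.
  }

  // The source of truth is always the data store, never the in-memory state.
  const product = await loadProductFromStore(productId);
  productCache.set(productId, product);
  return product;
}

// Hypothetical data store call, stubbed to keep the sketch self-contained.
async function loadProductFromStore(productId: string): Promise<Product> {
  return { productId, name: "example product" };
}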
Second, and related to the earlier point, optimise the actual compute itself. This may mean using more optimised chipsets (x86 vs ARM). Or it may mean optimising the type of compute you have. If you're within the serverless world, this may mean tuning the memory allocation of your function to get the optimal performance vs memory balance. If you're in the physical server world, this may mean using the most optimal instance type (compute vs memory vs network optimised) for your specific use case.
There is no one-size-fits-all in compute; it always depends.
Networking / Data Transfer
The environmental impact of networking and data transfer is difficult to quantify, as there are so many variables. That doesn’t mean we can’t be more economical though.
Ask yourself the question: what is the minimum viable data I need to perform the task at hand? The backend-for-frontend (BFF) pattern is excellent for this, alongside technologies like GraphQL. The aim here: ensure only useful data is transferred.
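As a sketch of that question in code (the ProductRecord shape is hypothetical): the backend holds the full record, but the BFF for a mobile listing page only sends the fields that page actually renders.

// The full record held by the backend service.
interface ProductRecord {
  productId: string;
  name: string;
  description: string;
  price: number;
  warehouseLocations: string[];
  supplierNotes: string;
}

// The minimal shape the mobile listing page actually needs.
interface ProductListItem {
  productId: string;
  name: string;
  price: number;
}

function toListItem(record: ProductRecord): ProductListItem {
  return { productId: record.productId, name: record.name, price: record.price };
}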
Compression is another decision point that we will see again when we discuss storage. Many workloads, including almost every single one I have built, default to JSON as the data transfer medium. Understandably so. It’s human readable and relatively succinct. But actually, as we are passing data over the wire, can we not make this even more efficient?
Protobuf and MessagePack are two such technologies specifically targeting efficient binary serialization, whilst still maintaining an element of schema and interoperability.
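As a rough illustration, assuming the @msgpack/msgpack package, the same object serialised as JSON and as MessagePack shows the difference in bytes on the wire; the exact saving depends entirely on the shape of your data.

import { encode, decode } from "@msgpack/msgpack";

const order = {
  orderId: "ORD-123456",
  orderValue: 256.82,
  deliveryType: "PREMIUM",
};

const asJson = Buffer.from(JSON.stringify(order));
const asMessagePack = encode(order); // Returns a Uint8Array.

console.log(`JSON: ${asJson.byteLength} bytes, MessagePack: ${asMessagePack.byteLength} bytes`);

// The binary payload decodes back to the same structure.
const roundTripped = decode(asMessagePack);
console.log(roundTripped);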
Caching and data transfer distance are two concepts that go hand in hand. That is, using content delivery networks (CDNs) to get the data and compute as close to your users as possible. Caching data at the edge, as well as running compute at the edge, reduces the network 'cost' of transferring the data.
Caching is also valuable within our backend systems. If you are working within a microservices architecture, it's likely you have an element of communication across a network. Consider whether you could cache external data within your own service to reduce the network transfer cost and, secondarily, reduce the compute load on your other systems (a big tick in the social pillar here as well).
Microservices architectures, coupled with event-driven architecture, also bring up an important point of debate around event schema design. There is probably a separate blog post within this topic, but determining what you include in your event is important. Events that are too sparse result in many calls back to the producer service to retrieve additional data. A 'fat' event results in a larger initial cost, as the full payload is distributed to every subscriber.
That said, if we are truly optimising for sustainability, my preference is for a fat event with a combination of JSON and binary. We serialise the main body of the payload into a binary format using a schema-based format like Protobuf, with additional metadata in readable JSON. That strikes the right balance between a well-known schema, reduced network calls and an optimally sized payload, as well as providing the ability for subscribers to filter on the metadata.
{
"metadata": {
"eventType": "orderCreated"
},
"filters": {
"orderValue": 256.82,
"geo": "EN",
"customerAccountId": "123456",
"deliveryType": "PREMIUM"
},
"payload": "<BINARY_DATA_GOES_HERE>"
}
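A hedged sketch of how a subscriber might consume an event shaped like the example above, assuming the binary body travels as a base64-encoded string: it filters on the readable metadata first, and only pays the cost of decoding the binary payload when the event is actually relevant. The decodeOrderPayload function stands in for whatever schema-based decoder (Protobuf or similar) you use.

interface OrderCreatedEnvelope {
  metadata: { eventType: string };
  filters: {
    orderValue: number;
    geo: string;
    customerAccountId: string;
    deliveryType: string;
  };
  payload: string; // Base64-encoded binary body.
}

async function handlePremiumOrders(event: OrderCreatedEnvelope): Promise<void> {
  // Cheap filter on the readable JSON; most subscribers stop here.
  if (event.filters.deliveryType !== "PREMIUM") {
    return;
  }

  // Only now decode the binary payload with the shared schema.
  const order = decodeOrderPayload(Buffer.from(event.payload, "base64"));
  console.log(`Processing premium order worth ${event.filters.orderValue} for ${order.customerName}`);
}

// Hypothetical schema-based decoder (e.g. generated from a Protobuf definition).
function decodeOrderPayload(bytes: Buffer): { customerName: string } {
  return { customerName: "example" };
}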
Storage & Analytics
Many of the concepts around compression discussed above apply equally to storage. The most important architectural consideration here is to understand your data access patterns. Regardless of the storage technology you are using, understanding how the data is going to be accessed is vital in understanding how you can optimise.
For example, let's consider the use of DynamoDB as a primary data store. Used in the most efficient way, access to DynamoDB uses a combination of partition and sort keys. Any other attributes stored against an item aren't used for efficient key-based queries.
In this scenario, you can leverage technologies like Protobuf to serialise the bulk of the record to binary and store it in a more efficient way. An additional benefit here: you'll probably reduce your costs as well.
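Under the assumption of a single-table design with generic pk/sk keys, a sketch of that pattern with the AWS SDK for JavaScript v3 might look like the following. The table name is hypothetical, and encodedOrder stands in for the output of a schema-based binary encoder; the key attributes stay queryable while the bulk of the record is a compact binary blob.

import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

async function storeOrder(orderId: string, customerId: string, encodedOrder: Uint8Array): Promise<void> {
  await client.send(
    new PutItemCommand({
      TableName: "orders", // Hypothetical table name.
      Item: {
        pk: { S: `CUSTOMER#${customerId}` },
        sk: { S: `ORDER#${orderId}` },
        // The full order record, serialised with a schema-based binary format.
        payload: { B: encodedOrder },
      },
    })
  );
}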
Considering your access patterns will also enable you to optimise the performance of your queries, having the knock on effect of optimising your compute. It’s a wonderful sustainability flywheel we are getting ourselves into, isn’t it?
The other consideration around the data itself is what you store and how long you store it for. Continuously ask yourself whether all of the data being stored is necessary. And if it is, how long do you need to keep it around for?
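One low-effort way to answer the "how long" question is to make the answer explicit in the data itself. DynamoDB's time-to-live feature, for example, removes items once an epoch-seconds attribute has passed; the attribute name (expiresAt) and the 90-day retention period below are assumptions for illustration.

const RETENTION_DAYS = 90; // Assumed retention policy for this kind of data.

function withExpiry<T extends object>(item: T): T & { expiresAt: number } {
  const expiresAt = Math.floor(Date.now() / 1000) + RETENTION_DAYS * 24 * 60 * 60;
  return { ...item, expiresAt };
}

// The storage layer writes expiresAt alongside the item, and the database
// (DynamoDB TTL in this example) removes it once the time has passed.
const auditRecord = withExpiry({ orderId: "ORD-123456", action: "orderCreated" });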
And finally, think about your recovery point objective and recovery time objective. Do you really need to be taking full database backups of your product review database every 5 minutes and storing them for up to 3 years? Probably not. Much like the compute option, fit the technology to the use case. There is no such thing as a one size fits all storage solution.
Now, some of these ideas become less useful when considering analytics. A nicely compressed piece of binary data is excellent for our storage costs and retrieval efficiency, but not so useful for running custom analytics queries. How can we be proactive about our analytics workloads?
Unlike the compute recommendation of being reactive, only running applications when required, for analytics I would suggest being much more proactive. Specifically, transform your data into a format suited to custom queries. Extract, transform, load (ETL) jobs have been around for a long time; one of my first roles was developing them. Using these proactive mechanisms to pre-format data ensures that when complex, custom queries come in, they are working on data stored in the most efficient way.
If you combine this with scheduling your ETL jobs to run during periods of low carbon intensity, you can really push up the sustainability of your analytics workloads.
Characterize. Observe. Improve
I wanted to close out with a very specific section on architectural thinking. All architectures are characterised by the architectural '-ilities'. These '-ilities' are often at odds with each other and are at the core of the oft-heard phrase 'it depends'.
- Performance
- Scalability
- Availability
- Reliability
- Security
- Maintainability
- Flexibility
- Configurability
- Personalizability
- Usability
- Portability
- Conformance to standards
- Efficiency
- Responsiveness
- Interoperability
- Upgradability
- Auditability
- Transactionality
- Administrability
- Sustainability (my addition)
For example, it's difficult to combine performance and security: increasing security typically means slower performance. If we consider sustainability to be non-negotiable, that naturally makes some of the other '-ilities' fall into place as well. This is very much my opinion, but these are what I would consider the most important '-ilities' for sustainability.
- Scalability
- Security (it's job zero, it makes it on to every list)
- Flexibility
- Usability
- Efficiency
- Interoperability
- Upgradability
- Auditability
Once we have defined the characteristics we want our system to have, we can use fitness functions to ensure we have continual feedback on conformance to the core tenets of our system.
For example, if scalability and efficiency are our core tenets to enable sustainability, we may write fitness functions (a minimal sketch follows the list) to:
- Monitor resource usage and how optimally we are consuming the resources we have allocated to a task
- Monitor how often we are using compute when there is no work to be done
- Ensure no individual piece of compute runs for longer than 100ms or uses more than 128 MB of memory
- Monitor scaling behaviour, and how quickly our system adapts to demand
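Here is a minimal sketch of one such fitness function, kept deliberately independent of where the numbers come from (CloudWatch, Prometheus, or anything else): it takes recent invocation metrics, however you collect them, and fails if the compute is routinely over-provisioned. The thresholds are illustrative, not recommendations.

interface InvocationMetric {
  durationMs: number;
  memoryUsedMb: number;
  memoryAllocatedMb: number;
}

// Fitness function: fail the check if we allocate far more memory than we actually use.
function memoryUtilisationFitness(metrics: InvocationMetric[], minimumUtilisation = 0.4): boolean {
  const averageUtilisation =
    metrics.reduce((sum, m) => sum + m.memoryUsedMb / m.memoryAllocatedMb, 0) / metrics.length;

  if (averageUtilisation < minimumUtilisation) {
    console.error(`Average memory utilisation ${averageUtilisation.toFixed(2)} is below ${minimumUtilisation}`);
    return false;
  }
  return true;
}

// Run against a recent sample in CI/CD, and continually against live monitoring data.
const passed = memoryUtilisationFitness([
  { durationMs: 82, memoryUsedMb: 96, memoryAllocatedMb: 128 },
  { durationMs: 101, memoryUsedMb: 88, memoryAllocatedMb: 128 },
]);
console.log(passed ? "Fitness check passed" : "Fitness check failed");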
Fitness functions should run both on triggers, as part of CI/CD, and continually as part of ongoing system monitoring.
For all of that to work, observability is another non-negotiable '-ility' in all of our sustainable architectures. Generating trace and metrics data to understand our systems enables us to be data driven about how we optimise and gives our fitness functions something to work with.
Wrapping Up
If you’ve stuck with it this far, thanks for reading. If I’ve lost you, well, you’ve probably not got this far anyway, so this sentence is becoming irrelevant.
Frankly, this post has been a bit of a brain dump: a coming together of my technical skills and my desire to inspire change in the world.
I truly believe that as software practitioners, we have immense potential to bring about genuine change in the world.
Let that carry into everything you do.
James