Refactor for better knowledge allocation
Fran Iglesias
Posted on May 31, 2021
One of the ongoing problems in every organization is documentation. From an agile perspective, code can document a good part of the knowledge about the business when it is written in an expressive, well-organized fashion.
But this frequently does not happen, for several reasons: poor communication, too much emphasis on implementation details, and framework-driven development, among others. As a result, many code bases are hard to read when you try to understand how they express the domain of the business.
In our previous post we talked about some tips to refactor code to be a better storyteller. This time we will continue digging into the same idea, but from a slightly different point of view: how to better structure the knowledge.
Knowledge means a lot of things. Every software project holds both business knowledge (how to book a visit with a doctor, for example) and technical knowledge (how to connect to some database server). Problems arise when different domains of knowledge appear mixed in code: technical concerns entangled with business concerns, different areas of the business sharing pieces of code, or heavy coupling between different parts of the code.
Object-Oriented Software… or not so much
There are lots of code bases in the wild written in object-oriented languages, using classes and objects, that don’t really apply the OO paradigm well. They consist of procedural code in an object-oriented costume. In these code bases, knowledge is global, and objects are usually mere containers for data, behavior, or both, rather than self-consistent, well-encapsulated objects.
In the OO paradigm, objects are experts that encapsulate knowledge and behavior, communicating and cooperating to fulfill the tasks.
The most difficult problem in teaching object-oriented programming is getting the learner to give up the global knowledge of control that is possible with procedural programs, and rely on the local knowledge of objects to accomplish their tasks. Novice designs are littered with regressions to global thinking: gratuitous global variables, unnecessary pointers, and inappropriate reliance on the implementation of other objects. (Beck & Cunningham: A Laboratory For Teaching Object-oriented Thinking)
Object-oriented design is guided by a plethora of principles that serve as guidelines to make decisions about how to allocate knowledge and how objects should interact. We can benefit from applying these principles to improve the quality of our code, moving from a procedural style to a more object-oriented one.
So, in this post we will review some of these principles and we will try to show them in action.
Separation of concerns principle
This is pretty simple, and it is the foundation of most architectural and code organization decisions: different parts of the code address different concerns.
We can understand this principle from another perspective: a unit of code should not address more than one concern. If it does, it needs to be broken into parts. We can apply this at different levels: the methods in a class, the different classes in a software module, the different modules in a software application, or the different layers.
This basic principle, formulated by Dijkstra, is at the root of the well-known Single Responsibility Principle, the S in SOLID, and of most of the patterns and principles that help us put knowledge where it belongs.
The Single Responsibility Principle states that a software unit should have one unique responsibility, defined as a unique reason to change. It doesn’t mean doing only one thing; that could lead us to a convoluted design. Instead, the principle is better applied if we consider the reasons that could force us to change that unit. Let’s see an example.
Price is a common concept in most businesses. It could be modelled initially with this class:
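A minimal sketch of that starting point, in Python. The attribute names `amount` and `currency` are our assumption for illustration; the only thing taken for granted is that a price is an immutable value:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    """Immutable value object: an amount in a given currency."""

    amount: Decimal   # hypothetical field name
    currency: str     # hypothetical field name, e.g. "EUR"
```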
Pretty anemic, huh? But it is a starting point. Now, the Finance team asks for a way to add VAT to the price. We honor immutability and add a factory method that returns the price with added VAT. Well done!
Also, the Front-End crew asks for a price representation that includes the currency information. We add a simple format method to the class.
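After those two requests the class could look roughly like this. It is only a sketch: `add_vat` stands in for the addVat method mentioned in the text, and the rate parameter and the output format are assumptions:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    amount: Decimal
    currency: str

    def add_vat(self, rate: Decimal) -> "Price":
        # Finance concern: returns a new Price instead of mutating this one.
        return Price(self.amount * (1 + rate), self.currency)

    def format(self) -> str:
        # Presentation concern living inside the domain object.
        return f"{self.amount:.2f} {self.currency}"
```

Two teams already have a reason to ask for changes in the same small class.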
Hey! Different currencies require different representation formats. Also, our beloved Marketing team asks for a feature to add discounts and promotions to our Price objects. Wait a moment! This poor little class has a lot of reasons to change. We are asking it to hold a lot of responsibilities.
We need a change to address that.
First, Price should only be responsible for holding information about… price. We can model Taxes and Discounts as Price decorators, so we will always be able to recover the base price. Here is an example. It’s not perfect, but now we have separated some of the responsibilities.
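One way to sketch the decorator idea in Python. `PriceInterface` is the interface the text refers to; its exact methods, the `WithVat` name and the rate parameter are assumptions made for the example:

```python
from decimal import Decimal
from typing import Protocol


class PriceInterface(Protocol):
    def amount(self) -> Decimal: ...
    def currency(self) -> str: ...
    def format(self) -> str: ...


class Price:
    def __init__(self, amount: Decimal, currency: str) -> None:
        self._amount = amount
        self._currency = currency

    def amount(self) -> Decimal:
        return self._amount

    def currency(self) -> str:
        return self._currency

    def format(self) -> str:
        return f"{self.amount():.2f} {self.currency()}"


class WithVat:
    """Decorator (hypothetical name): wraps any price and adds VAT on top."""

    def __init__(self, price: PriceInterface, rate: Decimal) -> None:
        self._price = price
        self._rate = rate

    def amount(self) -> Decimal:
        return self._price.amount() * (1 + self._rate)

    def currency(self) -> str:
        return self._price.currency()

    def format(self) -> str:
        # Same code as in Price: every decorator has to carry it.
        return f"{self.amount():.2f} {self.currency()}"
```

The wrapped object is never modified, so the base price stays available to anyone holding a reference to it.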
The addVat method in Price becomes unnecessary, but if it is still in use we can remove it gradually.
So, do you need a discount? No problem:
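A discount can follow exactly the same shape. In this sketch `Discounted` and its percentage parameter are hypothetical, and a reduced `Price` is included only to keep the example self-contained:

```python
from decimal import Decimal


class Price:
    def __init__(self, amount: Decimal, currency: str) -> None:
        self._amount = amount
        self._currency = currency

    def amount(self) -> Decimal:
        return self._amount

    def currency(self) -> str:
        return self._currency

    def format(self) -> str:
        return f"{self.amount():.2f} {self.currency()}"


class Discounted:
    """Decorator (hypothetical name): takes a percentage off the wrapped price."""

    def __init__(self, price, percent: Decimal) -> None:
        self._price = price      # any object exposing amount(), currency(), format()
        self._percent = percent  # e.g. Decimal("10") for 10 %

    def amount(self) -> Decimal:
        return self._price.amount() * (1 - self._percent / 100)

    def currency(self) -> str:
        return self._price.currency()

    def format(self) -> str:
        return f"{self.amount():.2f} {self.currency()}"
```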
Now, you can use it as in the following test. Price is now a composite object:
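A test along those lines might look like this sketch. The class names, the 10 % discount and the 21 % VAT rate are illustrative assumptions, and the format method is left out to keep the example short:

```python
from decimal import Decimal


class Price:
    def __init__(self, amount: Decimal, currency: str) -> None:
        self._amount, self._currency = amount, currency

    def amount(self) -> Decimal:
        return self._amount

    def currency(self) -> str:
        return self._currency


class Discounted:
    def __init__(self, price, percent: Decimal) -> None:
        self._price, self._percent = price, percent

    def amount(self) -> Decimal:
        return self._price.amount() * (1 - self._percent / 100)

    def currency(self) -> str:
        return self._price.currency()


class WithVat:
    def __init__(self, price, rate: Decimal) -> None:
        self._price, self._rate = price, rate

    def amount(self) -> Decimal:
        return self._price.amount() * (1 + self._rate)

    def currency(self) -> str:
        return self._price.currency()


def test_discount_is_applied_before_vat() -> None:
    base = Price(Decimal("100"), "EUR")
    final = WithVat(Discounted(base, Decimal("10")), Decimal("0.21"))

    assert final.amount() == Decimal("108.90")  # (100 - 10 %) + 21 %
    assert final.currency() == "EUR"
    assert base.amount() == Decimal("100")      # the base price is still recoverable
```

The order of wrapping expresses the business rule: here the discount is applied first and VAT is computed on the discounted amount.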
Now, if the Finance team asks us to apply another tax to our price, or a different one because we need prices for several countries with different laws, we only need to add classes accordingly without touching Price itself.
Also, if Marketing asks us to apply new discounts and promotions, we only need to add classes for each one.
All of those new classes may change for one unique reason, so we are honoring the Single Responsibility Principle… at least in this area. Remember that we need to address the format concern. But we can apply a similar solution.
Oh, and another benefit of this approach is that we are also honoring the Open/Closed Principle: open for extension, closed for modification. This principle states that we should avoid modifying existing code in order to add or modify behavior. Instead, we should provide means to extend object behavior without touching that existing code.
Let’s return to the format problem. It exposes an interesting problem that requires us to talk about segregating interfaces.
Interface Segregation Principle
If you look at our decorators you can see that they have to carry the format method, and they really don’t need it. Price and its decorators have two reasons to change: one related to their business meaning, and one related to their presentation. Also, you have to duplicate format in every type you create, multiplying the problems with the format variations.
When working with legacy code, it is very easy to find classes that hold too many responsibilities and have huge public interfaces with lots of methods serving different kinds of consumers.
The Interface Segregation Principle states that a client should not be forced to depend on methods that it does not use. We should design narrower interfaces based on the needs of their consumers.
In our example, PriceInterface has methods that serve different kinds of consumers: one is interested in the amount, and another in the presentation. So, we should separate those concerns into different interfaces.
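Sketched in Python with `typing.Protocol`, the split could look like this. The two interface names come from the text; the methods assigned to each one and the `WithVat` decorator are assumptions:

```python
from decimal import Decimal
from typing import Protocol


class PriceInterface(Protocol):
    """For consumers that care about the amount."""

    def amount(self) -> Decimal: ...
    def currency(self) -> str: ...


class PriceFormatterInterface(Protocol):
    """For consumers that care about presentation."""

    def format(self) -> str: ...


class Price:
    """Satisfies both interfaces."""

    def __init__(self, amount: Decimal, currency: str) -> None:
        self._amount = amount
        self._currency = currency

    def amount(self) -> Decimal:
        return self._amount

    def currency(self) -> str:
        return self._currency

    def format(self) -> str:
        return f"{self.amount():.2f} {self.currency()}"


class WithVat:
    """Satisfies only PriceInterface: there is no format method to carry."""

    def __init__(self, price: PriceInterface, rate: Decimal) -> None:
        self._price = price
        self._rate = rate

    def amount(self) -> Decimal:
        return self._price.amount() * (1 + self._rate)

    def currency(self) -> str:
        return self._price.currency()
```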
Now, the old Price class implements two interfaces, but our decorators only implement one of them: the one related to amount modifications. They are free of presentation concerns.
We can solve these concerns with another family of decorators using the PriceFormatterInterface:
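A possible shape for that second family, as a sketch. `EuroFormatter`, `DollarFormatter` and the concrete output formats are invented for illustration; each one wraps anything that exposes an amount and offers only a format method:

```python
from decimal import Decimal
from typing import Protocol


class PriceInterface(Protocol):
    def amount(self) -> Decimal: ...
    def currency(self) -> str: ...


class Price:
    def __init__(self, amount: Decimal, currency: str) -> None:
        self._amount, self._currency = amount, currency

    def amount(self) -> Decimal:
        return self._amount

    def currency(self) -> str:
        return self._currency


class EuroFormatter:
    """Presentation decorator (hypothetical): amount first, symbol last."""

    def __init__(self, price: PriceInterface) -> None:
        self._price = price

    def format(self) -> str:
        return f"{self._price.amount():.2f} €"


class DollarFormatter:
    """Presentation decorator (hypothetical): symbol first, then amount."""

    def __init__(self, price: PriceInterface) -> None:
        self._price = price

    def format(self) -> str:
        return f"${self._price.amount():.2f}"
```

A new currency format means a new formatter class, and none of the amount-related classes need to change.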
Now, you can compose them:
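For instance, an amount decorator can be wrapped by a presentation decorator. This is a sketch in which `WithVat`, `EuroFormatter` and the 21 % rate are illustrative assumptions:

```python
from decimal import Decimal


class Price:
    def __init__(self, amount: Decimal, currency: str) -> None:
        self._amount, self._currency = amount, currency

    def amount(self) -> Decimal:
        return self._amount

    def currency(self) -> str:
        return self._currency


class WithVat:
    """Amount concern only."""

    def __init__(self, price, rate: Decimal) -> None:
        self._price, self._rate = price, rate

    def amount(self) -> Decimal:
        return self._price.amount() * (1 + self._rate)

    def currency(self) -> str:
        return self._price.currency()


class EuroFormatter:
    """Presentation concern only."""

    def __init__(self, price) -> None:
        self._price = price

    def format(self) -> str:
        return f"{self._price.amount():.2f} €"


# Business rules are stacked first; presentation wraps the result.
label = EuroFormatter(WithVat(Price(Decimal("100"), "EUR"), Decimal("0.21"))).format()
```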
It looks nice. And it does because we have now separated responsibilities into different classes, with pretty narrow interfaces.
An extra benefit is that we removed some duplication. Let’s talk about being DRY.
The DRY principle
The Don’t Repeat Yourself principle, by Hunt and Thomas, states that “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”.
The principle doesn’t refer to code duplication; it talks about knowledge duplication. This is an important distinction, because trying to apply DRY to every code duplication can lead us into tough problems. Sandi Metz has a post that is worth reading: The Wrong Abstraction.
Anyway, code duplication can be a symptom of knowledge duplication: it could mean that there is an interesting concept emerging. But you should consider this carefully. Code duplication can:
Be unavoidable: three or more fragments of code look similar, but they don’t represent the same knowledge.
Be unnecessary: three or more fragments of code can be reconciled, purely for economy, by extracting what they have in common and isolating the parts that differ. For example, blocks of code that can be extracted to one method or function.
Represent knowledge duplication: three or more fragments of code are examples of a more general concept or abstraction.
Have you noticed the three? There is a heuristic called the Rule of Three that advises against refactoring duplication until you have three or more occurrences of the same code. It is a simple rule to prevent premature abstraction when we don’t have enough information to decide what kind of duplication we are looking at.
But, is there a similar rule to find knowledge duplication that we should refactor? Yes. If you need to make the very same change in several parts of the code at once, in order to introduce or modify a behavior, then you have a candidate.
I will try to show you when to apply DRY and when not to, using our Price decorators family.
Does it make sense that both decorators share a common ancestor in order to avoid duplications? No, it doesn’t. Taxes and Discounts are two different concepts, two different pieces of knowledge with their own business rules and they are also managed by different teams in the company. Instead, we could model Taxes and Discounts as different abstractions.
The tell, don’t ask principle
The basic idea is that if you are getting information from an object to operate with it, you should encapsulate the whole operation in that object because it is where it belongs. Instead of asking the object about the data needed for the calculation, it is better to tell the object to calculate that itself.
In a more formal definition: if you need to ask an object about its state in order to change the state itself, you are violating both the information hiding and encapsulation principles. Then, you should encapsulate the operation in a method.
This principle is very useful to start moving knowledge to the objects where it belongs. Objects should be information experts about themselves. Also, objects are solely responsible for their own state and consistency. You should be able to trust your objects in that regard. This will make your life easier.
Let’s see an example. Imagine you have a concept TimeSlot to allocate, ahem…, time slots in a Calendar application. If you want to know if two slots overlap you could do this beauty:
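Something like the following sketch, where the accessor names and the `slots_overlap` helper are hypothetical. It also assumes that a slot ending exactly when another starts does not overlap with it:

```python
from datetime import datetime


class TimeSlot:
    """Anemic version: data and accessors only."""

    def __init__(self, start: datetime, end: datetime) -> None:
        self._start = start
        self._end = end

    def start(self) -> datetime:
        return self._start

    def end(self) -> datetime:
        return self._end


def slots_overlap(a: TimeSlot, b: TimeSlot) -> bool:
    # The caller pulls the data out of both slots and does the reasoning
    # on their behalf.
    if a.start() < b.end() and b.start() < a.end():
        return True
    return False
```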
You are asking time slots about data to perform a calculation in order to guess something about the state of both slots, among other code quality violations. But this is a knowledge that belongs to the TimeSlot object itself: to know if it overlaps with another TimeSlot:
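A sketch of the "tell" version. The method name `overlaps` is an assumption, as is the rule that adjacent slots do not overlap:

```python
from datetime import datetime


class TimeSlot:
    def __init__(self, start: datetime, end: datetime) -> None:
        self._start = start
        self._end = end

    def overlaps(self, other: "TimeSlot") -> bool:
        # The rule lives next to the data it needs; callers never see
        # the start and end values.
        return self._start < other._end and other._start < self._end
```

Client code shrinks to `slot.overlaps(other)`, and if the overlap rule changes there is a single place to edit.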
So you can tell a TimeSlot to calculate if another one is overlapping or not, because it has all the knowledge about itself.
Anemic object smell
This principle is related to the anemic object smell. An anemic object, or more specifically an anemic model, is an object that has only state, but no behavior. There are several objects that are designed to only contain data, for example Data Transfer Objects, but we are referring here to objects that model entities or values that should also have domain behavior.
In our example, the first implementation of TimeSlot was an anemic object. It only had properties and accessors to those properties.
As we have shown above, TimeSlot can, and must, encapsulate its own knowledge in the form of properties and behavior. Rich objects can also attract more behavior needed by the application. Imagine that you need a way to get a new TimeSlot starting exactly after a given one. Here is the dirty way:
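In sketch form, the dirty way means client code doing the arithmetic itself. The `slot_after` helper is hypothetical, and we assume the new slot keeps the duration of the original one:

```python
from datetime import datetime


class TimeSlot:
    def __init__(self, start: datetime, end: datetime) -> None:
        self._start = start
        self._end = end

    def start(self) -> datetime:
        return self._start

    def end(self) -> datetime:
        return self._end


def slot_after(slot: TimeSlot) -> TimeSlot:
    # Client code has to know how a slot is built and how long it lasts.
    duration = slot.end() - slot.start()
    return TimeSlot(slot.end(), slot.end() + duration)
```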
But TimeSlot could create them for us:
Because it knows how:
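A sketch of how that could look. The method name `next` and the same-duration rule are assumptions made for the example:

```python
from datetime import datetime


class TimeSlot:
    def __init__(self, start: datetime, end: datetime) -> None:
        self._start = start
        self._end = end

    def start(self) -> datetime:
        return self._start

    def end(self) -> datetime:
        return self._end

    def next(self) -> "TimeSlot":
        # The slot knows its own boundaries and duration, so it can
        # build its successor without exposing them.
        duration = self._end - self._start
        return TimeSlot(self._end, self._end + duration)


morning = TimeSlot(datetime(2021, 5, 31, 9), datetime(2021, 5, 31, 10))
following = morning.next()  # client code only asks for the next slot
```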
This way, you know that if you need to learn about time slots, your first stop should be the TimeSlot class.
The minimum knowledge principle or Demeter’s law
This principle states that a unit of software, usually a function or method, should only talk to objects that it knows directly. What objects are those?
Objects instantiated inside the unit.
Objects passed to the unit as parameters.
The object that owns the unit (its methods and attributes).
Applying this principle helps a lot to avoid coupling. Coupling is the degree of dependency between software units. Some level of coupling is unavoidable, but you can keep it under control. The secret of healthy coupling is that objects know the minimum about other objects.
An object should not rely on having intimate knowledge of the internal structure of another. For example, an object should not perform calls on an object inside another one. This is a smell called inappropriate intimacy.
You should talk to an object using only its public interface. If you really need to access some property or internal, you should consider adding a method that exposes it. But you should also ask yourself whether that knowledge belongs in another object.
Consider this piece of code. We have a PricingCalculator that allows us to calculate product pricing applying different rules. This is a new way to model some of the behaviors of our previous examples.
As you can see, the base price of a Product is defined in its product Family. Let’s see what happens inside the calculator:
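A minimal sketch of that model and of the calculator. `for_product` stands in for the forProduct method named in the text; the accessor names and the single VAT rule are assumptions:

```python
from decimal import Decimal


class Family:
    def __init__(self, price: Decimal) -> None:
        self._price = price

    def price(self) -> Decimal:
        return self._price


class Product:
    def __init__(self, family: Family) -> None:
        self._family = family

    def family(self) -> Family:
        return self._family


class PricingCalculator:
    def __init__(self, vat_rate: Decimal) -> None:
        self._vat_rate = vat_rate  # hypothetical pricing rule

    def for_product(self, product: Product) -> Decimal:
        # Chained call: the calculator reaches through Product into Family.
        base = product.family().price()
        return base * (1 + self._vat_rate)
```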
This works, but PricingCalculator has to know that, in order to get the product price, it has to ask first for the Family and then for the price. Now, that’s tight coupling: PricingCalculator must know intimate details of the internal structure of Product. It shouldn’t need to know how the price is built or where it comes from. It should only know that Product has a price.
In the future, if you change Product to have its price defined in a different way, the program could break. For example, imagine that Family is no longer responsible for the Product price:
If we don’t change PricingCalculator, the program will fail, because it expects to find the price in the Family object. It depends on talking to an object that it doesn’t know directly.
In this case, Product should be the source of truth about its price:
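A sketch of the decoupled version. Here Product holds its own price, but it could just as well keep delegating to a Family internally; the names `price` and `for_product` and the VAT rule are assumptions:

```python
from decimal import Decimal


class Product:
    def __init__(self, price: Decimal) -> None:
        self._price = price

    def price(self) -> Decimal:
        # Where the price comes from is Product's own business.
        return self._price


class PricingCalculator:
    def __init__(self, vat_rate: Decimal) -> None:
        self._vat_rate = vat_rate  # hypothetical pricing rule

    def for_product(self, product: Product) -> Decimal:
        # Talks only to the object it received as a parameter.
        return product.price() * (1 + self._vat_rate)
```

If the price moves again, only the body of `Product.price` changes and the calculator stays untouched.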
Now, PricingCalculator only depends on Product, a known object because it is passed as a parameter to the forProduct method.
In this case, applying Demeter’s Law helps us to decouple and to make our code more resilient to change, allowing objects to change their internals without affecting others unnecessarily. So, every time you find those chained calls that access objects contained in another object, take some time to encapsulate that logic, or even consider whether that knowledge belongs somewhere else.
More refactoring tips coming
We hope this post has given you some food for thought about how to allocate knowledge and behavior in your code.
Object-oriented design principles are a very good guide to help us to move and organize concepts in our codebase, allowing for a better modelling of the domain knowledge. We didn’t address all of them, of course. The lesson here is to take them into consideration when refactoring.
There is a lot more to say about refactoring, so stay tuned to learn about managing complex and nested conditionals and more ways to improve the health of your code.