I’m a perfectionist when it comes to my art. You may not think of writing software as an art, but like anything that takes time and effort to master, there are folks like me who want to do it as well as possible. I don’t simply write the code and stop when it works. As I’m writing, I’m constantly evaluating whether the way I’m tackling a problem is the right approach. I go back to code I wrote previously and continually try to improve it if I can do so quickly.
Why do I do this? I have been writing software for over twenty years. In that time, I’ve seen lots of projects grow from infancy all the way up to being in production for years. Although we dream of writing software that will last forever, actually keeping software running and bug free while adding new features is incredibly difficult. This is especially true if you aren’t actively writing tests as you are rolling out those features.
Developers usually have a feeling somewhere along the way that it might be time to go back and clean up some of the technical debt that has been slowly building up over time. By technical debt, I’m talking about all those TODO and FIXME notes that we have scattered throughout the code base. I’m talking about the components that keep having defects every time a change is made to them. I’m talking about the hacks we put in place to get the features out the door.
One reason developers are so emphatic about refactors is that they feel the pain of the technical debt more than anyone else. The user experience may not change all that much over time, but the amount of effort required to add new features slowly grows, and a tension builds between the development team and management as deadlines start getting missed and the overall quality of the product drops.
Should We Refactor?
When that pain gets bad enough, you will hear the development team start pushing back on new feature requests, insisting that the feature will require a refactor of the application. If it gets bad enough, they may even start recommending scrapping what exists and starting over. This is not an easy sell to the business, especially when there are features planned and timelines for when those features are expected to ship.
Making the case for refactoring is difficult. Often the reasons for the refactor are highly technical, and not really easy to translate into a business value proposition. There is a significant productivity impact when a team spends time doing a refactor, and it can be hard to justify a refactor on a product that is actively driving business value. It generally comes down to monetizing the cost of continuing to maintain things as they are (although it may continue to take longer to add additional features) versus monetizing the cost of doing the refactor (which should make it possible to deliver new features more quickly).
Refactoring code is a risk. Business leaders must always take risk into account in every decision they make. Making a change may bring some huge benefits, but what if some significant risks are also introduced? What if the team fails to complete the refactor? What if they do complete it but it doesn’t really improve the speed of delivering features? What if the customer experience changes in a negative way? Is the risk really worth the benefit? These are the questions that must be answered.
Monolith versus Microservices
Another reason developers call for refactoring is to modernize the application infrastructure. A new trend in software engineering is to break large solutions up into lots of smaller but more maintainable microservices. Why has it become so important now to make this change when there are applications that have been running on mainframes for decades? The answer is pretty simple: the cloud.
There’s nothing all that magical about the cloud — it’s just server infrastructure that someone else is running and maintaining for you. The big win on cloud is that you can provision servers in mere minutes instead of waiting for someone to buy servers, install them, flash them with the right software, etc. Large companies often have huge delays in their processes that are simply due to getting their infrastructure in place.
The cloud actually has been with us for a long time, but recently it has gotten much more efficient with the virtual machine and container platforms that allow us to finely tune the resources allocated to particular tasks. No longer do we need to ask for a dedicated server with the required resources — we simply request an instance and it magically appears for us. Need more capacity in a pinch? Simply spin up another server and balance the loads. Stop them when you no longer need them.
Actually taking advantage of this magic cloud, however, often requires a change in how engineers write and publish software. It is not as simple as taking that application running on the mainframe and dropping it on those cloud servers. There is often still a lot of work required to install software on the servers, configure network connections, provision user management, etc. Bringing this to an enterprise level can take years.
So how does microservices fix this? By allowing us to slowly migrate small pieces of the application over to the cloud. I was recently at a DockerCon conference when a company called Solo.io introduced a product called Gloo. Gloo leverages another technology called Istio that allows you to create a service mesh — that is, a framework that allows you to more easily control how parts of an application communicate with one another.
What Gloo brings to this is a gateway that allows you to selectively direct API requests to different backends based on things like URL paths. A path like /app/customers/ could be serviced by one group of servers and a path like /app/products/ could be serviced by another group. This allows you to start breaking up your application into pieces slowly over time.
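To make the idea concrete, here is a minimal sketch of path-based routing, the core technique a gateway like Gloo applies. This is not Gloo's actual configuration or API; the backend addresses and the fallback name are hypothetical placeholders, and a real gateway would do this declaratively rather than in application code.

```python
# Sketch of prefix-based routing: each URL path prefix maps to a group of
# backend servers, and anything unmatched stays on the original application.
# All backend addresses here are made-up placeholders.

ROUTES = {
    "/app/customers/": "http://customers-service.internal",
    "/app/products/": "http://products-service.internal",
}

DEFAULT_BACKEND = "http://legacy-monolith.internal"


def route(path: str) -> str:
    """Return the backend that should handle a request for this path."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    # Paths that have not been migrated yet fall through to the monolith.
    return DEFAULT_BACKEND
```

Migrating another piece of the application is then just a matter of adding one more entry to the routing table, and rolling back is deleting that entry so traffic falls through to the original backend again.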
Your migration plan transforms from trying to move the entire application all at once to simply picking and choosing things to migrate and leaving the rest running. You pick one small service, implement it in a new place, such as on AWS, and when it is ready, you simply update your proxy gateway to begin directing traffic to the new endpoint. If things go wrong, you simply update the proxy to fall back to the original endpoint.
The biggest risk of any infrastructure migration is that things will go wrong and you will need to roll back. Years ago, we would take down our service for hours while new software was installed, only restoring service once the process was complete and smoke testing had confirmed that everything was up and running. If something did go wrong, it meant more downtime while we attempted to resolve the problem or restore the original configuration.
Those days are coming to an end with this microservices approach. You no longer need to bring down anything while you do the work of bringing up the new services. Once the new services are up and running, the proxy change is a single update. These proxies also allow us to direct only some of the traffic to the new endpoints while keeping the rest of the traffic going to the old. We can easily reduce the number of customers that could be affected by any defects introduced.
So should you refactor your application? It isn’t a simple yes/no question. There are risks to evaluate and costs that need to be estimated so that the business has all the information to make a good business decision. With the introduction of microservices, however, the risks and costs of doing a refactor are going down, and it is becoming much easier to be able to modernize an application without exposing the business to unnecessary risks.