The missing language

Code dies young when it lacks human readability somewhere in the middle, between understanding the business and making things compile. Even though code is linguistically very constrained (the compiler will unapologetically force its rules onto you), it should still be treated as text that is meant to communicate on a human level as well. We try to use normal written language throughout the code so our colleagues can understand what we mean, and it’s here that the code often falls short. This article explains what we are trying to achieve and gives two approaches to increase the clarity of your code.

The programming language: The toddler

We often talk about “high level” and “low level” languages, as if one is smarter than the other. But in reality most of them are equally bad at giving you a language that allows you to easily communicate complex concepts. Using the basic syntax of any programming language is like trying to explain things at the language level of a toddler.

The language capabilities of a toddler are limited. They can explain basic concepts in short sentences, but they have a hard time articulating complex subjects, as 3-word sentences with a limited vocabulary just are not enough. Sure, it’s possible to build complicated things with just this basic syntax, but it’s extremely hard to communicate what you are doing if you only use the language’s native data types and data structures.

In general, we should consider ourselves lucky, as we’ve come a long way. We’re working with relatively rich languages. In assembly, providing context and guidance through code was virtually impossible with only a handful of words available.

Conveniently, a computer is easily able to understand the programming language and act on it.

<-- primitive                                            advanced -->

-----------------------+
Programming language   |
-----------------------+

The business language: The expert

The business language has the opposite problem. It’s highly complex and very rich. Many words are specific to one particular business, and some are even shortened into acronyms. Yet despite this richness, it can still fall short in explaining the actual business process: writing down a complex process as flat text is hard, so it is often enriched with diagrams, charts, oral presentations and explanations.

Sadly, a computer has no chance of interpreting this and acting on it.

<-- primitive                                            advanced -->

-----------------------+                   +-------------------------
Programming language   |                   | Business domain language
-----------------------+                   +-------------------------

The codebase language: Yours to shape

Here comes the hard part… Have you tried explaining a complex business process with a limited vocabulary and 3-word sentences, like a toddler? Toddlers are hilariously bad at articulating more complex things, often requiring adults to guess the context and meaning. So what can we do to fix this problem?

<-- primitive                                            advanced -->

-----------------------+                   +-------------------------
Programming language   |        You        | Business domain language
-----------------------+                   +-------------------------

There are 2 approaches to shaping your language: top-down and bottom-up. Both are valid and relevant, and they can be used at different points in a codebase’s lifetime. In either case, acquiring business knowledge will slow you down at first, until you understand the domain well enough to pick up significant speed. The goal is to keep your code flexible enough that refactors are quick and easily contained when you gain new insights.

Bottom-Up: Contain your toddler

This is perhaps best known as the extract X refactor, where X can be a method, variable, field, class, etc. These refactors group several lines of code into one named entity, which is more descriptive and quicker to understand than reading the details. Essentially, you use toddler language to provide the details, but a slightly more advanced language to describe them as a wrapper.
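As a minimal sketch of an extract-function refactor, assuming a hypothetical Order type and made-up function names that do not come from any real codebase, the first function spells out the details inline while the later ones wrap the same details in named entities:

```typescript
// Hypothetical types and names, purely for illustration.
interface OrderLine {
  unitPrice: number;
  quantity: number;
}

interface Order {
  lines: OrderLine[];
}

// Before: the reader has to decode the loop to learn what is being computed.
function totalBefore(order: Order): number {
  let sum = 0;
  for (const line of order.lines) {
    sum += line.unitPrice * line.quantity;
  }
  return sum;
}

// After: the same details, grouped into named entities.
function lineTotal(line: OrderLine): number {
  return line.unitPrice * line.quantity;
}

function orderTotal(order: Order): number {
  return order.lines.reduce((sum, line) => sum + lineTotal(line), 0);
}
```

Both versions compute the same result. The difference is that a reader of orderTotal can stop at the name lineTotal instead of reading the multiplication.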

The big disadvantage of this approach is that this new language is often not aligned with the business domain, as it’s very hard to see from the details what this new group means for the business. The only accurate information you have available is what it does on a technical level.

For example: if you haven’t figured out that you’re making a DNS lookup, function lookupDns(name: String): IpAddress will end up looking like function lookupNumber(name: String): Number. Yes, the latter puts a more descriptive name on top of the raw data structure handling, but it hides its true business nature: it’s a DNS lookup.

This is really a 2-step approach. First you create order in the chaos by figuring out what the code technically does. Once you’ve zoomed out of the details far enough to understand what the program does, you go back and rename everything so it makes sense on a business level and not just a technical one.
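The two steps might look roughly like the sketch below, which mirrors the DNS example. The in-memory records map, the lookupValue name and the Hostname alias are assumptions made up for illustration, and no real resolver is involved:

```typescript
// Hypothetical in-memory records standing in for a real resolver.
const records = new Map<string, string>([
  ["example.test", "192.0.2.10"],
]);

// Step 1: a technically accurate name. It describes what happens to the data,
// but says nothing about what the data means.
function lookupValue(table: Map<string, string>, key: string): string | undefined {
  return table.get(key);
}

// Step 2: once it is clear that this is a DNS lookup, the names and types say so.
type Hostname = string;
type IpAddress = string;

function lookupDns(name: Hostname): IpAddress | undefined {
  return records.get(name);
}
```

The behaviour is unchanged between the two steps. Only the vocabulary moves closer to the business domain.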

Caution: This approach tends to fail dramatically when it’s used for creating abstractions! It’s impossible to judge from the details which abstractions you should make. Code duplication alone is not a good reason for creating an abstraction: the duplicated pieces of code might have wildly different meanings in the business, and might diverge or need to evolve separately from each other. The same goes for over-diligent application of design patterns in this approach. The wrong abstraction causes more chaos than it solves.
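To illustrate the duplication trap, here is a small sketch with two hypothetical business rules. The rules and the 10% figure are invented for this example:

```typescript
// Hypothetical business rules; amounts are in cents and the 10% is made up.
// Today both bodies are identical, but they answer different business questions.
function loyaltyDiscountCents(orderTotalCents: number): number {
  return orderTotalCents / 10;
}

function salesTaxCents(orderTotalCents: number): number {
  return orderTotalCents / 10;
}

// The tempting bottom-up "abstraction": technically correct, but meaningless to
// the business, and it couples two rules that change for unrelated reasons.
function tenPercentCents(amountCents: number): number {
  return amountCents / 10;
}
```

If the tax rate changes while the loyalty discount stays the same, a shared tenPercentCents has to be split up again. The two duplicated functions can simply evolve independently.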

What it is good at is reducing the mental space required to figure out the business domain, both when doing your initial “messy” implementation and when exploring an existing codebase that perhaps isn’t as clear as you’d like.

Top-Down: Explain the expert

This approach is the exact opposite of what I explained above. Instead of figuring out the business from the details, you start from business domain language and then figure out what lies beneath.

It’s probably easiest with a silly example:

We make cookies

This is the highest level explanation of a business. The next step is to figure out the details of that statement:

We make dough
We bake the dough
We package the cookies

In code that might look like this:

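function makeCookies(): PackagedCookies {
    const dough = makeDough()
    const bakedCookies = bake(dough)
    // `package` is a reserved word in strict-mode TypeScript, so the step is named `pack`
    const packagedCookies = pack(bakedCookies)
    return packagedCookies
}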

So your next task is to figure out the details for these 3 new methods. A larger example can be found here
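One level down, filling in those details could look like the sketch below. Only makeCookies and its three steps come from the example. The types, the constants and the quantities are assumptions invented for illustration, and the packaging step is called pack because package is a reserved word in strict-mode TypeScript:

```typescript
// Hypothetical types and quantities, invented for this sketch.
interface Dough { grams: number; }
interface BakedCookies { count: number; }
interface PackagedCookies { boxes: number; }

const GRAMS_PER_COOKIE = 25; // assumption
const COOKIES_PER_BOX = 12;  // assumption

function makeDough(): Dough {
  return { grams: 600 };
}

function bake(dough: Dough): BakedCookies {
  // Leftover dough that is too small for a whole cookie is discarded.
  return { count: Math.floor(dough.grams / GRAMS_PER_COOKIE) };
}

function pack(cookies: BakedCookies): PackagedCookies {
  // A partially filled box still counts as a box.
  return { boxes: Math.ceil(cookies.count / COOKIES_PER_BOX) };
}

function makeCookies(): PackagedCookies {
  const dough = makeDough();
  const bakedCookies = bake(dough);
  return pack(bakedCookies);
}
```

Each of the three functions can in turn be broken down the same way, for example by asking the stakeholder what “make dough” actually involves.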

The major advantages of this strategy are that you learn the business domain in a more structured way and often give things the correct names from the start. It’s also a lot easier to find the places that require an abstraction. They present themselves when your business stakeholder gives similar explanations, or different options for the same step in the explanation, etc.

These emergent abstractions are perfectly suited to the tools offered by modern programming languages: polymorphism, recursion, loops, design patterns, etc. These tools allow you to create a structure that is easy to navigate, which is hard to achieve in flat text such as an instruction manual. For example, the recipes for different cookies might be presented one after the other, 4 pages apart, so all the details of each recipe get in the way of communicating which recipes there are and how many.
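A rough sketch of such an emergent abstraction, assuming the stakeholder describes several cookie recipes that share the same steps. The CookieRecipe interface, the recipe names and the baking times are all made up for illustration:

```typescript
// Hypothetical recipe abstraction; names, ingredients and times are made up.
interface CookieRecipe {
  name: string;
  bakingMinutes: number;
  doughIngredients(): string[];
}

const chocolateChip: CookieRecipe = {
  name: "chocolate chip",
  bakingMinutes: 12,
  doughIngredients: () => ["flour", "butter", "sugar", "chocolate chips"],
};

const oatmealRaisin: CookieRecipe = {
  name: "oatmeal raisin",
  bakingMinutes: 15,
  doughIngredients: () => ["oats", "butter", "sugar", "raisins"],
};

const recipes: CookieRecipe[] = [chocolateChip, oatmealRaisin];

// Answers "which recipes are there, and how many?" without reading any details.
function listRecipeNames(all: CookieRecipe[]): string[] {
  return all.map((recipe) => recipe.name);
}
```

The recipes array plays the role of a table of contents: the overview lives in one place, and the details of each recipe stay out of the way until you need them.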

This method also has some disadvantages. If you’re implementing a new feature in the middle of an existing codebase, it might be difficult to gain enough information to give proper names and fill in the details, forcing you to refactor later on to introduce proper names and structures. This happens with the other approach too, but in this case it is usually a bit more complex to solve. However, this disadvantage is completely dwarfed by the benefits a refactor brings in terms of clarity, ease of navigation and code longevity.

Progressive insights

It’s best to assume that your knowledge of the business will not be complete from the start! Your stakeholders have often studied for years for this, so nobody expects you to know their job inside and out in one week.

This means, though, that your code will also suffer from this knowledge gap, and that you will only figure out which names and structures to use further down the line, not at the start of the project. This progressive insight makes you better at your job, but it also makes you responsible for refactoring your code to suit these insights. You should never postpone this, or you’ll never do it. Not doing it will almost certainly push you off the cliff into a slow death sooner rather than later, instead of slowly but steadily ascending closer and closer to the perfect codebase.

Summary

The end goal is to have the correct semantic layering: code that is succinct, easily navigable and easily changed, as well as doing its job properly. It’s likely you’ll use both approaches to reach that destination, switching from one to the other as you discover the business domain and how it is best translated into something a computer will understand.