It Depends #29: On self-documenting code

Hello everyone!

Apologies to my regular readers for the two-week break. I had travelled to my parents’ place in Delhi and got caught up in a false-alarm COVID episode there. All is well, and I welcome you to the 29th edition of It Depends. Today I will talk about self-documenting code (you can read the article on the website if you’d like), followed by the usual awesome from the internet.

I love documenting code and systems. Many don't. A major argument against documents is that they get outdated as the system evolves. And the faster a system evolves, the faster its documentation gets outdated. Ironically, this is the very type of system which needs the most up-to-date documentation!

An argument is often made, therefore, for self-documenting code. This is ostensibly the kind of code that doesn't need separate documentation because it is designed and implemented in such a way as to be self-explanatory to the reader.

How does anyone reading a codebase understand it? First, they need to know what the code is "supposed to do". Then they can graduate to figuring out how it does that. And this is where the problem of self-documenting code lies. Because reading code is essentially reading the how. "Query some things from a database table and process them into a map, match them against some other things from some other table, and return everything that does not match as a list". Well-written code makes it simple to understand how it is doing something. But it doesn't tell the reader why it is doing what it is doing. And hence the reader remains confused and documentation becomes necessary to understand the intent behind the system design.
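
To make this concrete, here is a minimal sketch of the "how" described above. The Order, Invoice and Reconciler names are hypothetical, and plain lists stand in for the database tables. Every line is easy to follow, yet nothing in it says why unmatched orders matter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row types; in a real system these would be read from two tables.
final class Order {
    final String id;
    Order(String id) { this.id = id; }
}

final class Invoice {
    final String orderId;
    Invoice(String orderId) { this.orderId = orderId; }
}

final class Reconciler {
    // The "how" is plain: index the orders by id, drop every order that has
    // an invoice, and return whatever is left as a list.
    // The "why" is absent: are these unbilled orders, data errors, or
    // something to retry? The code cannot say.
    static List<Order> findUnmatched(List<Order> orders, List<Invoice> invoices) {
        Map<String, Order> ordersById = new LinkedHashMap<>();
        for (Order order : orders) {
            ordersById.put(order.id, order);
        }
        for (Invoice invoice : invoices) {
            ordersById.remove(invoice.orderId);
        }
        return new ArrayList<>(ordersById.values());
    }
}
```

A reader can verify what this method returns in a minute, but still has to ask someone what the result is for.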

So what kind of code would reveal, even in some limited way, why it does things the way it does them?

I want to talk about the only way (IMO) code can explain itself to readers and hence become self-documenting. Let's discuss the model underlying the code.

Becoming self-documenting

Before the code comes the conceptual model of which the code is the physical manifestation. This is the mental model of the problem and the solution. It may be the domain model or another specific way of representing the programmer's thought process and how it will achieve a programmatic solution to the problem at hand.

The model is the core of low-level system design. It defines the things that we are working with, what their nature is, and what role they play in solving the problem at hand. The model must be developed before any other aspects of the low-level design, such as APIs, data stores, and data flows, can be determined. These are physical forms of the things conceptualized in the model: ways to make the model “run”. The model itself is the why, and the code is the how.

This is why the only way to make the code self-documenting is to make the code reveal the model underlying it. Code that highlights the core model instead of hiding it in implementation details builds a narrative that is much more accessible than trying to infer the meaning of some lines in code. The lines can always be interpreted, but the model made evident in well-written code sets the context under which those lines of code make sense.

Let’s take an oversimplified version of Airbnb booking. In the object-oriented, RESTful world, the code to update a booking might look something like this:

public class Booking {
	String uniqueId;
	User guest;
	User host;
	Date bookingTime;
	Date confirmationTime;
	Date cancellationTime;
	Status status; //PENDING, CONFIRMED, CANCELLED_BY_GUEST, CANCELLED_BY_HOST
	User lastUpdatedBy;
}
public class BookingUpdateRequest {
	Date updateTime;
	Booking updatedBooking;
}
// In Booking Service
public void updateBooking(BookingUpdateRequest request) {
	Booking originalBooking = readFromDB(request.updatedBooking.uniqueId);
	if ((originalBooking.status == PENDING || originalBooking.status == CONFIRMED) &&
		(request.updatedBooking.status == CANCELLED_BY_GUEST)) {
		originalBooking.lastUpdatedBy = originalBooking.guest;
		originalBooking.cancellationTime = request.updateTime;
		originalBooking.status = CANCELLED_BY_GUEST;
		// Trigger notification to host
		// Trigger refund if payment was taken
		updateInDB(originalBooking);
	} else if ((originalBooking.status == PENDING || originalBooking.status == CONFIRMED) &&
		(request.updatedBooking.status == CANCELLED_BY_HOST)) {
		originalBooking.lastUpdatedBy = originalBooking.host;
		originalBooking.cancellationTime = request.updateTime;
		originalBooking.status = CANCELLED_BY_HOST;
		// Trigger notification to guest
		// Trigger refund if payment was taken
		updateInDB(originalBooking);
	}
	// else if ........ more conditions to handle other combinations of new/old variables
}

This type of generic update code is fairly common. To me, it doesn’t explain why things happen the way they do. Did we miss any cases? This code could be refactored into a much cleaner form, but that would still not reveal why things happen this way.

Let’s consider an alternative:

public class Booking {
	String uniqueId;
	User guest;
	User host;
	Date bookingTime;
	Date confirmationTime;
	Date cancellationTime;
	Status status; //PENDING, CONFIRMED, CANCELLED_BY_GUEST, CANCELLED_BY_HOST
	User lastUpdatedBy;
}

// In Booking Service
public void cancelBookingByGuest(String bookingId, Date cancellationTime) {
	Booking originalBooking = readFromDB(bookingId);
	originalBooking.lastUpdatedBy = originalBooking.guest;
	originalBooking.cancellationTime = cancellationTime;
	originalBooking.status = CANCELLED_BY_GUEST;
	// Trigger notification to host
	// Trigger refund if payment was taken
	updateInDB(originalBooking);
}
public void cancelBookingByHost(String bookingId, Date cancellationTime) {
	Booking originalBooking = readFromDB(bookingId);
	originalBooking.lastUpdatedBy = originalBooking.host;
	originalBooking.cancellationTime = cancellationTime;
	originalBooking.status = CANCELLED_BY_HOST;
	// Trigger notification to guest
	// Trigger refund if payment was taken
	updateInDB(originalBooking);
}
// Other APIs to handle combinations of new/old variables...

Or perhaps this:

public class Booking {
	String uniqueId;
	User guest;
	User host;
	Date bookingTime;
	Date confirmationTime;
	Date cancellationTime;
	Status status; //PENDING, CONFIRMED, CANCELLED_BY_GUEST, CANCELLED_BY_HOST
	User lastUpdatedBy;
}
public class BookingUpdateRequest {
	AllowedActionOnBooking action;
	Date updateTime;
	Booking updatedBooking;
}
// Explicitly define the ways of modifying the Booking entity
public enum AllowedActionOnBooking {
	GUEST_CANCELLATION,
	HOST_CANCELLATION,
	CONFIRM,
	DATE_CHANGE
}
// In Booking Service
public void updateBooking(BookingUpdateRequest request) {
	Booking originalBooking = readFromDB(request.updatedBooking.uniqueId);
	switch (request.action) {
		case GUEST_CANCELLATION:
			handleGuestCancellationRequest(originalBooking, request);
			break;
		case HOST_CANCELLATION: 
			handleHostCancellationRequest(originalBooking, request);
			break;
		case CONFIRM:
			handleConfirmationRequest(originalBooking, request);
			break;
		case DATE_CHANGE:
			handleDateChangeRequest(originalBooking, request);
			break;
		// Unchecked exception, so the method needs no throws clause
		default: throw new IllegalArgumentException("Unhandled action on booking: " + request.action);
	}
}

// Or move all these private methods to their own handler simplifying this class further
private void handleGuestCancellationRequest(Booking originalBooking, BookingUpdateRequest request) {
	originalBooking.lastUpdatedBy = originalBooking.guest;
	originalBooking.cancellationTime = request.updateTime;
	originalBooking.status = CANCELLED_BY_GUEST;
	// Trigger notification to host
	// Trigger refund if payment was taken
	updateInDB(originalBooking);
}
private void handleHostCancellationRequest(Booking originalBooking, BookingUpdateRequest request) {
	originalBooking.lastUpdatedBy = originalBooking.host;
	originalBooking.cancellationTime = request.updateTime;
	originalBooking.status = CANCELLED_BY_HOST;
	// Trigger notification to guest
	// Trigger refund if payment was taken
	updateInDB(originalBooking);
}
private void handleConfirmationRequest(Booking originalBooking, BookingUpdateRequest request) {
	// Business logic
}
private void handleDateChangeRequest(Booking originalBooking, BookingUpdateRequest request) {
	// Business logic
}
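
One side effect of naming the allowed actions explicitly is that the earlier question, "did we miss any cases?", can be partly handed to the compiler. The sketch below assumes Java 14 or later, where a switch expression over an enum with no default branch must cover every constant. The ActionRouting class and the handlerFor method are hypothetical. The enum and the handler names come from the example above.

```java
// Same enum as in the example above.
enum AllowedActionOnBooking {
    GUEST_CANCELLATION,
    HOST_CANCELLATION,
    CONFIRM,
    DATE_CHANGE
}

final class ActionRouting {
    // Hypothetical helper: maps each action to the name of the handler that
    // serves it. There is deliberately no default branch. If a new constant
    // is added to the enum and not handled here, this switch expression
    // stops compiling instead of failing at runtime.
    static String handlerFor(AllowedActionOnBooking action) {
        return switch (action) {
            case GUEST_CANCELLATION -> "handleGuestCancellationRequest";
            case HOST_CANCELLATION -> "handleHostCancellationRequest";
            case CONFIRM -> "handleConfirmationRequest";
            case DATE_CHANGE -> "handleDateChangeRequest";
        };
    }
}
```

This only checks that every named action has a branch. It cannot check that the list of actions itself is complete, which remains a modelling decision.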

What is the difference?

Looking at these examples makes it obvious how we can write code that makes the conceptual model explicit and hence reduces the cognitive load on the reader. The way to do this is via abstractions. All code is likely to have some amount of abstraction, but not all abstractions surface the thought process behind the code. Often developers use the model abstractions only as data carriers without imbuing them with any behavioural or semantic significance. This is the case in the first example of the generic update API. The Booking abstraction merely carries the data, and all meaning is buried in the if-else conditions.

Model-driven code, on the other hand, uses model abstractions to represent the core elements and uses them as the building blocks of all other interactions in the system. They are the heart of the system and all other code only manipulates them in ways defined and controlled by the model itself. This is the case in both the second and the third examples.
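
Taking "defined and controlled by the model itself" one step further than the examples, the Booking class could own its state transitions. This is only a sketch under stated assumptions: the method names (confirm, cancelByGuest, cancelByHost), the private cancel helper, and the rule that only a PENDING booking can be confirmed are mine. The rule that only PENDING or CONFIRMED bookings can be cancelled is taken from the conditions in the first example. Notifications, refunds and persistence are left out.

```java
import java.util.Date;

enum Status { PENDING, CONFIRMED, CANCELLED_BY_GUEST, CANCELLED_BY_HOST }

final class Booking {
    private final String uniqueId;
    private Status status = Status.PENDING;
    private Date cancellationTime;

    Booking(String uniqueId) {
        this.uniqueId = uniqueId;
    }

    String uniqueId() { return uniqueId; }
    Status status() { return status; }
    Date cancellationTime() { return cancellationTime; }

    // Assumption: only a pending booking can be confirmed.
    void confirm() {
        if (status != Status.PENDING) {
            throw new IllegalStateException("Cannot confirm a booking in state " + status);
        }
        status = Status.CONFIRMED;
    }

    void cancelByGuest(Date when) {
        cancel(Status.CANCELLED_BY_GUEST, when);
    }

    void cancelByHost(Date when) {
        cancel(Status.CANCELLED_BY_HOST, when);
    }

    // The cancellation rule lives in one place, inside the model, rather
    // than being repeated in service-level if-else conditions.
    private void cancel(Status cancelledStatus, Date when) {
        if (status != Status.PENDING && status != Status.CONFIRMED) {
            throw new IllegalStateException("Cannot cancel a booking in state " + status);
        }
        status = cancelledStatus;
        cancellationTime = when;
    }
}
```

With this shape, a service method such as cancelBookingByGuest reduces to loading the booking, calling cancelByGuest, and saving it, and an already cancelled booking cannot be cancelled a second time by accident.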

It is not the case that the code in the first example does not have a model underlying it. Much like it is not possible to have “no design” (there is always design, even if inadvertent and poor), it is not possible to have “no model”. The solution sitting in the developer’s head is the model. The first example just chooses to obscure it while the latter two make efforts to make it clear.

I hope this has made clear the benefits of explicitly using a conceptual model, and building the system around it. This won’t make the system (especially a large system) automatically self-evident, but it does go a long way in that direction.

From the internet this week

  1. Manuel Pais and Matthew Skelton reiterate their focus on “team cognitive load” in this talk on monoliths and microservices. Their book Team Topologies is a must-read - get it NOW if you haven’t already.
  2. Two book recommendations in one edition - that’s a first for this newsletter. Steps to an Ecology of Mind by Gregory Bateson sheds fascinating light on how computing was expected to evolve. For audio fans, here’s a sample in the author’s own voice.
  3. Javier Ramos has written a good comparison between Pulsar and Kafka. Let the streaming wars begin!

That's it for this week folks!

Cheers!

Kislay
