Lines of code actually matters.

For a long time I thought lines of code was a bogus metric, but after working in both Java and Clojure, I have changed my mind.

This article originally appeared on IG’s blog

After more than 15 years of Java experience, I have tended to brush aside comments about Java’s verbosity with one of the following arguments:

lines of code (LOC) is a bogus metric;
IDEs generate 90% of my Java code;
the lessons learned from Perl’s notoriously incomprehensible conciseness.

LOC metrics are simply not important.

Or are they?

Some time ago we started building our first Spark jobs. The first two that we wrote were basically the same:

Read a CSV file from HDFS
Transform each line to JSON
Push each JSON to Kafka
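
To make the three steps concrete, here is a minimal plain-Java sketch of the shape of such a job. It is not the code from either of the jobs discussed in this article, and it uses neither Spark nor Kafka: an in-memory list stands in for the CSV file on HDFS, and a Consumer<String> stands in for the Kafka producer. The class name, the method names, the assumption that the first line is a header row, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: plain Java stand-ins for the HDFS source and the Kafka sink.
class CsvToJsonSketch {

    // Step 2: turn one CSV line into a flat JSON object, pairing each value
    // with its column name. Assumes plain comma-separated values (no quoted
    // fields) and emits every value as a JSON string.
    static String toJson(String[] columns, String line) {
        String[] values = line.split(",", -1); // -1 keeps trailing empty fields
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                json.append(',');
            }
            String value = i < values.length ? values[i] : "";
            json.append('"').append(escape(columns[i])).append("\":\"")
                .append(escape(value)).append('"');
        }
        return json.append('}').toString();
    }

    // Minimal escaping: backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Steps 1 to 3 wired together. Assumes the first line is a header row.
    static void run(List<String> csvLines, Consumer<String> sink) {
        if (csvLines.isEmpty()) {
            return;
        }
        String[] columns = csvLines.get(0).split(",", -1);
        for (String line : csvLines.subList(1, csvLines.size())) {
            sink.accept(toJson(columns, line)); // step 3: push each JSON record
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        run(Arrays.asList("id,name", "1,Ada", "2,Linus"), sent::add);
        sent.forEach(System.out::println);
    }
}
```

Even in this stripped-down form, only toJson carries the actual data transformation; everything else is wiring.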

It so happened that we wrote one in Clojure and one in Java. When we reviewed the code, this is how the Java version looked:

[Image: the classes of the Java version]

At first I was surprised that there were so many classes.

Then I was surprised at my own surprise. After all, this was the perfectly idiomatic Java that we have all come to write.

But why was I surprised in the first place? Probably because the Clojure version looked like this:

[Image: the single file of the Clojure version]

But maybe it was one huge file with hundreds of lines of code? No. Just 58 lines of code.

Perhaps the Clojure version was completely unreadable gibberish, with magic variables and parentheses all over the place? Here is the main transformation logic in the two versions:

[Image: the Java and Clojure transformation logic compared]

The only difference in readability is that the Java version has a lot more parentheses.

The code review

I would not usually pay attention to Java’s verbosity, but during the Java code review I found myself asking:

Which class should I start with?
Which class should I go to next?
How do the classes fit together?
What does the dependency graph look like?
Who implements those interfaces? Are they necessary?
What are the responsibilities of each class?
Where is the data transformation?
Is the data transformation correct?

The Clojure code review, by contrast, was about a single question:

Is the data transformation correct?

This made me realize that the Clojure version was far simpler to understand, and that being a single file of 58 lines of code was a major reason why.

What about bigger projects?

I don’t have a bigger project where the requirements were exactly the same as here, but our Clojure micro-services have no more than 10 files, usually 3 or 4, while the simplest of our Java micro-services has several dozen.

And from experience, we know that understanding a codebase with 4 small classes takes far less time than understanding one with 50.

Incidental Complexity

So the inherent complexity of the problem is the same, the Clojure version expresses the solution in 58 lines of code while the Java version requires 441, and the Clojure version is easier to understand. What, then, are those extra 383 lines (87% of the codebase) of perfectly idiomatic Java code about?

The answer is that all those extra lines of code fall into the incidental complexity bucket: complexity that we programmers create ourselves by not using the right tools, complexity that our business paid us to create and pays us to maintain, but never actually asked for.

Are lines of code important? Not as a measure of productivity, but certainly as a measure of complexity, especially if this complexity is incidental instead of inherent.

Imagine deleting 87% of all the code that you have to maintain!