Subscribe to PHP Freaks RSS

Keep an eye on the churn; finding legacy code monsters

syndicated from planet-php.net on January 18, 2018

Setting the stage: Code complexity



Code complexity often gets measured by calculating the Cyclomatic Complexity per unit of code. The number can be calculated by taking all the branches of the code into consideration.



Code complexity is an indicator for several things:



  • How hard it is to understand a piece of code; a high number indicates many branches in the code. When reading the code, a programmer has to keep track of all those branches, in order to understand all the different ways in which the code can work.
  • How hard it is to test that piece of code; a high number indicates many branches in the code, and in order to fully test the piece of code, all those branches need to be covered separately.

In both cases, high code complexity is a really bad thing. So, in general, we always strive for low code complexity. Unfortunately, many projects that you'll inherit ("legacy projects"), will contain code that has high code complexity, and no tests. A common hypothesis is that a high code complexity arises from a lack of tests. At the same time, it's really hard to write tests for code with high complexity, so this is a situation that is really hard to get out.



If you're curious about cyclometic complexity for your PHP project, phploc gives you a nice summary, e.g.



Complexity
  Cyclomatic Complexity / LLOC                    0.30
  Cyclomatic Complexity / Number of Methods       3.03


Setting the stage: Churn



Code complexity doesn't always have to be a big problem. If a class has high code complexity, but you never have to touch it, there's no problem at all. Just leave it there. It works, you just don't touch it. Of course it's not ideal; you'd like to feel free to touch any code in your code base. But since you don't have to, there's no real maintainability issue. This bad class doesn't cost you much.



What's really dangerous for a project is when a class with a high code complexity often needs to be modified. Every change will be dangerous. There are no tests, so there's no safety net against regressions. Having no tests, also means that the behavior of the class hasn't been specified anywhere. This makes it hard to guess if a piece of weird code is "a bug or a feature". Some client of this code may be relying on the behavior. In short, it will be dangerous to modify things, and really hard to fix things. That means, touching this class is costly. It takes you hours to read the code, figure out the right places to make the change, verify that the application didn't break in unexpected ways, etc.



Michael Feathers introduced the word "churn" for change rate of files in a project. Churn gets its own number, just like code complexity. Given the assumption that each class is defined in its own file, and that every time a class changes, you create a new commit for it in your version control software, a convenient way to quantify "churn" for a class would be to simply count the number of commits that touch the file containing it.



Thinking about version control: there is much more valuable information that you could derive from your project's history. Take a look at the book "Your Code as a Crime Scene" by Adam Tornhill for more ideas and suggestions.



Code insight



Combining these two metrics, code complexity and churn, we could get some interesting insights about our code base. We could easily spot the classes/files which often need to be modified, but are also very hard to modify, because they have a high code complexity. Since the metrics themselves are very easy to calculate (there's tooling for both), it should be easy to do this.



Since tooling would have to be specific for the programming language and version control system used, I can't point you to something that always works, but for PHP and Git, there's churn-php. You can install it as a Composer dependency in your project, or run it in a Docker container. It will show you a table of classes with high churn and high complexity, e.g.



+-----------------------+---------------+------------+-------+
| File                  | Times Changed | Complexity | Score |
+-----------------------+---------------+------------+-------+
| src/Core/User.php     | 355           | 875        | 1     |
| src/Sales/Invoice.php | 235           | 274        | 0.234 |
+-----------------------+---------------+------------+-------+

Truncated by Planet PHP, read more at the original (another 3359 bytes)