Photo by Erik Škof, because the example deals with soap films.
Don’t Repeat Yourself?
The “Don’t Repeat Yourself” (DRY) principle is a widely shared best practice in programming. It warns us against code duplication. The point of this article is to show how extreme duplication avoidance can produce more complexity; more generally, removing one code smell can introduce another.
How does duplication happen?
Let’s not give in to the too-easy explanation “because people are lazy and stupid”. A more reasonable scenario is:
- Found code matching my problem, but a variation is needed
- Copied and pasted it
- Added the variation my problem needs
This is fully in line with the Keep It Simple, Stupid (KISS) principle: by duplicating, the developer avoids any change to the original code.
Why is duplication bad?
Duplication is one of the six major anti-patterns of STUPID (Singleton, Tight coupling, Untestable, Premature optimization, Indescriptive naming, Duplication).
Let’s state the obvious: why is duplication harmful? When the same code is repeated several times, it comes with two human costs:
- As long as repetitions are synchronized, developers must repeat their edits on all occurrences. For N lines repeated D times, the extra work is around N×(D-1) lines of code. This is a form of tight-coupling design: more code than ideally intended is linked by changes, creating inertia.
- Once repetitions are no longer synchronized, developers must also take care of the local variations in the code. Each reading brings the temptation to remove the variations, which may be present for a good reason.
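The first cost can be made concrete with a two-line helper (the function name and the numbers below are an arbitrary illustration):

```python
def duplication_overhead(n_lines, n_copies):
    # Extra lines to touch when one synchronized change must be
    # repeated in every copy: N x (D - 1).
    return n_lines * (n_copies - 1)

print(duplication_overhead(20, 5))  # a 20-line block copied 5 times: 80 extra lines per edit
```

A 20-line block copied 5 times means roughly 80 extra lines to touch for every synchronized change.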
How, then, could avoiding duplication be bad?
Let’s take an optimistic situation: a piece of code is duplicated 5 times, with 5 justified variations. A refactoring merges the copies into one larger function, which now carries the 5 variations within the same body.
Let’s use an example: code solving the evolution of a soap film (in pseudocode):
Algorithm SimulateSoapFilm():
    Initialize soap_film as a 2D grid of values representing film thickness
    Initialize time_step
    Initialize simulation_duration
    for time = 0 to simulation_duration do
        Initialize next_film as a copy of the current soap_film
        for each cell in soap_film do
            Calculate Laplacian of film thickness in the neighborhood of the cell
            Calculate change in film thickness based on Laplacian and other factors
            Update next_film[cell] = soap_film[cell] + change
            // Apply boundary conditions (e.g., fixed thickness at edges)
            if cell is at the boundary then
                next_film[cell] = boundary_thickness
            end if
        end for
        Update soap_film to be next_film
    end for
End Algorithm
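This reference version can be sketched as a small runnable Python function. The diffusion-style Laplacian update, the grid size, and the central perturbation used as an initial condition are illustrative assumptions, not an actual soap-film solver:

```python
def simulate_soap_film(n=16, time_steps=50, dt=0.1, boundary_thickness=1.0):
    # 2D grid of film thickness, with a small central perturbation
    film = [[1.0] * n for _ in range(n)]
    film[n // 2][n // 2] = 2.0
    for _ in range(time_steps):
        next_film = [row[:] for row in film]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # discrete 5-point Laplacian of the thickness at this cell
                lap = (film[i - 1][j] + film[i + 1][j]
                       + film[i][j - 1] + film[i][j + 1]
                       - 4.0 * film[i][j])
                next_film[i][j] = film[i][j] + dt * lap
        # boundary condition: fixed thickness at the edges
        for k in range(n):
            next_film[0][k] = next_film[n - 1][k] = boundary_thickness
            next_film[k][0] = next_film[k][n - 1] = boundary_thickness
        film = next_film
    return film
```

Nothing fancy: one grid, one update rule, one boundary condition, zero arguments a caller is forced to think about.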
The 4 variations from this reference are:
- A 3D version, to solve a bubble without boundaries
- A 3D version for large bubbles, stabilized by glycerine, with large deformations
- A 2D version with moving boundaries
- A 2D version with a source term
Inside the merged code, depending on how the if statements are nested, the number of possible paths through the code ranges from 5 to 2**5 (32)! This is a large increase in cyclomatic complexity. Depending on how the variations are triggered, the number of arguments of the function can rise from 0 to 4. The input complexity of the code has also increased. For the soap film example, we get the following code:
Algorithm SimulateSoapFilm(is-3d, large-deformations, source-terms, moving-boundaries):
    if is-3d then
        Initialize soap_film as a 3D grid of values representing film thickness
    else
        Initialize soap_film as a 2D grid of values representing film thickness
    end if
    Initialize time_step
    Initialize simulation_duration
    for time = 0 to simulation_duration do
        Initialize next_film as a copy of the current soap_film
        for each cell in soap_film do
            if is-3d then
                if large-deformations then
                    Calculate 3D Laplacian with large deformations
                else
                    Calculate 3D Laplacian
                end if
            else
                Calculate 2D Laplacian
            end if
            if source-terms then
                Apply source terms
            end if
            Calculate change in film thickness based on Laplacian and other factors
            Update next_film[cell] = soap_film[cell] + change
            if not is-3d then
                // Apply boundary conditions (e.g., fixed thickness at edges)
                if moving-boundaries then
                    if cell is at the boundary then
                        next_film[cell] = moving_boundary_thickness
                    end if
                else
                    if cell is at the boundary then
                        next_film[cell] = boundary_thickness
                    end if
                end if
            end if
        end for
        Update soap_film to be next_film
    end for
End Algorithm
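Even restricted to its two 2D variations, a runnable Python sketch of the merged version shows the signature growth and the per-flag branching (the constant source term and the drifting boundary value are hypothetical placeholders, chosen only to make the flags do something):

```python
def simulate_soap_film_merged(n=16, time_steps=50, dt=0.1,
                              source_terms=False, moving_boundaries=False,
                              boundary_thickness=1.0, source_strength=0.01):
    # The zero-argument reference now takes four extra parameters,
    # and the inner loop branches on flags at every cell.
    film = [[1.0] * n for _ in range(n)]
    film[n // 2][n // 2] = 2.0
    for step in range(time_steps):
        next_film = [row[:] for row in film]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                lap = (film[i - 1][j] + film[i + 1][j]
                       + film[i][j - 1] + film[i][j + 1]
                       - 4.0 * film[i][j])
                change = dt * lap
                if source_terms:
                    change += source_strength  # hypothetical source term
                next_film[i][j] = film[i][j] + change
        # the boundary condition now depends on another flag
        if moving_boundaries:
            edge = boundary_thickness + 0.1 * step  # hypothetical boundary motion
        else:
            edge = boundary_thickness
        for k in range(n):
            next_film[0][k] = next_film[n - 1][k] = edge
            next_film[k][0] = next_film[k][n - 1] = edge
        film = next_film
    return film
```

Two flags already clutter the loop; the full four-flag, 2D-plus-3D version would be considerably worse.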
Note that the removal of duplication comes with a concerning change of perimeter. Indeed, before merging, the perimeter is:
- 2D film - simple
- 2D film with moving boundaries
- 2D film with source terms
- 3D bubble - simple
- 3D bubble with large deformations
After merging, new combinations are implicitly possible, for example:
- 3D bubble with large deformations with source terms
- 2D film with source terms and large deformations
- 3D bubble with moving boundaries - which probably makes no sense.
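A quick enumeration makes this perimeter change concrete. Encoding each configuration as a tuple of the four flags (an illustrative encoding), the merge turns 5 supported cases into 16 accepted ones:

```python
from itertools import product

# The five configurations supported before the merge, encoded as
# (is_3d, large_deformations, source_terms, moving_boundaries):
supported = {
    (False, False, False, False),  # 2D film - simple
    (False, False, False, True),   # 2D film with moving boundaries
    (False, False, True, False),   # 2D film with source terms
    (True, False, False, False),   # 3D bubble - simple
    (True, True, False, False),    # 3D bubble with large deformations
}

# Every combination the merged signature now implicitly accepts:
all_combos = set(product([False, True], repeat=4))
implicit = all_combos - supported
print(len(all_combos), len(implicit))  # 16 combinations, 11 never requested
```

Eleven of the sixteen accepted combinations were never asked for, including the dubious 3D bubble with moving boundaries.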
Finally, outside the merged code, calls to the function become longer (extra arguments). The merged function might also live in a more generic, less application-specific context. And developers, AI or human, rely heavily on context to know what to do.
One could hide this new complexity behind smarter constructs like function overloading or templates: something (the compiler? the runtime?) would guess, from the hardware, the available memory, or the type of the data, which variation of the code to use. We would simply exchange cyclomatic and calling complexity for structural complexity. This risk is the one targeted by the AHA principle (“Avoid Hasty Abstractions”), influenced by Sandi Metz’s “prefer duplication over the wrong abstraction”.
We did remove duplication in this case, but ended up with more complex code, supporting more situations than intended.
Takeaway: Balancing simplicity and duplication
This text tried to illustrate how removing duplication can also produce bad code. The damage comes from the number and nature of the variations between duplicates. Of course, in the ideal case of no variations, a merge would add zero complexity.
In the end, the best codebase is one permanently discussed by its developer community in search of the best compromise. We use programming principles to find this local optimum. However, we saw here a collision between two of these principles, “Keep it simple, stupid” and “Don’t repeat yourself”.
Next time you come across one of these principles, do not stop at its reassuring wise-man vibe: question how far it is reasonable to stick to it in your situation.
Let’s finish with a Zen of Python quote:
Special cases aren’t special enough to break the rules. Although practicality beats purity.