Reading time: 6 min

Navigating through software: what if we had the map?

Mapping software is good for you

Ever tried walking in a city you’re completely new to, with no idea where you are or how it’s organized? If you take it linearly, one street after the other, getting to know the city can be long and exhausting. And what do you always end up doing? You mentally create a map of it.

Would it have been easier, and taken less time and effort, if someone had given you a map in the first place? You could then have memorized the general layout, seen how the different parts of town are linked to each other, and focused on the parts that interest you with all this in mind.

city_map

The idea in software geography is the same. As a developer new to a codebase, you could spend months reading it linearly before figuring out how certain blocks are linked together, and build a mental map of it over the years… or someone could give you the map. This is exactly why a global callgraph of a piece of software can be useful: it lets you see the “geography” of the software. Just as a map of a city or region shows where places and landmarks are and how they relate, a callgraph shows how the different parts of the software relate and interact. This visualization helps you understand the overall structure and organization of the codebase and navigate through it more easily.

village_vs_ville

The bigger the city you walk in, or the bigger the code you have to read, the more useful it is to have a map from the start.

On top of that, today’s new developer will spend much more time reading code than the new developer of 10 years ago, so it is harder and harder to ask someone to read the code linearly. Code size and complexity grow drastically over time. If a codebase was already complex 10 years ago, it has probably become unreadable by now.

Having access to the software’s geography helps developers see the code from above, then focus on certain parts while staying aware of where they are working and which other parts they will affect. It means less time spent understanding the code architecture and no need to read it linearly, while exposing information that matters for code quality in teams: not only architecture and dependencies, but also size, complexity…

Pros and cons of callgraphs as software maps

In theory, using callgraphs as software maps sounds very attractive. A callgraph makes it easier to understand the interactions between different parts of the code and to navigate through it. It can point to potential performance bottlenecks and to areas that may be prone to bugs. It helps identify parts of the code that are overly complex, which makes maintenance easier. It also provides a shared reference for discussion, which improves communication and teamwork.

However, as often happens when a theoretical concept meets reality, using callgraphs as software maps in real life does not go so smoothly. The drawbacks are the following:

  1. Creating a global callgraph can be time-consuming and resource-intensive, especially for large and complex codebases.
  2. Global callgraphs can be hard to understand and interpret, especially for developers who are not familiar with the codebase or the software development process.
  3. Global callgraphs can become outdated quickly as the codebase evolves, requiring developers to spend time updating and maintaining the graph.
  4. Global callgraphs may be too detailed or too big to be of any help, making it hard to find what you are looking for.

The Marauder’s map tool developed at CERFACS aims to use callgraphs as software maps while addressing the four drawbacks listed above.

Practical applications

As part of the Center of Excellence EXCELLERAT Phase 2, CERFACS is developing this geographical approach through the Marauder’s map tool to tackle these drawbacks.

Imagine a world map, naming the countries, the oceans, the poles. You use it to get an overview of what the world is like and how the big parts hold together. But what kind of maps do you actually use, for instance while driving or hiking? They cover smaller areas and give different information: they no longer name the countries and oceans but the streets instead, and they let you visualize the buildings to find your way. Conversely, a world map showing every street name would be really hard to read.

Once again, the idea is the same with software geography. Global callgraphs are rarely used as such, although they give a nice overview of the codebase. However, it gets more interesting when we are able to navigate through smaller parts of the graph, and to choose the information that we need to see.

We will now introduce two different visualizations: the tree graph, and the global callgraph.

The Tree graph - clues from the authors

The tree graph shows the code’s intended architecture. Lines of code are gathered by function, and each function is represented as a circle whose size is proportional to its number of lines. Functions are gathered by file, files by folder, and so on up to the main repository. This construction is a tree graph with the main repository at the root. To highlight the nesting chosen in this tree, we prefer a circular packing layout.

With this tree, the reader sees how the developers divided the code, and how they named each division. In other words, the tree graph shows at a glance the mental map promoted by the developers.
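The nesting described above can be sketched in a few lines of Python. This is only a minimal illustration, not how Marauder’s map is implemented: the names `function_sizes` and `build_tree`, the sample paths and the dictionary layout are all assumptions made for the example, and only the standard `ast` module is used.

```python
import ast

def function_sizes(source):
    """Return {function_name: number_of_lines} for one Python source string.

    Simplification: nested functions are also counted inside their parent,
    and two functions with the same name in one file overwrite each other.
    """
    sizes = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            sizes[node.name] = node.end_lineno - node.lineno + 1
    return sizes

def new_node(name):
    """One circle of the packing: a name, a size and nested children."""
    return {"name": name, "size": 0, "children": {}}

def build_tree(sources):
    """Nest functions in files and files in folders, summing sizes upward."""
    root = new_node("repository")
    for path, source in sources.items():
        for func, size in function_sizes(source).items():
            node = root
            node["size"] += size
            for part in path.split("/") + [func]:
                node = node["children"].setdefault(part, new_node(part))
                node["size"] += size
    return root

# Hypothetical two-file package, given as {path: source}.
SOURCES = {
    "pkg/io/reader.py": "def read(path):\n    data = open(path).read()\n    return data\n",
    "pkg/core.py": "def run():\n    return 1\n\ndef stop():\n    pass\n",
}
tree = build_tree(SOURCES)
print(tree["size"], tree["children"]["pkg"]["children"]["io"]["size"])
```

Each node carries the total number of function lines it contains, which is the quantity a circular packing would turn into a circle size.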

tree_mmap

This is the tree graph of the package maraudersmap. The smallest circles are Python functions or methods. In this rendering, the circles are colored by cyclomatic complexity.
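For readers unfamiliar with cyclomatic complexity, it roughly counts the independent paths through a function: one, plus one per decision point. The sketch below is a rough approximation written for this post with the standard `ast` module. The function name and the exact set of counted nodes are assumptions, and the metric actually used for the coloring may be computed differently.

```python
import ast

# Syntax nodes counted as one decision point each (an assumed, simplified set).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(func_source):
    """Approximate complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(func_source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity

# Hypothetical function used only to exercise the counter.
SAMPLE = '''
def classify(x):
    if x < 0 and x != -1:
        return "neg"
    for i in range(x):
        if i % 2:
            return "odd"
    return "other"
'''
print(cyclomatic_complexity(SAMPLE))
```

Here the two `if` statements, the `for` loop and the `and` each add one path to the base value of one.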

This mental map can be misleading. For example, the code in file1 can make many calls to code stored in file2, while file1 and file2 sit at the same level and nothing in their names hints at the link. This relation stays invisible if the authors overlooked it.

Luckily we have callgraphs to show the relations, even if authors hide them.

The global Callgraph - actual relations in the code

A callgraph represents the calling relationships between subroutines. It focuses on the blocks of code and their dependencies.

Unlike the tree graph, a callgraph ignores names and storage strategies. In fact, it would yield the very same output even if all the routines were stored in a single file and named #0001, #0002, #0003.
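A small experiment illustrates this name independence. The sketch below extracts caller-to-callee edges from Python source with the standard `ast` module, then checks that renaming every routine to an anonymous identifier leaves the graph unchanged up to relabeling. The helper `call_edges` and both sample sources are made up for the illustration, and only direct calls by plain name are detected (no methods, no dynamic calls).

```python
import ast

def call_edges(source):
    """Return the set of (caller, callee) pairs between functions of `source`.

    Simplification: only calls written as a bare name are detected, and
    calls inside a nested function are attributed to the outer one as well.
    """
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    edges = set()
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in defined):
                edges.add((func.name, node.func.id))
    return edges

# Same hypothetical code twice: meaningful names, then anonymous ones.
NAMED = "def solve():\n    init()\n    step()\n\ndef init():\n    pass\n\ndef step():\n    init()\n"
ANONYMOUS = "def r0001():\n    r0002()\n    r0003()\n\ndef r0002():\n    pass\n\ndef r0003():\n    r0002()\n"

rename = {"solve": "r0001", "init": "r0002", "step": "r0003"}
relabeled = {(rename[a], rename[b]) for a, b in call_edges(NAMED)}
print(relabeled == call_edges(ANONYMOUS))
```

The comparison holds because the graph only records who calls whom, not what the routines are called or where they are stored.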

We keep routines as circles proportional to their size. Instead of being circular-packed, circles are linked wherever one routine calls another. In graph theory, this is a directed graph. Finally, the node layout can be computed with a force-directed simulation, using one of various approximations of the n-body problem such as the Barnes-Hut simulation.
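To give an idea of what such a layout computation involves, here is a naive force-directed sketch in the style of Fruchterman-Reingold: every pair of nodes repels, and every link pulls its two ends together. It evaluates all pairs directly, which costs O(n²) per step; Barnes-Hut is precisely an approximation that speeds up this repulsion step. The function name, constants and sample graph are assumptions for the example, not the settings of any particular tool.

```python
import math
import random

def force_layout(nodes, edges, steps=200, seed=0):
    """Return {node: [x, y]} for a non-empty list of nodes."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # target distance between linked nodes
    for step in range(steps):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes (the n-body part).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Attraction along each call link.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node, capped by a temperature that cools to zero.
        temperature = 0.1 * (1.0 - step / steps)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(length, temperature) / length
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return pos

# Hypothetical four-routine callgraph.
NODES = ["main", "init", "solve", "flux"]
LINKS = [("main", "init"), ("main", "solve"), ("solve", "flux")]
layout = force_layout(NODES, LINKS)
```

The direction of the links is ignored here: the layout only needs to know which routines are connected, while the arrows are a matter of drawing.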

Beware, our legacy HPC software is genuinely complex. The network we get from a raw callgraph is often far too entangled to reveal anything useful. For example, a small “clean exit” function print_error_and_quit can be called from dozens of places. Such a hyperconnected node ties literal “knots”. Other functions are not used anymore, and some parts are simply not interesting.

Still, once the low-interest parts are filtered out and the main groups are properly colored, the output becomes a pretty decent map.
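The filtering can be pictured with a minimal sketch. It assumes two simple rules, which are illustrative choices and not necessarily those applied by Marauder’s map: drop any routine called from more than `max_callers` places, then drop everything that can no longer be reached from an entry point. The function name, the threshold and every routine name except print_error_and_quit are hypothetical.

```python
from collections import Counter

def prune_callgraph(edges, root, max_callers=3):
    """Remove hyperconnected callees, then routines unreachable from `root`."""
    # Rule 1: a callee with too many callers is treated as a "knot".
    callers = Counter(callee for _, callee in edges)
    hubs = {name for name, count in callers.items() if count > max_callers}
    kept = [(a, b) for a, b in edges if a not in hubs and b not in hubs]

    # Rule 2: keep only what is still reachable from the entry point.
    children = {}
    for a, b in kept:
        children.setdefault(a, []).append(b)
    reachable, stack = {root}, [root]
    while stack:
        for callee in children.get(stack.pop(), []):
            if callee not in reachable:
                reachable.add(callee)
                stack.append(callee)
    return [(a, b) for a, b in kept if a in reachable]

# Hypothetical raw callgraph with one knot and one dead routine.
EDGES = [
    ("main", "init"), ("main", "solve"), ("main", "print_error_and_quit"),
    ("init", "print_error_and_quit"), ("solve", "flux"),
    ("solve", "print_error_and_quit"), ("flux", "print_error_and_quit"),
    ("old_solver", "flux"),
]
pruned = prune_callgraph(EDGES, root="main")
print(pruned)
```

In this toy graph, print_error_and_quit disappears because it has four callers, and old_solver disappears because nothing reachable from main calls it.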

callgraph_mmap

Global callgraph of the HPC large eddy simulation software AVBP: different blocks and their dependencies. A link represents a function calling another.

Of course, the big picture is nice for beginners, but there is more to it. It is possible to change scale and search for more specific, expert information. The impressive ability of the human eye to detect curious patterns comes in handy: the geometrical shapes in a callgraph can reveal interesting information.

geometric_shape

Zoom on the NUMERICS neighborhood of AVBP. Two clusters of routines are similar and call the same final functions.

In this image, “The red crab’s constellation” depicts the duplication of a group of routines. The numerical scheme on fixed grids (on the right) was duplicated into another cluster of routines for mobile grids (on the left). Both, albeit not identical, call the same six final functions, aligning them into a nice symmetry axis.
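The eye spots this symmetry at once, but the same clue can also be checked programmatically. As a minimal sketch, one can compare the sets of final (leaf) functions reached from two entry routines: identical sets reached through different intermediate routines hint at a duplicated cluster. All routine names below are invented for the example and are not taken from AVBP.

```python
def leaves_reached(edges, start):
    """Return the set of routines without callees that `start` can reach."""
    children = {}
    for caller, callee in edges:
        children.setdefault(caller, set()).add(callee)
    seen, stack, leaves = {start}, [start], set()
    while stack:
        node = stack.pop()
        callees = children.get(node, set())
        if not callees:
            leaves.add(node)  # nothing called from here: a final function
        for callee in callees:
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return leaves

# Hypothetical pair of clusters sharing their final functions.
EDGES = [
    ("fixed_scheme", "fixed_gradient"),
    ("fixed_gradient", "flux"), ("fixed_gradient", "limiter"),
    ("mobile_scheme", "mobile_gradient"), ("mobile_scheme", "mesh_velocity"),
    ("mobile_gradient", "flux"), ("mobile_gradient", "limiter"),
    ("mesh_velocity", "flux"),
]
fixed = leaves_reached(EDGES, "fixed_scheme")
mobile = leaves_reached(EDGES, "mobile_scheme")
print(fixed == mobile)
```

Such a check only flags candidates; whether two clusters are really duplicates still has to be confirmed by reading the code.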

Some historical features, like a former riverbed, simply show up on real maps. Likewise, some of the history of a code will leak into the callgraph.

Take away

  • Studying a software’s geography can be extremely useful: it saves time and energy when reading the code, and it helps teams track complexity, code size and dependencies.
  • Within Excellerat Phase 2, CERFACS is developing “The Marauder’s map”, an open-source automated solution to overcome the challenges of creating usable maps.

Acknowledgements

This work is part of the Center of Excellence EXCELLERAT Phase 2, funded by the European Union. This work has received funding from the European High Performance Computing Joint Undertaking (JU) and Germany, Italy, Slovenia, Spain, Sweden, and France under grant agreement No 101092621.


Disclaimer

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European High Performance Computing Joint Undertaking (JU) and Germany, Italy, Slovenia, Spain, Sweden, and France. Neither the European Union nor the granting authority can be held responsible for them.



Léa Desplats is an engineer focused on COOP tools in Python.
Thibault Marzlin is an engineer working on COOP Python tools.
Antoine Dauptain is a research scientist focused on computer science and engineering topics for HPC.
