Photo by Thomas Kinto on Unsplash
Marauder’s Map, or mmap on the command line, is a tool developed by Cerfacs within the framework of the CoE Excellerat. It is a Python helper tool that creates visual representations of the internal structure of Python and Fortran packages.
Just like Harry Potter’s Marauder’s Map, which can be asked different things and adapts its answer to each request, mmap is an automated solution developed to study software geography.
You can install mmap from PyPI. The sources are available on GitLab.com.
Let’s break down how mmap can be used to explore a software package. This post uses ARPS, a code dealing with storms, as analysis material.
Preparing the analysis
First, get the sources. It is best to clone the repository into a dedicated location.
~/GITLAB/EXTERNAL_REPOS >git clone https://github.com/reinaldohaas/ARPS.git
Cloning into 'ARPS'...
remote: Enumerating objects: 1488, done.
remote: Counting objects: 100% (318/318), done.
remote: Compressing objects: 100% (276/276), done.
remote: Total 1488 (delta 43), reused 290 (delta 42), pack-reused 1170 (from 1)
Receiving objects: 100% (1488/1488), 101.50 MiB | 20.29 MiB/s, done.
Resolving deltas: 100% (286/286), done.
~/GITLAB/EXTERNAL_REPOS >
Once the codebase is present, take a look around. Start with the README (and the LICENCE file, when there is one), at least to check that you cloned the right repository. Then look for the sources:
~/GITLAB/EXTERNAL_REPOS/ARPS >ls -l
total 1560
-rw-r--r-- 1 dauptain staff 2314 Feb 25 09:15 BUGS
-rw-r--r-- 1 dauptain staff 257350 Feb 25 09:15 HISTORY
-rw-r--r-- 1 dauptain staff 58560 Feb 25 09:15 MANIFESTS
-rw-r--r-- 1 dauptain staff 257962 Feb 25 09:15 Makefile
-rw-r--r-- 1 dauptain staff 53078 Feb 25 09:15 Makefile.wrkdir
-rw-r--r-- 1 dauptain staff 21209 Feb 25 09:15 README
-rw-r--r-- 1 dauptain staff 13674 Feb 25 09:15 RELEASE.NOTES
-rw-r--r-- 1 dauptain staff 3618 Feb 25 09:15 TODO
-rw-r--r-- 1 dauptain staff 3963 Feb 25 09:15 UPDATE
lrwxr-xr-x 1 dauptain staff 22 Feb 25 09:15 data.test -> test-arps5.3.4-feb2025
drwxr-xr-x 35 dauptain staff 1120 Feb 25 09:15 docs
drwxr-xr-x 72 dauptain staff 2304 Feb 25 09:15 include
drwxr-xr-x 81 dauptain staff 2592 Feb 25 09:15 input
-rwxr-xr-x 1 dauptain staff 108273 Feb 25 09:15 makearps
drwxr-xr-x 55 dauptain staff 1760 Feb 25 09:15 scripts
drwxr-xr-x 9 dauptain staff 288 Feb 25 09:15 sounding
drwxr-xr-x 42 dauptain staff 1344 Feb 25 09:15 src
-rw-r--r-- 1 dauptain staff 626 Feb 25 09:15 tarefa.sh
From the look of it, the path ~/GITLAB/EXTERNAL_REPOS/ARPS/src seems the way to go.
Now create another folder for the analysis, outside this repository, so that our analysis files do not pollute the git repository. There, run mmap anew to get the latest version of the analysis control file:
~/GITLAB/EXTERNAL_REPOS/ARPS >cd ~/TEST/MMAP
~/TEST/MMAP >mkdir mmap_arps
~/TEST/MMAP >cd mmap_arps/
~/TEST/MMAP/mmap_arps >mmap anew
2025-02-25 09:27:02.967 | INFO | maraudersmap.cli:anew:82 - Generating template inputfile ./mmap_in.yml for maraudersmap.
2025-02-25 09:27:02.969 | SUCCESS | maraudersmap.cli:anew:87 - File ./mmap_in.yml created. Edit this file to set up your project...
~/TEST/MMAP/mmap_arps >
Now edit the analysis file mmap_in.yml: just set path to the sources path and provide a name for the package, here my_arps:
# Input file for marauders map
# ============= SCOPE OF ANALYSIS ================
# rules about concerned sources ---------
path : /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src #the common path to sources
context : null #by default, same as path to sources
mandatory_patterns: null # by default all path taken. If provided e.g. "*src*","*sources*", all paths must match one of these patterns
forbidden_patterns: null # by default all path taken. If provided e.g. "*autogen*","*test*", any path matching one of these patterns is excluded
package: my_arps # name of the package
cpp_directives: null # Add here the list of CPP directives to be taken into account when generating the graphs
#CWIPI: True
# Coloring by patterns ------------
color_rules : # coloring rules to apply (last color order prevail)
  root: "#EEDD88" #lightyellow -----
  default: cyan
# Cross grep patterns ------------
grep_patterns: #Use list of Wildcard Matching patterns to color nodes based on their content
  yellow: null
  #  - "*!$ACC PARALLEL LOOP*"
  blue: null
  #  - "*DO *nlen*"
  #  - "*DO *nlcell*"
  #  - "*DO *nplen*"
# Pruning graphs ------------
clean_graph: # Filtering parameters
  remove_patterns : # Wildcard Matching nodes be removed as a list
    - "*slave_*_memory*"
  subgraph_roots: null # Limit graph to the descendants of nodes in this list
  #  - "*slave_pre_temporal*"
  remove_hyperconnect: 10 # Remove nodes with more than X predecessors
  prune_lower_levels: 0 # Remove lowest hanging nodes by level (1 is the leaf level)
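The grep_patterns section is what drives the cross-grep: each color holds a list of wildcard patterns, and nodes are colored according to their content. As a rough sketch of the idea only, assuming shell-style wildcards as in Python's fnmatch, case-sensitive matching and made-up procedure bodies (mmap's actual matching rules may differ), a procedure matching both the yellow and the blue patterns is the kind of spot a cross-grep highlights:

```python
from fnmatch import fnmatchcase

# Hypothetical grep_patterns, shaped like the commented examples in the template.
GREP_PATTERNS = {
    "yellow": ["*!$ACC PARALLEL LOOP*"],
    "blue": ["*DO *nlen*", "*DO *nlcell*", "*DO *nplen*"],
}


def matching_colors(source_lines, grep_patterns):
    """Return the colors whose patterns match at least one source line."""
    colors = set()
    for color, patterns in grep_patterns.items():
        if not patterns:  # "null" in the YAML: this color is unused
            continue
        if any(fnmatchcase(line, p) for line in source_lines for p in patterns):
            colors.add(color)
    return colors


# Made-up procedure bodies, keyed by node name.
procedures = {
    "flux.f90:compute_flux": ["!$ACC PARALLEL LOOP", "DO i = 1, nlen", "END DO"],
    "io.f90:write_mesh": ["DO i = 1, nlen", "END DO"],
    "misc.f90:init": ["x = 0"],
}
for name, body in procedures.items():
    print(name, sorted(matching_colors(body, GREP_PATTERNS)))
# Only compute_flux matches both colors: a cross-grep hit.
```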
We are ready for the analysis.
A view over the sources
We start with a simple parsing of the procedure tree. This checks whether tucan, the lightweight parser of mmap, can make sense of the codebase. Indeed, some archaic ways of writing code can be hard to interpret, especially in old Fortran.
(venv_default) dauptain@eldarion-macbookair-coop ~/TEST/MMAP/mmap_arps >mmap tree-gen
Recursive path gathering ...
Running struct ...
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/micro3d.f90
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/sorad3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/energy3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/mci2arps/sat2arps.c
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arpsagr/updbc.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/external/g2lib/getg2ir.f
Bad end type for END statement 135:end do (expecting subroutine)
Struct analysis failed on external/g2lib/getg2ir.f
Would you like to continue (y/n)?n
The analysis quickly stopped on external/g2lib/getg2ir.f.
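Why would a parser stop here? Lightweight parsers typically track blocks with a stack: each recognized opening statement is pushed, and each END pops and is checked. We do not know which construct in getg2ir.f actually tripped tucan. The sketch below uses a hypothetical, deliberately naive matcher and a made-up snippet, only to show how one unrecognized opening (here a bare DO without loop control) surfaces as a mismatched END a few lines later:

```python
import re

# Hypothetical, deliberately naive block openers: the DO rule only knows the
# counted form "do var = ...", so a bare "do" is silently ignored.
OPENERS = {
    "subroutine": re.compile(r"^\s*subroutine\s+\w+", re.IGNORECASE),
    "do": re.compile(r"^\s*do\s+\w+\s*=", re.IGNORECASE),
}
END_RE = re.compile(r"^\s*end\s*(\w+)?", re.IGNORECASE)


def check_blocks(lines):
    """Return a message for the first mismatched END, or None if all match."""
    stack = []
    for lineno, line in enumerate(lines, start=1):
        end = END_RE.match(line)
        if end:
            kind = (end.group(1) or "").lower()
            expected = stack.pop() if stack else None
            # A plain "end" closes whatever is open; a typed one must agree.
            if kind and kind != expected:
                return f"line {lineno}: found 'end {kind}', expecting 'end {expected or 'nothing'}'"
            continue
        for kind, pattern in OPENERS.items():
            if pattern.match(line):
                stack.append(kind)
                break
    return None


sample = [
    "subroutine scan(n)",
    "  do i = 1, n",      # counted DO: recognized and pushed
    "  end do",
    "  do",               # bare DO (valid Fortran 90), missed by the naive rule
    "    exit",
    "  end do",           # pops "subroutine" instead of "do"
    "end subroutine scan",
]
print(check_blocks(sample))  # line 6: found 'end do', expecting 'end subroutine'
```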
Let’s skip the external/* part of the codebase.
Go back to mmap_in.yml and add this pattern to forbidden_patterns:
path : /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src #the common path to sources
context : null #by default, same as path to sources
mandatory_patterns: null # by default all path taken. If provided e.g. "*src*","*sources*", all paths must match one of these patterns
forbidden_patterns:
- external/*
package: my_arps # name of the package
This goes a bit further, but stops on a zxplot module. This codebase is a real challenge for tucan. Let’s just focus on src/arps:
path : /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps #the common path to sources
context : null #by default, same as path to sources
mandatory_patterns: null # by default all path taken. If provided e.g. "*src*","*sources*", all paths must match one of these patterns
forbidden_patterns:
- external/*
package: my_arps # name of the package
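The path rules quoted in the template comments (if mandatory_patterns is given, a path must match one of them; a path matching any of forbidden_patterns is excluded) can be sketched as below. This assumes paths are relative to path and that wildcards behave like Python's fnmatch, where * also crosses /, which is why external/* would catch every file below external/. mmap's exact matching may differ:

```python
from fnmatch import fnmatchcase


def keep_path(rel_path, mandatory_patterns=None, forbidden_patterns=None):
    """Apply the path rules described in the mmap_in.yml comments (sketch)."""
    if mandatory_patterns and not any(fnmatchcase(rel_path, p) for p in mandatory_patterns):
        return False  # must match at least one mandatory pattern
    if forbidden_patterns and any(fnmatchcase(rel_path, p) for p in forbidden_patterns):
        return False  # any forbidden match excludes the path
    return True


# Paths relative to the sources root, derived from the tree-gen log above.
paths = [
    "arps/micro3d.f90",
    "external/g2lib/getg2ir.f",
    "mci2arps/sat2arps.c",
]
print([p for p in paths if keep_path(p, forbidden_patterns=["external/*"])])
# ['arps/micro3d.f90', 'mci2arps/sat2arps.c']
```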
This time, the only problem tree-gen hits is on arps/sviio3d.f. Let’s accept the error and keep it for later.
Once the tree is done, you get a tucan analysis of the source code covered:
|
| Name:arps
| Path:.
| 24098 sum of Nb. Statements
| 44837 sum of Nb. lines of code
| 241.35 dys Halstead time (agregated sum)
| 372 Structural complexity (agregated sum)
| 3332 Nb. of Loops (agregated sum)
| 6.46 hrs for 50 lines of code
| 177.44 Ctls. Pts. (McCabe) (averaged per procedure)
| 174.51 Halstead Difficulty (averaged per procedure)
| -84.28 Maintainability Index (averaged per procedure)
| 3.34 Average indents (averaged per procedure)
Repo Data dumped to /Users/dauptain/TEST/MMAP/mmap_arps/my_arps/struct_repo.json
File Data dumped to /Users/dauptain/TEST/MMAP/mmap_arps/my_arps/struct_files.json
Therefore we are dealing with about 44k lines of (mostly Fortran) code. Procedures average 177 control points (i.e. the number of if, while, etc.), which is very high (50 is already considered high). A Halstead time of about 241 days, roughly one working year, means that one person who already knows the algorithms exactly would need a full year to write the code again. Scaled to 50 lines of code, this gives 6.46 hours, which is also pretty high. These numbers explain why the averaged “Maintainability Index” is negative.
These figures indicate that this code was written for a team of regularly trained experts, not for the passerby.
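For readers new to these metrics, the textbook formulas show why they blow up on long procedures. The sketch below uses the classic Halstead definitions and the original, unnormalized Maintainability Index; tucan may use variants (other constants, time expressed in days, averaging per procedure), and the operator and operand counts are made up rather than measured on ARPS:

```python
import math


def halstead(n1, n2, N1, N2):
    """Classic Halstead measures.

    n1, n2: number of distinct operators / operands
    N1, N2: total occurrences of operators / operands
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    # Halstead's original Stroud number: 18 mental discriminations per second.
    time_seconds = effort / 18
    return {
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "time_seconds": time_seconds,
    }


def maintainability_index(volume, cyclomatic, loc):
    """Original, unnormalized Maintainability Index (Oman and Hagemeister).

    It has no lower bound: large volume, many branches and many lines
    all push it down, so it goes negative for big, branchy procedures.
    """
    return 171 - 5.2 * math.log(volume) - 0.23 * cyclomatic - 16.2 * math.log(loc)


# Made-up counts for one large procedure, NOT measured on ARPS.
h = halstead(n1=40, n2=300, N1=6000, N2=5000)
print(round(h["difficulty"], 1))  # 333.3
print(maintainability_index(h["volume"], cyclomatic=177, loc=1500))
```

With these made-up counts the index already lands well below zero: the logarithmic terms in volume and lines of code, plus the 0.23 × CCN term, outweigh the constant 171.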
We can dive into the tree with tree-show. It is best to start with the cyclomatic complexity (CCN), to see immediately where the business logic, i.e. the control points, is located.
~/TEST/MMAP/mmap_arps >mmap tree-show CCN
This opens a nobvisual window presenting the circular packing of the sources.
We see a very hot module, my3mom_main_mod.f90, with a staggering CCN (cyclomatic complexity) of 856, followed by arpsmpbudget.f90 at 245. Good news: a large part of the codebase is stored in small files with a low CCN.
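To make these CCN numbers more concrete: the McCabe cyclomatic complexity of a structured procedure is its number of decision points plus one. A rough line-based counter could look like the sketch below; it is hypothetical and much cruder than a real parser like tucan (it ignores .and./.or. conditions, continuation lines and many other Fortran subtleties), so its numbers would not match mmap's:

```python
import re

# Hypothetical, rough detector of Fortran decision statements: IF, ELSE IF,
# DO (including DO WHILE) and CASE branches other than CASE DEFAULT.
# An optional numeric statement label is allowed in front.
DECISION_RE = re.compile(
    r"^\s*(?:\d+\s+)?(?:else\s*if|if|do|case(?!\s+default))\b",
    re.IGNORECASE,
)


def cyclomatic_complexity(body_lines):
    """McCabe CCN approximated as 1 + number of decision statements."""
    return 1 + sum(1 for line in body_lines if DECISION_RE.match(line))


# Made-up procedure body.
body = [
    "do i = 1, n",
    "  if (q(i) > qmax) then",
    "    q(i) = qmax",
    "  else if (q(i) < 0.0) then",
    "    q(i) = 0.0",
    "  end if",
    "end do",
    "select case (scheme)",
    "case (1)",
    "  call warm_rain()",
    "case (2)",
    "  call ice_phase()",
    "case default",
    "  stop",
    "end select",
]
print(cyclomatic_complexity(body))  # 6
```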
Now let’s see where the biggest abstractions are:
~/TEST/MMAP/mmap_arps >mmap tree-show CST
There are several abstractions here, the most complex being netio_metadata, followed by the large dtaread and arbitrary_vario, and the even larger mp_wsm6 and kfeta.
With this first glance, we can start defining our “Color Pattern”.
Defining the color pattern
The “Color Pattern” keeps a constant color reference across all the outputs of mmap. Build the pattern by editing mmap_in.yml and checking the result with the command mmap tree-show PTN. Refer to the matplotlib named colors for the available color names.
# Coloring by patterns ------------
color_rules : # coloring rules to apply (last color order prevail)
  wsm6: lightgreen
  main_mod: khaki
  kfeta: turquoise
  cu_bmj: plum
  mpbudget: pink
  dtaread: yellow
  gradsio3d: red
  v5dio3d: orange
  bl_ysu: green
  arps.f90: peru
  nem3d: coral
  arbitrary_vario: chartreuse
  binio3d: teal
  ascio3d: cadetblue
  init3d: cyan
  soildiag3d: brown
  gribio: forestgreen
  netio_metadata: lime
Which gives:
~/TEST/MMAP/mmap_arps >mmap tree-show PTN
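The comment on color_rules says that the last color prevails when several rules match. A minimal sketch of that ordering rule, assuming plain substring matching on node names of the form file:procedure (as in tinteg3d.f90:cordintg, which appears in the logs further down) and a hypothetical grey fallback; mmap's actual matching may differ, and kfeta_main_mod.f90:demo is a made-up name:

```python
# Hypothetical subset of the color_rules above; dict order is rule order.
COLOR_RULES = {
    "wsm6": "lightgreen",
    "main_mod": "khaki",
    "kfeta": "turquoise",
    "arps.f90": "peru",
}


def node_color(node_name, color_rules, default="grey"):
    """Return the color of the last rule whose pattern occurs in node_name."""
    color = default
    for pattern, rule_color in color_rules.items():
        if pattern in node_name:
            color = rule_color  # a later match overrides an earlier one
    return color


print(node_color("arps.f90:arps", COLOR_RULES))            # peru
print(node_color("kfeta_main_mod.f90:demo", COLOR_RULES))  # turquoise
print(node_color("tinteg3d.f90:cordintg", COLOR_RULES))    # grey
```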
Moving to static callgraph
Now we try the static callgraph:
~/TEST/MMAP/mmap_arps >mmap cg-gen
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/smooth3d.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/iolib3d.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/irixlib3d.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_cu_kfeta.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/v5d.c is OK
(...)
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/initlib3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/arps.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/ascio3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/init3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/kfinterfc.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/raddata3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/advct3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/thermolib3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/arpsread.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_cu_kfeta.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_arps_netio_metadata.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/exbcio3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_arps_dtaread.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/kfpara.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/irixlib3d.f90
Fortran Keywords char used as a variable in the code.
Fortran Keywords char used as a variable in the code.
Fortran Keywords char used as a variable in the code.
Fortran Keywords char used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords ibits used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords ibits used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Computing callgraph, this can take a while...
Found contains 95 / 95 99%
Found parents 11 / 11 99%
Found callables3543 / 15472 22%
Callgraph generated
Generating my_arps/callgraph.json.
We plot the callgraph using the following command.
~/TEST/MMAP/mmap_arps >mmap cg-show -b pyvis -c ptn -l
Start filtering graph
1191 nodes / 3643 edges
Remove hyperconnected roots starting at 10
Removing hyperconnected nodes, starting from 10 predecessors
53 nodes removed by hyperconnection
1138 nodes / 2232 edges
Remove patterns: ['*slave_*_memory*']
Removing by patterns.
*slave_*_memory*: 0 matchs and 0 descendants
0 nodes removed by patterns
1138 nodes / 2232 edges
Remove singles
Removing single nodes or self connected nodes
368 nodes removed by singularity or self connection
770 nodes / 2006 edges
After cleaning :770 nodes/2006 edges
Sizes 2.73861278752583/98.22168803273541
Masses 1/19.295
mmap_calls.html
Output written to mmap_calls.html
Let’s break this down:
- -b pyvis: the pyvis backend allows a dynamic exploration of the graph. Its Barnes-Hut layout algorithm makes it the best way to map the calls by adjacency.
- -c ptn: color the nodes using the customized pattern.
- -l: load the result into your preferred browser right away.
Many edges were removed by “hyperconnection” (nodes with more than 10 predecessors): only 53 nodes went away, but they carried 1411 of the 3643 edges. By default, this avoids plotting the atomic functions used everywhere (like print_error_and_quit()). Change the parameter remove_hyperconnect to control this.
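The two pruning steps seen in the log (hyperconnected nodes, then single or self-connected nodes) can be sketched on a plain edge list. This is not mmap's code: it assumes a node is hyperconnected once it has threshold or more distinct callers (the template says “more than”, the log says “starting from”, so the exact boundary is an assumption), and apart from arps, cordintg and print_error_and_quit the call edges are made up:

```python
from collections import defaultdict


def remove_hyperconnected(edges, threshold):
    """Drop nodes with `threshold` or more distinct callers, plus their edges."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    hubs = {node for node, preds in callers.items() if len(preds) >= threshold}
    kept = [(a, b) for a, b in edges if a not in hubs and b not in hubs]
    return kept, hubs


def remove_singles(nodes, edges):
    """Keep only nodes linked to at least one *other* node."""
    linked = {n for a, b in edges if a != b for n in (a, b)}
    return {n for n in nodes if n in linked}


# Made-up call edges (caller, callee).
edges = [
    ("arps", "cordintg"),
    ("cordintg", "solve_w"),
    ("cordintg", "solve_p"),
    ("cordintg", "print_error_and_quit"),
    ("solve_w", "print_error_and_quit"),
    ("solve_p", "print_error_and_quit"),
    ("lonely", "lonely"),
]
nodes = {n for edge in edges for n in edge}

kept, hubs = remove_hyperconnected(edges, threshold=3)
print(sorted(hubs))                                # ['print_error_and_quit']
print(sorted(remove_singles(nodes - hubs, kept)))  # ['arps', 'cordintg', 'solve_p', 'solve_w']
```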
In the end we have 770 nodes and 2006 edges. Below roughly 4000 edges, the result stays usable in pyvis, although it slows down as the graph grows. If the graph is bigger, you should focus on a smaller portion of the code…
We see the global graph, with the brown star of arps in the middle (a star denotes a node without predecessors).
Zooming in reveals a very important hub under arps: cordintg.
We decide to restrict the callgraph analysis to this hub only:
clean_graph: # Filtering parameters
  subgraph_roots: # null # Limit graph to the descendants of nodes in this list
    - "*cordintg*"
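Conceptually, subgraph_roots keeps the matching nodes and everything reachable from them along call edges. A minimal sketch with a breadth-first walk, assuming wildcard matching on node names of the form file:procedure (the log below shows tinteg3d.f90:cordintg being picked); every node name other than arps and cordintg is made up, and mmap's own implementation may differ:

```python
from collections import defaultdict, deque
from fnmatch import fnmatchcase


def subgraph_from_roots(edges, root_patterns):
    """Keep nodes matching root_patterns and all their (transitive) callees."""
    callees = defaultdict(list)
    nodes = set()
    for caller, callee in edges:
        callees[caller].append(callee)
        nodes.update((caller, callee))
    roots = [n for n in nodes if any(fnmatchcase(n, p) for p in root_patterns)]
    seen, queue = set(roots), deque(roots)
    while queue:  # breadth-first walk down the call edges
        for child in callees[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    kept = [(a, b) for a, b in edges if a in seen and b in seen]
    return seen, kept


# Hypothetical "file:procedure" call edges.
edges = [
    ("arps.f90:arps", "tinteg3d.f90:cordintg"),
    ("arps.f90:arps", "init.f90:setup"),
    ("tinteg3d.f90:cordintg", "solve.f90:solve_momentum"),
    ("solve.f90:solve_momentum", "bc.f90:apply_bc"),
]
nodes, kept = subgraph_from_roots(edges, ["*cordintg*"])
print(sorted(nodes))
# ['bc.f90:apply_bc', 'solve.f90:solve_momentum', 'tinteg3d.f90:cordintg']
```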
Then we rerun the same command:
(venv_default) dauptain@eldarion-macbookair-coop ~/TEST/MMAP/mmap_arps >mmap cg-show -b pyvis -c ptn -l
Start filtering graph
1191 nodes / 3643 edges
Limiting to subgraph roots ['*cordintg*']
Getting subgraph from node tinteg3d.f90:cordintg
488 nodes / 1672 edges
Remove hyperconnected roots starting at 10
Removing hyperconnected nodes, starting from 10 predecessors
23 nodes removed by hyperconnection
465 nodes / 1226 edges
Remove patterns: ['*slave_*_memory*']
Removing by patterns.
*slave_*_memory*: 0 matchs and 0 descendants
0 nodes removed by patterns
465 nodes / 1226 edges
Remove singles
Removing single nodes or self connected nodes
5 nodes removed by singularity or self connection
460 nodes / 1221 edges
After cleaning :460 nodes/1221 edges
Sizes 3.1622776601683795/94.44575162494075
Masses 1/17.84
And tadaa, we get a callgraph focused on this apparently important procedure, about 40% lighter: 460 nodes and 1221 edges instead of 770 and 2006.
Takeaway
Through this post, you have seen how to use mmap to create different graphical representations of a codebase.
More features are available, such as the “cross-grep” (highlighting parts that match several criteria at once, such as “loops on elements” and “OpenACC directives”) or the customized execution graph (a customized view of a Valgrind trace).
We will continuously update the tool to provide more interesting features. Stay in touch…