Photo by Thomas Kinto on Unsplash

Marauder’s Map, or mmap on the command line, is a tool developed by Cerfacs in the frame of the CoE Excellerat. It is a Python helper tool that creates visual representations of the internal structure of Python and Fortran packages. Just like Harry Potter’s Marauder’s Map, which can be asked different things and adapts its answer to the request, this Marauder’s Map is an automated solution for studying software geography.

You can install mmap from PyPI. The sources are available on GitLab.com.

Let’s break down how mmap can be used to explore a codebase. This post uses ARPS, a code for simulating storms, as analysis material.

Preparing the analysis

First, get the sources. It is best to clone the repository into a dedicated location.

 ~/GITLAB/EXTERNAL_REPOS >git clone https://github.com/reinaldohaas/ARPS.git
Cloning into 'ARPS'...
remote: Enumerating objects: 1488, done.
remote: Counting objects: 100% (318/318), done.
remote: Compressing objects: 100% (276/276), done.
remote: Total 1488 (delta 43), reused 290 (delta 42), pack-reused 1170 (from 1)
Receiving objects: 100% (1488/1488), 101.50 MiB | 20.29 MiB/s, done.
Resolving deltas: 100% (286/286), done.
 ~/GITLAB/EXTERNAL_REPOS >

Once the codebase is there, take a look around. Start with the README (and the licence, if there is one) to at least check that you got the right repository. Then search for the sources:

~/GITLAB/EXTERNAL_REPOS/ARPS >ls -l
total 1560
-rw-r--r--   1 dauptain  staff    2314 Feb 25 09:15 BUGS
-rw-r--r--   1 dauptain  staff  257350 Feb 25 09:15 HISTORY
-rw-r--r--   1 dauptain  staff   58560 Feb 25 09:15 MANIFESTS
-rw-r--r--   1 dauptain  staff  257962 Feb 25 09:15 Makefile
-rw-r--r--   1 dauptain  staff   53078 Feb 25 09:15 Makefile.wrkdir
-rw-r--r--   1 dauptain  staff   21209 Feb 25 09:15 README
-rw-r--r--   1 dauptain  staff   13674 Feb 25 09:15 RELEASE.NOTES
-rw-r--r--   1 dauptain  staff    3618 Feb 25 09:15 TODO
-rw-r--r--   1 dauptain  staff    3963 Feb 25 09:15 UPDATE
lrwxr-xr-x   1 dauptain  staff      22 Feb 25 09:15 data.test -> test-arps5.3.4-feb2025
drwxr-xr-x  35 dauptain  staff    1120 Feb 25 09:15 docs
drwxr-xr-x  72 dauptain  staff    2304 Feb 25 09:15 include
drwxr-xr-x  81 dauptain  staff    2592 Feb 25 09:15 input
-rwxr-xr-x   1 dauptain  staff  108273 Feb 25 09:15 makearps
drwxr-xr-x  55 dauptain  staff    1760 Feb 25 09:15 scripts
drwxr-xr-x   9 dauptain  staff     288 Feb 25 09:15 sounding
drwxr-xr-x  42 dauptain  staff    1344 Feb 25 09:15 src
-rw-r--r--   1 dauptain  staff     626 Feb 25 09:15 tarefa.sh

From the look of it, ~/GITLAB/EXTERNAL_REPOS/ARPS/src is the place to look. Now create a separate folder for the analysis, outside this repository, so that the analysis files do not pollute the git repository. There, run mmap anew to get the latest version of the analysis control file:

 ~/GITLAB/EXTERNAL_REPOS/ARPS >cd ~/TEST/MMAP
 ~/TEST/MMAP >mkdir mmap_arps
 ~/TEST/MMAP >cd mmap_arps/
 ~/TEST/MMAP/mmap_arps >mmap anew
2025-02-25 09:27:02.967 | INFO     | maraudersmap.cli:anew:82 - Generating template inputfile ./mmap_in.yml for maraudersmap.
2025-02-25 09:27:02.969 | SUCCESS  | maraudersmap.cli:anew:87 - File ./mmap_in.yml created. Edit this file to set up your project...
 ~/TEST/MMAP/mmap_arps >

Now edit the control file mmap_in.yml: set path to the sources and give the package a name, here my_arps:

# Input file for marauders map

# ============= SCOPE OF ANALYSIS ================
# rules about concerned sources ---------
path : /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src  #the common path to sources
context : null #by default, same as path to sources
mandatory_patterns: null  # by default all path taken. If provided e.g. "*src*","*sources*", all paths must match one of these patterns
forbidden_patterns: null  # by default all path taken. If provided e.g. "*autogen*","*test*", any path matching one of these patterns is excluded
package: my_arps # name of the package
cpp_directives: null # Add here the list of CPP directives to be taken into account when generating the graphs 
  #CWIPI:  True 

# Coloring by patterns ------------
color_rules :                       # coloring rules to apply (last color order prevail)
  root: "#EEDD88" #lightyellow -----
  default: cyan

# Cross grep patterns ------------
grep_patterns:   #Use list of Wildcard Matching patterns to color nodes based on their content  
  yellow: null
  #- "*!$ACC PARALLEL LOOP*"
  blue: null
  #- "*DO *nlen*"
  #- "*DO *nlcell*"
  #- "*DO *nplen*"

# Pruning graphs ------------
clean_graph:                            # Filtering parameters
  remove_patterns :                     # Wildcard Matching nodes be removed as a list
    - "*slave_*_memory*"
  subgraph_roots: null                  # Limit graph to the descendants of nodes in this list
  #    - "*slave_pre_temporal*"
  remove_hyperconnect: 10                # Remove nodes with more than X predecessors
  prune_lower_levels: 0                 # Remove lowest hanging nodes by level (1 is the leaf level)

We are ready for the analysis.

A view over the sources

We start with a simple parsing of the procedure tree. This checks whether tucan, the lightweight parser used by mmap, can make sense of the codebase. Indeed, some archaic coding styles can be hard to interpret, especially in old Fortran.

(venv_default) dauptain@eldarion-macbookair-coop ~/TEST/MMAP/mmap_arps >mmap tree-gen
Recursive path gathering ...
Running struct ...
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/micro3d.f90
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/sorad3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/energy3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/mci2arps/sat2arps.c
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arpsagr/updbc.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/external/g2lib/getg2ir.f
Bad end type for END statement  135:end do  (expecting subroutine)
Struct analysis failed on external/g2lib/getg2ir.f
Would you like to continue (y/n)?n

The analysis quickly stopped on external/g2lib/getg2ir.f. Let’s skip the external/* part of the codebase: go back to mmap_in.yml and add this pattern to forbidden_patterns:

path : /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src  #the common path to sources
context : null #by default, same as path to sources
mandatory_patterns: null  # by default all path taken. If provided e.g. "*src*","*sources*", all paths must match one of these patterns
forbidden_patterns: 
  - external/*
package: my_arps # name of the package
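The comments of mmap_in.yml spell out the filtering rule: when mandatory_patterns is set, a path must match at least one of them, and any path matching a forbidden pattern is excluded. Below is a minimal sketch of that rule using Python's fnmatch wildcard matching, assuming the patterns are tested against paths relative to path (as the relative names in the tree-gen log suggest). keep_path is a hypothetical helper, not mmap's code. Note that * also matches /, so external/* excludes the whole external/ subtree.

```python
from fnmatch import fnmatchcase


def keep_path(path, mandatory_patterns=None, forbidden_patterns=None):
    """Hypothetical sketch of the mandatory/forbidden path filters of mmap_in.yml."""
    # null (None) or empty mandatory_patterns: every path is a candidate.
    if mandatory_patterns and not any(fnmatchcase(path, p) for p in mandatory_patterns):
        return False
    # A single forbidden match is enough to exclude the path.
    if forbidden_patterns and any(fnmatchcase(path, p) for p in forbidden_patterns):
        return False
    return True


# Relative paths taken from the tree-gen log above.
paths = [
    "arps/micro3d.f90",
    "external/g2lib/getg2ir.f",
    "mci2arps/sat2arps.c",
]
kept = [p for p in paths if keep_path(p, forbidden_patterns=["external/*"])]
print(kept)  # ['arps/micro3d.f90', 'mci2arps/sat2arps.c']
```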

This goes a bit further, but stops on a zxplot module. This codebase is a real challenge for tucan. Let’s just focus on src/arps:

path : /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps  #the common path to sources
context : null #by default, same as path to sources
mandatory_patterns: null  # by default all path taken. If provided e.g. "*src*","*sources*", all paths must match one of these patterns
forbidden_patterns: 
  - external/*
package: my_arps # name of the package

This time, the only problem reported by tree-gen is on arps/sviio3d.f. Let’s accept the error and keep it for later. Once the tree is done, you get a tucan analysis of the covered sources:

   |
   | Name:arps
   | Path:.
   | 24098        sum of Nb. Statements  
   | 44837        sum of Nb. lines of code  
   | 241.35 dys Halstead time         (agregated sum)
   | 372        Structural complexity (agregated sum)
   | 3332       Nb. of Loops          (agregated sum)
   | 6.46 hrs   for 50 lines of code
   | 177.44     Ctls. Pts. (McCabe)   (averaged per procedure)
   | 174.51     Halstead Difficulty   (averaged per procedure)
   | -84.28     Maintainability Index (averaged per procedure)
   | 3.34       Average indents       (averaged per procedure)
Repo Data dumped to /Users/dauptain/TEST/MMAP/mmap_arps/my_arps/struct_repo.json
File Data dumped to /Users/dauptain/TEST/MMAP/mmap_arps/my_arps/struct_files.json

We are therefore dealing with nearly 45k lines of code, mostly Fortran. Procedures average 177 control points (i.e. the number of if, while, etc.), which is very high: 50 is already considered high. A Halstead time of 241 days, roughly a year of work, means that one person knowing the algorithms exactly would need that long to write the code again. Scaled to 50 lines of code, this gives 6.46 hours, which is also pretty high. This explains why the “Maintainability Index” is negative.

These figures indicate that this code was written for a team of regularly trained experts, not for the passerby.
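The tucan summary makes it possible to cross-check these numbers. The sketch below recomputes the 6.46 hours per 50 lines from the two totals, assuming tucan's "dys" are 24-hour days (an inference from the figures agreeing, not something documented here). It then evaluates the classic three-term Maintainability Index formula, MI = 171 - 5.2 ln(V) - 0.23 G - 16.2 ln(LOC), with V the Halstead volume, G the cyclomatic complexity and LOC the lines of code. tucan's exact variant is not given in this post, and the procedure fed to it is hypothetical: only its 177 control points come from the ARPS averages.

```python
import math

# Totals reported by mmap tree-gen for src/arps (copied from the output above).
halstead_days = 241.35  # "Halstead time (agregated sum)"
lines_of_code = 44837   # "sum of Nb. lines of code"

# Assumption: one tucan "dy" equals 24 hours of effort.
hours_per_50_lines = halstead_days * 24 / lines_of_code * 50
print(f"{hours_per_50_lines:.2f} hrs for 50 lines")  # 6.46 hrs, as tucan reports


def maintainability_index(volume, cyclomatic, loc):
    """Classic three-term Maintainability Index (tucan's variant may differ)."""
    return 171 - 5.2 * math.log(volume) - 0.23 * cyclomatic - 16.2 * math.log(loc)


# Hypothetical procedure, NOT measured on ARPS: the average 177 control points,
# with a made-up Halstead volume of 100000 and a length of 1000 lines.
print(round(maintainability_index(volume=100_000, cyclomatic=177, loc=1_000), 1))  # -41.5
```

The cyclomatic term alone costs about 41 points here, and the two logarithmic terms grow with procedure size, so long, branch-heavy procedures quickly drive the index below zero.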

We can dive into the tree with tree-show. It is best to start with the cyclomatic complexity (CCN), to see immediately where the business logic, i.e. the control points, is located.

 ~/TEST/MMAP/mmap_arps >mmap tree-show CCN

This opens a “nob-visual” window presenting a circular packing of the sources.

[Figure: mmap tree-show CCN, circular packing of the sources colored by cyclomatic complexity]

We see a very hot module, my3mom_main_mod.f90, with a staggering CCN of 856, followed by arpsmpbudget.f90 at 245. The good news is that much of the codebase is stored in small files of low CCN.
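For a feel of what a CCN figure counts: McCabe's cyclomatic complexity of a procedure is its number of decision points plus one. The sketch below is a deliberately naive approximation for free-form Fortran that counts statements starting with IF, ELSE IF, DO or CASE. It is not how tucan works, and it ignores fixed-form labels, continuation lines and compound conditions.

```python
import re

# Statements counted as decision points (free-form Fortran, case-insensitive):
# ELSE IF / ELSEIF, IF (...), DO loops (incl. DO WHILE) and CASE (...) branches.
# END IF, END DO, plain ELSE and CASE DEFAULT are not counted.
DECISION_RE = re.compile(r"^\s*(?:else\s*if\b|if\s*\(|do\b|case\s*\()", re.IGNORECASE)


def approx_ccn(source: str) -> int:
    """Naive McCabe estimate for one procedure: decision points + 1."""
    decisions = sum(1 for line in source.splitlines() if DECISION_RE.match(line))
    return decisions + 1


SAMPLE = """
subroutine clip(a, n, amax)
  integer :: n, i
  real :: a(n), amax
  do i = 1, n
    if (a(i) > amax) then
      a(i) = amax
    else if (a(i) < -amax) then
      a(i) = -amax
    end if
  end do
end subroutine clip
"""

print(approx_ccn(SAMPLE))  # 4: one DO, one IF, one ELSE IF, plus 1
```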

Now let’s see where the biggest abstractions are:

 ~/TEST/MMAP/mmap_arps >mmap tree-show CST

[Figure: mmap tree-show CST, circular packing of the sources colored by structural complexity]

We do have several abstractions here, the most complex being netio_metadata, followed by the large dtaread and arbitrary_vario, and the even larger mp_wsm6 and kfeta.

After this first glance, we can start defining our “Color Pattern”.

Defining the color pattern

The “Color Pattern” keeps a consistent color reference across all mmap outputs. Build it up by editing color_rules in mmap_in.yml and checking the result with the command mmap tree-show PTN. Colors can be given as matplotlib named colors.

# Coloring by patterns ------------
color_rules :                       # coloring rules to apply (last color order prevail)
  wsm6: lightgreen
  main_mod: khaki
  kfeta: turquoise
  cu_bmj: plum
  mpbudget: pink
  dtaread: yellow
  gradsio3d: red
  v5dio3d: orange
  bl_ysu: green
  arps.f90: peru
  nem3d: coral
  arbitrary_vario: chartreuse
  binio3d: teal
  ascio3d: cadetblue
  init3d: cyan
  soildiag3d: brown
  gribio: forestgreen
  netio_metadata: lime  

Which gives:

~/TEST/MMAP/mmap_arps >mmap tree-show PTN

[Figure: mmap tree-show PTN, circular packing of the sources colored by the custom pattern]
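The template comment on color_rules says that the last matching rule prevails. Below is a minimal sketch of that resolution, assuming each key is matched as a plain substring of the node name (mmap's actual matching rule may differ) and falling back to the template's default color, cyan. pick_color is a hypothetical helper, not part of mmap, and wsm6_main_mod.f90 is a made-up name chosen to match two rules.

```python
# Subset of the color_rules above; Python dicts keep the file order.
COLOR_RULES = {
    "wsm6": "lightgreen",
    "main_mod": "khaki",
    "kfeta": "turquoise",
    "mpbudget": "pink",
    "arps.f90": "peru",
    "netio_metadata": "lime",
}


def pick_color(node_name, rules, default="cyan"):
    """Return the color of the LAST rule whose key occurs in node_name."""
    color = default
    for key, rule_color in rules.items():
        if key in node_name:
            color = rule_color  # a later match overwrites an earlier one
    return color


print(pick_color("arps/module_cu_kfeta.f90", COLOR_RULES))  # turquoise
print(pick_color("arps/arps.f90", COLOR_RULES))             # peru
print(pick_color("arps/tinteg3d.f90", COLOR_RULES))         # cyan (no rule matches)
print(pick_color("wsm6_main_mod.f90", COLOR_RULES))         # khaki: main_mod comes after wsm6
```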

Moving to the static callgraph

Now we try the static callgraph:

~/TEST/MMAP/mmap_arps >mmap cg-gen
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/smooth3d.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/iolib3d.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/irixlib3d.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_cu_kfeta.f90 is OK
filter /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/v5d.c is OK
(...)
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/initlib3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/arps.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/ascio3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/init3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/kfinterfc.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/raddata3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/advct3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/thermolib3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/arpsread.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_cu_kfeta.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_arps_netio_metadata.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/exbcio3d.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/module_arps_dtaread.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/kfpara.f90
Struct analysis on /Users/dauptain/GITLAB/EXTERNAL_REPOS/ARPS/src/arps/irixlib3d.f90
Fortran Keywords char used as a variable in the code.
Fortran Keywords char used as a variable in the code.
Fortran Keywords char used as a variable in the code.
Fortran Keywords char used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords ibits used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords ibits used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Fortran Keywords index used as a variable in the code.
Computing callgraph, this can take a while...
 Found contains 95 / 95 99%
 Found parents  11 / 11 99%
 Found callables3543 / 15472 22%
Callgraph generated
Generating my_arps/callgraph.json.

We plot the callgraph with the following command:

~/TEST/MMAP/mmap_arps >mmap cg-show -b pyvis -c ptn -l
Start filtering graph
1191 nodes / 3643 edges
Remove hyperconnected roots starting at  10
Removing hyperconnected nodes, starting from 10 predecessors
    53 nodes removed by hyperconnection
1138 nodes / 2232 edges
Remove patterns: ['*slave_*_memory*']
Removing by patterns.
  *slave_*_memory*: 0 matchs and 0 descendants
    0 nodes removed by patterns
1138 nodes / 2232 edges
Remove singles
Removing single nodes or self connected nodes
    368 nodes removed by singularity or self connection
770 nodes / 2006 edges
After cleaning :770 nodes/2006 edges
Sizes 2.73861278752583/98.22168803273541
Masses 1/19.295
mmap_calls.html
Output written to mmap_calls.html

Let’s break this into bits:
- -b pyvis: the pyvis backend allows a dynamic exploration of the graph. Combined with a Barnes-Hut layout, it is the best way to map the calls by adjacency.
- -c ptn: color the nodes using the customized pattern.
- -l: load the result into your preferred browser right away.

The “hyperconnection” filter (nodes with more than 10 predecessors) removed only 53 nodes, but 1411 of the 3643 edges. By default, this avoids plotting the atomic functions used everywhere (like print_error_and_quit()). Change the parameter remove_hyperconnect to control this. Removing single or self-connected nodes then drops 368 more nodes, leaving 770 nodes and 2006 edges. As long as you stay below 4000 edges, the result remains usable in pyvis, although it slows down as the graph grows. If the graph is bigger, you should focus on a smaller portion of the code…
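The cg-show log lists the cleaning steps applied in sequence. Below is a minimal sketch of two of them on a plain {caller: set of callees} dict, assuming “hyperconnected” means strictly more than remove_hyperconnect callers, as the template comment says. The function names and the toy graph are hypothetical, and mmap's pattern step (which, per its log, also removes descendants) is left out.

```python
from collections import Counter

# Hypothetical toy callgraph: caller -> set of callees.
# Every node, leaves included, appears as a key.
GRAPH = {
    "main": {"solve", "util", "io"},
    "solve": {"util", "step"},
    "step": {"util"},
    "io": set(),
    "util": set(),         # called from everywhere, like an error handler
    "orphan": {"orphan"},  # only calls itself
}


def edge_count(graph):
    return sum(len(callees) for callees in graph.values())


def remove_nodes(graph, doomed):
    """Copy of graph without the doomed nodes and the edges touching them."""
    return {
        node: {c for c in callees if c not in doomed}
        for node, callees in graph.items()
        if node not in doomed
    }


def remove_hyperconnected(graph, max_preds):
    """Drop nodes with more than max_preds distinct callers."""
    preds = Counter(c for callees in graph.values() for c in callees)
    return remove_nodes(graph, {n for n, k in preds.items() if k > max_preds})


def remove_singles(graph):
    """Drop nodes not linked to any other node (isolated or self-calling only)."""
    linked = set()
    for node, callees in graph.items():
        for callee in callees:
            if callee != node:
                linked.update((node, callee))
    return {n: c for n, c in graph.items() if n in linked}


pruned = remove_hyperconnected(GRAPH, max_preds=2)  # util has 3 callers
cleaned = remove_singles(pruned)                    # orphan only calls itself
print(len(GRAPH), edge_count(GRAPH), "->", len(cleaned), edge_count(cleaned))  # 6 7 -> 4 3
print(sorted(cleaned))  # ['io', 'main', 'solve', 'step']
```

As in the ARPS log, dropping a single widely called node removes many edges at once, which is why the edge count falls much faster than the node count.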

[Figure: mmap cg-show pyvis view of the whole callgraph]

We see the global graph, with the brown star of arps in the middle (a star denotes a node without predecessors).
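Spotting such stars amounts to listing the nodes that nobody calls. Below is a minimal sketch on a hypothetical {caller: set of callees} dict that uses the file:procedure node naming seen in the cg-show log. Only arps and cordintg come from this post, and the arps.f90:arps label itself is an assumption.

```python
# Hypothetical callgraph; only arps and cordintg are names from this post.
CALLS = {
    "arps.f90:arps": {"demo.f90:setup", "tinteg3d.f90:cordintg"},
    "demo.f90:setup": set(),
    "tinteg3d.f90:cordintg": {"demo.f90:advance"},
    "demo.f90:advance": {"demo.f90:physics", "demo.f90:advance"},  # recursive
    "demo.f90:physics": set(),
}


def find_roots(graph):
    """Nodes that no other node calls; self-calls do not count as callers."""
    called = {c for node, callees in graph.items() for c in callees if c != node}
    return sorted(set(graph) - called)


print(find_roots(CALLS))  # ['arps.f90:arps']
```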

[Figure: zoom on the callgraph around arps]

Zooming in, a very important hub is visible under arps: cordintg. We decide to focus the callgraph analysis on this one only:

clean_graph:                            # Filtering parameters
  subgraph_roots: # null                  # Limit graph to the descendants of nodes in this list
      - "*cordintg*"
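According to the template comment, subgraph_roots limits the graph to the descendants of the listed nodes. Below is a minimal sketch of that restriction as a breadth-first walk from every node matching a wildcard pattern, on the same kind of hypothetical file:procedure graph (only arps and cordintg are real names). subgraph_from_roots is not mmap's function.

```python
from collections import deque
from fnmatch import fnmatchcase

# Hypothetical callgraph; only arps and cordintg are names from this post.
CALLS = {
    "arps.f90:arps": {"demo.f90:setup", "tinteg3d.f90:cordintg"},
    "demo.f90:setup": set(),
    "tinteg3d.f90:cordintg": {"demo.f90:advance"},
    "demo.f90:advance": {"demo.f90:physics"},
    "demo.f90:physics": set(),
}


def subgraph_from_roots(graph, root_patterns):
    """Keep nodes matching any wildcard pattern, plus all their descendants."""
    roots = [n for n in graph if any(fnmatchcase(n, p) for p in root_patterns)]
    keep = set(roots)
    queue = deque(roots)
    while queue:  # breadth-first walk along caller -> callee edges
        for callee in graph[queue.popleft()]:
            if callee not in keep:
                keep.add(callee)
                queue.append(callee)
    # Keep only the edges between retained nodes.
    return {n: graph[n] & keep for n in keep}


sub = subgraph_from_roots(CALLS, ["*cordintg*"])
print(sorted(sub))
# ['demo.f90:advance', 'demo.f90:physics', 'tinteg3d.f90:cordintg']
```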

Then we rerun the same command:

(venv_default) dauptain@eldarion-macbookair-coop ~/TEST/MMAP/mmap_arps >mmap cg-show -b pyvis -c ptn -l
Start filtering graph
1191 nodes / 3643 edges
Limiting to subgraph roots ['*cordintg*']
Getting subgraph from node tinteg3d.f90:cordintg
488 nodes / 1672 edges
Remove hyperconnected roots starting at  10
Removing hyperconnected nodes, starting from 10 predecessors
    23 nodes removed by hyperconnection
465 nodes / 1226 edges
Remove patterns: ['*slave_*_memory*']
Removing by patterns.
  *slave_*_memory*: 0 matchs and 0 descendants
    0 nodes removed by patterns
465 nodes / 1226 edges
Remove singles
Removing single nodes or self connected nodes
    5 nodes removed by singularity or self connection
460 nodes / 1221 edges
After cleaning :460 nodes/1221 edges
Sizes 3.1622776601683795/94.44575162494075
Masses 1/17.84

And tadaa, we get a smaller callgraph (460 nodes and 1221 edges, against 770 and 2006 before), focused on this apparently important procedure:

[Figure: callgraph restricted to the descendants of cordintg]

Takeaway

Through this post, you have seen how mmap can create different graphical representations of a codebase. More features are available, such as the “cross-grep”, which highlights the parts matching two criteria at once (e.g. “loops on elements” and “OpenACC directives”), or the customized execution graph, a customized view of a Valgrind trace. We will keep updating the tool with more features. Stay in touch…


Thibault Marzlin is an engineer working on COOP Python tools.
Antoine Dauptain is a research scientist on computer science and engineering for HPC. He is the assistant team leader of COOP.

Category: Tutorials
