Photo by Debby Ledet on Unsplash.

As a software codebase grows, its internal ramifications multiply. Left unattended, these ramifications slow down the onboarding of new contributors. Here we present a solution for a Python codebase, based on imports graphs.

Imports graphs help identify how modules are coupled and show where the external dependencies are really required. Yet we could not find an out-of-the-box tool that satisfies all our needs.

The example script

We use findimports and graphviz to create a flexible script that generates an imports graph. Our contribution lies mostly in filtering out irrelevant information (stdlib-list is of great help here).

Fetch the information

The main function is graphpack(). With findimports we can parse our codebase to get all imports:

from findimports import ModuleGraph

g = ModuleGraph()
g.external_dependencies = True
g.parsePathname(package_path)
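
To make "get all imports" more concrete, here is a minimal standard-library sketch of the idea on a single source string. It uses the ast module and is not how findimports works internally; the helper name collect_imports and the sample source are made up for illustration.

```python
import ast


def collect_imports(source: str) -> set[str]:
    """Return the dotted module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" is recorded as "a.b"
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" is recorded as "a.b";
            # relative imports (level > 0) are skipped in this sketch
            found.add(node.module)
    return found


# Hypothetical source file content
sample = "import os\nimport numpy as np\nfrom pyavbp.io import mesh_utils\n"
print(sorted(collect_imports(sample)))  # ['numpy', 'os', 'pyavbp.io']
```

A real tool also has to walk the package directory and resolve relative imports, which is the part we delegate to findimports.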

Clean the graph

The raw imports graph is often too complex to be understandable. Here we remove:

  • the internal modules specified in ignore_modules, like a command line module.
  • the external packages specified in ignore_packages, like the numpy library, and the whole standard library.
  • the __init__ modules, which provide no information.

Then we create the graph with the graphviz library.
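
The three cleaning steps can be sketched on a plain dictionary {module: set of imports}, without findimports. This is an illustration under assumptions, not the script itself: the names clean, strip_init and STDLIB, as well as the toy graph, are hypothetical, and the standard library is detected here with sys.stdlib_module_names (Python 3.10+) rather than stdlib-list.

```python
import sys
from fnmatch import fnmatchcase

# Python 3.10+ ships the list of stdlib modules; older versions fall back
# to a tiny hand-written set that is only good enough for this demo.
STDLIB = frozenset(getattr(sys, "stdlib_module_names", {"os", "sys", "re"}))


def strip_init(name):
    """Fold 'pkg.__init__' into 'pkg'."""
    suffix = ".__init__"
    return name[: -len(suffix)] if name.endswith(suffix) else name


def clean(graph, ignore_modules=(), ignore_packages=()):
    """Return a filtered copy of {module: set(imports)}."""
    # 1. drop internal modules matching any ignore pattern
    kept = {
        mod: deps
        for mod, deps in graph.items()
        if not any(fnmatchcase(mod, pat) for pat in ignore_modules)
    }
    # 2. fold the __init__ modules into their parent package
    kept = {strip_init(m): {strip_init(d) for d in deps} for m, deps in kept.items()}
    internal = set(kept)
    # 3. drop stdlib and ignored packages, reduce other externals to their top-level name
    cleaned = {}
    for mod, deps in kept.items():
        out = set()
        for dep in deps:
            if dep in internal:
                out.add(dep)
                continue
            top = dep.split(".")[0]
            if top not in STDLIB and top not in ignore_packages:
                out.add(top)
        cleaned[mod] = out
    return cleaned


# Hypothetical raw graph
raw = {
    "pkg.__init__": {"pkg.core", "pkg.cli"},
    "pkg.core": {"os", "numpy.linalg", "yaml", "pkg.utils.__init__"},
    "pkg.utils.__init__": {"os"},
    "pkg.cli": {"pkg.core", "argparse"},
    "pkg.tests.test_core": {"pkg.core"},
}
cleaned = clean(raw, ignore_modules=["pkg.cli", "*test*"], ignore_packages=["numpy", "pkg"])
print(cleaned)
```

Note the import of pkg.cli: once that module is ignored it no longer counts as internal, so it is reduced to its top-level name pkg. Listing the package itself in ignore_packages keeps such leftovers out of the graph.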

The actual code

The actual code of the script is as follows:

import sys

import fnmatch
from stdlib_list import stdlib_list
import graphviz
from findimports import ModuleGraph

def graphpack(
        path,
        package_name,
        ignore_packages=None,
        ignore_modules=None,
        color_rules= None
    ):
    """Plot a graph from the imports of your package.

    Args:
        path: str, directory that contains the package
        package_name: str, your package name
        ignore_packages: list of str, names of external packages to ignore
        ignore_modules: list of str, names (or fnmatch patterns) of modules to ignore
        color_rules: dict of color names to customize the graph:
            color_rules = {"foo": "purple", "bar": "blue"}
            will paint all names containing foo in purple and bar in blue.
            If several keys match, the last one takes precedence.

    Returns:
        None. Writes a {package_name}.svg image.
    """

    if ignore_modules is None:
        ignore_modules = []
    if ignore_packages is None:
        ignore_packages = []


    modgraph = ModuleGraph()
    modgraph.external_dependencies = True
    modgraph.parsePathname(path+"/"+package_name)
    #modgraph.printImports()

    _del_modules(modgraph,ignore_modules)
    _del_inits(modgraph)
    internal_modules = set(module.label for module in modgraph.listModules())
    _del_external_packages(modgraph,internal_modules,ignore_packages)
    _create_graph(modgraph, internal_modules,package_name, color_rules)



def _del_modules(modgraph,ignore_modules):
    """Remove the modules matching the ignore_modules patterns"""

    to_delete = []
    for module_name in ignore_modules:
        for act_module in modgraph.modules:
            if fnmatch.fnmatch(act_module, module_name):
                to_delete.append(act_module)

    for del_mod in to_delete:
        print(f"skipping module: {del_mod}")
        del modgraph.modules[del_mod]


def _del_inits(modgraph):
    """Fold the __init__ modules into their parent package"""

    for module in modgraph.listModules():
        # Simplify the label of a module if it ends with __init__
        if module.label.split('.')[-1] == '__init__':
            module.label = '.'.join(module.label.split('.')[:-1])

        # Replace imports ending with __init__ by the name of the parent package
        for import_ in module.imports.copy():
            if import_.split('.')[-1] == '__init__':
                module.imports.remove(import_)
                module.imports.add('.'.join(import_.split('.')[:-1]))

def _del_external_packages(modgraph, internal_modules, ignore_packages):
    """Remove stdlib packages and user defined ignore-packages list"""

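    # sys.version[0:3] would yield "3.1" on Python 3.10+, so build the string explicitly
    py_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    stdlib = set(stdlib_list(py_version))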
    for module in modgraph.listModules():
        for import_ in module.imports.copy():
            # do not accept standard library
            if import_ in stdlib:
                module.imports.remove(import_)
            # simplify external dependencies
            elif import_ not in internal_modules:
                module.imports.remove(import_)
                import_ = import_.split('.')[0]

                if import_ not in ignore_packages:
                    module.imports.add(import_)

def _create_graph(modgraph, internal_modules,filename, color_rules):
    """Create the graph with graphviz"""
    node_names = set()
    for module in modgraph.listModules():
        node_names.add(module.label)
        node_names |= module.imports

    if color_rules is None:
        color_rules = {}


    dot = graphviz.Digraph(filename, engine="dot")
    # create nodes
    for name in node_names:
        style = "" if name in internal_modules else 'dashed'
        color="black"
        for key in  color_rules:
            if key in name:
                style="filled"
                color = color_rules[key]
        dot.node(name, style=style, color=color , shape='record')
    # add edges
    for module in modgraph.listModules():
        for import_ in module.imports:
            dot.edge(module.label, import_)

    dot.render(filename, format='svg', cleanup=True, view=True)
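
The "last match takes precedence" rule of color_rules follows from the loop order in _create_graph. The sketch below isolates that logic in a standalone helper so it can be tried without graphviz; the name pick_style and the sample module names are hypothetical.

```python
def pick_style(name, internal_modules, color_rules):
    """Return the (style, color) pair a node would get."""
    # external nodes are dashed by default, internal ones use the default style
    style = "" if name in internal_modules else "dashed"
    color = "black"
    # dicts keep insertion order, so a later matching key overrides an earlier one
    for key, rule_color in color_rules.items():
        if key in name:  # plain substring test, not a glob pattern
            style, color = "filled", rule_color
    return style, color


internal = {"pkg.core", "pkg.tools.mesh_utils"}
rules = {"tools": "green", "mesh": "blue"}

print(pick_style("pkg.tools.mesh_utils", internal, rules))  # ('filled', 'blue')
print(pick_style("pkg.core", internal, rules))              # ('', 'black')
print(pick_style("numpy", internal, rules))                 # ('dashed', 'black')
```

Two consequences are worth keeping in mind: the order of the keys in color_rules matters, and an external package whose name matches a key is drawn filled, so it loses its dashed outline.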

Application to an actual package

We will apply this function to a codebase called pyavbp. Note that a large amount of information is filtered out:

graphpack(
    path="/Users/dauptain/GITLAB/",
    package_name="pyavbp",
    ignore_packages = [
        "nob",
        "ms_thermo",
        "opentea",
        "numpy",
        "h5py",
        "yaml",
        "pkg_resources",
        "scipy",
        "h5cross",
        "arnica",
        "f90nml",
        "directory_tree",
        "pyavbp"
    ],
    ignore_modules = [
        "pyavbp.cli",
    #    "pyavbp.tools.main_makeinject",
        "*test*",
    #    "*injector*",
        "*INPUT*",
     #   "pyavbp.io.mesh_utils",
    ],
    color_rules= {
        "avbtp": "purple",
        "tavbp": "blue",
        "avtp": "red",
        "combu": "orange",
        "tools": "green",
        "postproc": "green",
        "mesh2curve": "green",
    }
)

We add a bit of color at the end to help developers find what’s what. The script generates the following SVG image (zoomed here):

Figure: imports graph generated for pyavbp.

This image shows how each module relies on the others. For example, mesh_utils is clearly a low-level helper unit, while avbp_setup is a mid-level module, merging calls from diverse front ends (orange, red, blue or purple boxes) and distributing them to lower-level units, all called “tools” (green boxes) in the local jargon.

There are, however, too many levels for a human to grasp at once, and when shown the graph, the dev team looks for potential simplifications. In this example, the developers wondered whether the poorly named module generic_run could be merged into avbp_setup.
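
The low-level versus mid-level reading of the picture can also be approximated with numbers. The sketch below counts, for each node of an edge list, how many modules import it (fan-in) and how many modules it imports (fan-out). The function name fan_counts and the toy edges are hypothetical and do not describe pyavbp.

```python
from collections import Counter


def fan_counts(edges):
    """Map each node to (fan_in, fan_out) from a list of (importer, imported) pairs."""
    fan_out = Counter(src for src, _ in edges)
    fan_in = Counter(dst for _, dst in edges)
    names = {name for edge in edges for name in edge}
    # Counter returns 0 for names it has never seen
    return {name: (fan_in[name], fan_out[name]) for name in sorted(names)}


# Hypothetical edges: two front ends, one mid-level module, two helpers
edges = [
    ("front_a", "setup"),
    ("front_b", "setup"),
    ("front_a", "helper"),
    ("setup", "helper"),
    ("setup", "other_tool"),
]
counts = fan_counts(edges)
for name, (n_in, n_out) in counts.items():
    print(f"{name}: imported by {n_in}, imports {n_out}")
```

In this toy graph, helper has fan-in only, which is the signature of a low-level unit, while setup has both fan-in and fan-out, like a mid-level module that merges calls and redistributes them.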


Luís F. Pereira is an engineer who enjoys developing science- and engineering-related software.
Antoine Dauptain is a research scientist focused on computer science and engineering topics for HPC.
