Photo by Debby Ledet on Unsplash.
As software codebases grow, their internal ramifications multiply. Left unattended, these ramifications slow down the onboarding of new contributors. Here we present a solution for a Python codebase, based on import graphs.
Import graphs show how modules are coupled and where the external dependencies are really required. Yet we could not find any out-of-the-box tool that satisfies all our needs.
The example script
We use findimports and graphviz to create a flexible script that generates an import graph. Our contribution lies mostly in filtering out irrelevant information (stdlib-list is of great help here).
Fetch the information
The main function is graphpack().
With findimports, we can parse our codebase to get all imports:
from findimports import ModuleGraph
g = ModuleGraph()
g.external_dependencies = True
g.parsePathname(package_path)
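To give an idea of what "getting all imports" means, here is a minimal sketch that collects import statements with the standard-library ast module. It is an illustration only, not how findimports works internally, and the helper name list_imports and the sample source are made up for the example.

```python
import ast


def list_imports(source):
    """Return the set of dotted names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c" contributes "a.b" and "c"
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from a.b import c" contributes "a.b";
            # relative imports without a module name are skipped
            found.add(node.module)
    return found


# hypothetical source snippet, for illustration only
sample = "import os\nimport numpy.linalg\nfrom pyavbp.io import mesh_utils\n"
imports = list_imports(sample)
print(sorted(imports))
```

A dedicated tool such as findimports goes further: it walks a whole package and attaches the imports to each module, which is what the script relies on.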
Clean the graph
The raw import graph is often too complex to be understandable. Here we remove:
- the internal modules specified in ignore_modules, like a command-line module;
- the external packages specified in ignore_packages, like the numpy library, and all the standard library;
- the __init__ modules, which provide no information.
Then we create a graph with the graphviz library.
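The filtering steps above can be tried on a plain dictionary before touching a real codebase. The sketch below is a simplified stand-in for the script: the function clean_imports, the toy graph and the tiny stdlib set are all hypothetical, and external imports are tested on their top-level package only.

```python
import fnmatch


def clean_imports(graph, ignore_modules, ignore_packages, stdlib):
    """Filter a {module: set_of_imports} dict following the three rules above."""

    def strip_init(name):
        # "pkg.sub.__init__" becomes "pkg.sub"
        suffix = ".__init__"
        return name[: -len(suffix)] if name.endswith(suffix) else name

    # drop the internal modules matching a wildcard pattern, collapse __init__
    kept = {
        strip_init(mod): {strip_init(imp) for imp in imps}
        for mod, imps in graph.items()
        if not any(fnmatch.fnmatch(mod, pat) for pat in ignore_modules)
    }
    internal = set(kept)

    cleaned = {}
    for mod, imps in kept.items():
        out = set()
        for imp in imps:
            if imp in internal:
                out.add(imp)
                continue
            # external import: keep only the top-level package name
            top = imp.split(".")[0]
            if top not in stdlib and top not in ignore_packages:
                out.add(top)
        cleaned[mod] = out
    return cleaned


# toy graph, for illustration only
toy = {
    "pkg.__init__": set(),
    "pkg.core": {"os", "numpy.linalg", "pkg.utils", "yaml"},
    "pkg.utils": {"sys", "pkg.__init__"},
    "pkg.tests.test_core": {"pkg.core"},
}
result = clean_imports(
    toy,
    ignore_modules=["*test*"],
    ignore_packages=["yaml"],
    stdlib={"os", "sys"},
)
print(result)
```

Note that an import pointing to a dropped internal module is no longer recognised as internal, so it collapses to the top-level package name. Listing the package itself in ignore_packages removes these leftovers.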
The actual code
The actual code of the script is as follows:
import sys
import fnmatch
from stdlib_list import stdlib_list
import graphviz
from findimports import ModuleGraph

def graphpack(
    path,
    package_name,
    ignore_packages=None,
    ignore_modules=None,
    color_rules=None,
):
    """Plot a graph from the imports of your package

    Args:
        path: str, directory containing the package
        package_name: str, your package name
        ignore_packages: list of str, names of packages to ignore
        ignore_modules: list of str, names of modules to ignore (wildcards allowed)
        color_rules: dict of color names to customize the graph:
            color_rules = {"foo": "purple", "bar": "blue"}
            will paint all names containing foo in purple and bar in blue.
            If several rules match, the last takes precedence.

    Returns:
        None; writes down a {package_name}.svg image.
    """
    if ignore_modules is None:
        ignore_modules = []
    if ignore_packages is None:
        ignore_packages = []
    modgraph = ModuleGraph()
    modgraph.external_dependencies = True
    modgraph.parsePathname(path + "/" + package_name)
    # modgraph.printImports()
    _del_modules(modgraph, ignore_modules)
    _del_inits(modgraph)
    # labels are read after _del_inits so that packages appear without "__init__"
    internal_modules = set(module.label for module in modgraph.listModules())
    _del_external_packages(modgraph, internal_modules, ignore_packages)
    _create_graph(modgraph, internal_modules, package_name, color_rules)

def _del_modules(modgraph, ignore_modules):
    """Remove the modules matching the ignore_modules patterns"""
    # a set, so that a module matched by several patterns is deleted only once
    to_delete = set()
    for module_name in ignore_modules:
        for act_module in modgraph.modules:
            if fnmatch.fnmatch(act_module, module_name):
                to_delete.add(act_module)
    for del_mod in sorted(to_delete):
        print(f"skipping module: {del_mod}")
        del modgraph.modules[del_mod]

def _del_inits(modgraph):
    """Remove the __init__ suffixes"""
    for module in modgraph.listModules():
        # Simplify the label of a module if it ends with __init__
        if module.label.split('.')[-1] == '__init__':
            module.label = '.'.join(module.label.split('.')[:-1])
        # Replace the imports of this module ending with __init__
        # by the name of the parent package
        for import_ in module.imports.copy():
            if import_.split('.')[-1] == '__init__':
                module.imports.remove(import_)
                module.imports.add('.'.join(import_.split('.')[:-1]))

def _del_external_packages(modgraph, internal_modules, ignore_packages):
    """Remove stdlib packages and user defined ignore-packages list"""
    # "major.minor" built from version_info: slicing sys.version breaks on Python 3.10+
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    stdlib = set(stdlib_list(version))
    for module in modgraph.listModules():
        for import_ in module.imports.copy():
            # do not accept standard library
            if import_ in stdlib:
                module.imports.remove(import_)
            # simplify external dependencies to their top-level package
            elif import_ not in internal_modules:
                module.imports.remove(import_)
                import_ = import_.split('.')[0]
                if import_ not in ignore_packages:
                    module.imports.add(import_)

def _create_graph(modgraph, internal_modules, filename, color_rules):
    """Create the graph with graphviz"""
    node_names = set()
    for module in modgraph.listModules():
        node_names.add(module.label)
        node_names |= module.imports
    if color_rules is None:
        color_rules = {}
    dot = graphviz.Digraph(filename, engine="dot")
    # create nodes: external packages are dashed, color rules override the style
    for name in node_names:
        style = "" if name in internal_modules else 'dashed'
        color = "black"
        for key in color_rules:
            if key in name:
                style = "filled"
                color = color_rules[key]
        dot.node(name, style=style, color=color, shape='record')
    # add edges
    for module in modgraph.listModules():
        for import_ in module.imports:
            dot.edge(module.label, import_)
    dot.render(filename, format='svg', cleanup=True, view=True)
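The color rules deserve a closer look, since several keys can match the same node name. The following standalone sketch mirrors the node-styling loop of _create_graph; the helper name pick_style and the sample names are hypothetical.

```python
def pick_style(name, color_rules, internal_modules):
    """Return the (style, color) pair a node would get."""
    style = "" if name in internal_modules else "dashed"
    color = "black"
    # dicts keep insertion order, so the last matching key wins
    for key, value in color_rules.items():
        if key in name:
            style, color = "filled", value
    return style, color


# hypothetical rules and names, for illustration only
rules = {"tools": "green", "mesh": "red"}
internal = {"pkg.tools.mesh_utils", "pkg.core"}

both_match = pick_style("pkg.tools.mesh_utils", rules, internal)
no_match_internal = pick_style("pkg.core", rules, internal)
no_match_external = pick_style("numpy", rules, internal)
print(both_match, no_match_internal, no_match_external)
```

Matching is a plain substring test, so short keys can catch more names than intended. A matching rule also replaces the dashed style of an external package with a filled one.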
Application to an actual package
We will apply this function to a codebase called pyavbp.
Note that the amount of information filtered out is quite high:
graphpack(
    path="/Users/dauptain/GITLAB/",
    package_name="pyavbp",
    ignore_packages=[
        "nob",
        "ms_thermo",
        "opentea",
        "numpy",
        "h5py",
        "yaml",
        "pkg_resources",
        "scipy",
        "h5cross",
        "arnica",
        "f90nml",
        "directory_tree",
        "pyavbp",
    ],
    ignore_modules=[
        "pyavbp.cli",
        # "pyavbp.tools.main_makeinject",
        "*test*",
        # "*injector*",
        "*INPUT*",
        # "pyavbp.io.mesh_utils",
    ],
    color_rules={
        "avbtp": "purple",
        "tavbp": "blue",
        "avtp": "red",
        "combu": "orange",
        "tools": "green",
        "postproc": "green",
        "mesh2curve": "green",
    },
)
We add a little bit of color at the end to help developers find what’s what. The script generates the following SVG image (zoomed here).
This image shows how each module relies on the others. For example, mesh_utils is clearly a low-level helper unit, while avbp_setup is a mid-level module, merging calls from diverse front ends (orange, red, blue or purple boxes) and distributing them to lower-level units, all called “tools” (green boxes) in the local jargon.
There are, however, too many levels for a human to understand, and when exposed to the graph, the dev team searches for potential simplifications.
In this example, the developers wondered if this half-assedly named module generic_run could be merged into avbp_setup.
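The number of levels discussed above can also be estimated numerically. The sketch below is not part of the original script: it assumes the cleaned imports are available as a plain dictionary and that there is no circular import (a cycle would make the recursion endless). The function import_levels and the toy graph are hypothetical.

```python
from functools import lru_cache


def import_levels(graph):
    """Map each module to its level in a {module: set_of_imports} dict.

    Level 0 means the module imports no other module of the dict;
    otherwise the level is 1 + the highest level among its imports.
    """

    @lru_cache(maxsize=None)
    def level(mod):
        # only imports that are themselves keys of the dict are counted
        deps = [dep for dep in graph[mod] if dep in graph]
        return 0 if not deps else 1 + max(level(dep) for dep in deps)

    return {mod: level(mod) for mod in graph}


# toy graph, for illustration only
toy_graph = {
    "front_a": {"mid"},
    "front_b": {"mid", "helper"},
    "mid": {"helper", "numpy"},
    "helper": set(),
}
levels = import_levels(toy_graph)
print(levels)
```

A module sitting alone on an intermediate level, between a single importer and a single import, is a natural candidate for a merge.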