Sourcetrail can finally index Python source code out of the box! Version 2019.2 adds Python support (beta) based on our open-source SourcetrailPythonIndexer. In this post I tell the story behind adding Python support to Sourcetrail and show how it can be useful for Python developers.
Sourcetrail in a Nutshell
Have you ever started working on a big codebase as a new developer and had absolutely no idea how the code was structured? A couple of years ago I faced this situation while interning on the Google Chrome team. I spent most of my time reading through their huge codebase and only a little time actually changing the implementation.
Long story short: it turns out this sounds familiar to most developers, and pretty much every one of us runs into this situation when joining a new development team. On a smaller scale it happens to us all the time: remember the last time you had to read code written by a coworker? Documentation is often scarce and the original authors may not be around, so developers are left to dive into the source code. There they mostly use the same tools they use for writing code and have to piece together a mental model of the source code one search at a time. With this in mind I asked myself:
“Is this really the best possible approach to learn about an implementation? Why is there no tool that knows the whole implementation and lets me see everything I’m interested in at one glance?”
Well, that is exactly why we made Sourcetrail. If you are interested in the full story, please go ahead and read my post “Why working on Chrome made me develop a tool for reading source code”, so you have the whole picture.
Sourcetrail is a cross-platform source explorer designed to help software engineers navigate and understand unfamiliar source code. Using static analysis, it extracts the relevant information from the provided source code, such as which classes exist or where functions are called. It then provides an all-in-one user interface for exploring this information. The interactive dependency graph uses a simple visual notation to give a quick overview of the relationships between symbols, while making it super fast to jump through the codebase by following calls and other dependencies. At the same time, the code view shows all the implementation details for everything currently shown in the graph view. Both views are interactively linked, so you can navigate using either one.
To get a quick summary of the most important features, please watch our introduction video:
If you are interested in software visualization and want to know more about Sourcetrail’s unique dependency graph notation, then please watch my recent talk Software Visualization: The Humane Solution at ACCU 2019.
Adding Python Support
In late 2018 we released SourcetrailDB, an open-source library for creating and exporting Sourcetrail compatible projects, which enables developers to write new language support for Sourcetrail.
Right away we started working on SourcetrailPythonIndexer to add Python support to Sourcetrail, since Python was the most requested language on our public issue tracker. This made a lot of sense to us, because Python does not force developers to add type information. That makes it hard to follow unfamiliar code and, for example, to find all the places where a specific variable is set to a specific value.
Like many other Python tools, our Python indexer uses Jedi to gather all the relevant information from the provided source code. However, Python is a dynamic language, which makes it hard for Jedi to resolve every reference within the provided source code. The image below illustrates a situation from our tictactoe_py sample (which is included in every Sourcetrail download) where Jedi was not able to tell which of the two functions ArtificialPlayer.turn() actually gets called. Most of the time Jedi does a pretty good job, but since this is a showstopper when navigating unfamiliar source code, we looked for ways to solve this problem.
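To see why a static tool can struggle here, consider the following minimal sketch. It uses only Python’s standard `ast` module (not Jedi), and the classes `HumanPlayer` and `ComputerPlayer` and the function `play_round()` are hypothetical stand-ins, not code from the tictactoe_py sample. The call site inside `play_round()` records nothing but a receiver name and a method name, yet at runtime the same line can reach either `turn()` method.

```python
import ast

# Hypothetical snippet: all class and function names are made up for illustration.
SOURCE = '''
class HumanPlayer:
    def turn(self, board):
        return "human"


class ComputerPlayer:
    def turn(self, board):
        return "computer"


def play_round(player, board):
    # No annotation: `player` may be any object that has a turn() method.
    return player.turn(board)
'''


def method_call_sites(source):
    """Return (receiver, method) pairs for calls of the form `receiver.method(...)`."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            receiver = node.func.value
            receiver_name = receiver.id if isinstance(receiver, ast.Name) else ast.dump(receiver)
            sites.append((receiver_name, node.func.attr))
    return sites


# Statically, the call site only says "call `turn` on `player`".
print(method_call_sites(SOURCE))  # [('player', 'turn')]

# At runtime the very same call site reaches either class's turn().
namespace = {}
exec(SOURCE, namespace)
for class_name in ("HumanPlayer", "ComputerPlayer"):
    player = namespace[class_name]()
    print(namespace["play_round"](player, board=None))  # prints "human", then "computer"
```

Nothing in the source of `play_round()` narrows `player` down to one class, so any resolver has to either follow the values flowing into it or give up.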
For now, we added a post-processing step that runs after the indexer has finished and tries to resolve the remaining unresolved references. Since Sourcetrail already stores the names of all symbols in the whole codebase, it can infer which symbol might actually be referenced. References resolved this way are marked as ambiguous, and their edges in the graph are drawn dashed. Please keep in mind that this post-processing can yield false positives and may show connections to the wrong symbols that merely have matching names.
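The idea behind this name-based post-processing can be sketched roughly as follows. This is not Sourcetrail’s actual implementation: the `Reference` and `Edge` types, the dotted qualified names, and the `resolve_by_name()` function are all hypothetical. Each unresolved reference carries only the bare name seen at the reference site, and it is linked to every known symbol whose last name component matches, with each resulting edge flagged as ambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    source: str       # qualified name of the referencing symbol, e.g. "game.play_round"
    target_name: str  # bare name seen at the reference site, e.g. "turn"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    ambiguous: bool   # name-based guesses would be drawn as dashed edges


def resolve_by_name(unresolved, known_symbols):
    """Link each unresolved reference to every known symbol with a matching short name."""
    by_short_name = {}
    for qualified in known_symbols:
        short_name = qualified.rsplit(".", 1)[-1]
        by_short_name.setdefault(short_name, []).append(qualified)

    edges = []
    for ref in unresolved:
        for candidate in sorted(by_short_name.get(ref.target_name, [])):
            # Every edge found this way is only a guess, so it is marked ambiguous.
            edges.append(Edge(ref.source, candidate, ambiguous=True))
    return edges


symbols = ["players.HumanPlayer.turn", "players.ComputerPlayer.turn", "board.Board.reset"]
for edge in resolve_by_name([Reference("game.play_round", "turn")], symbols):
    print(edge.source, "->", edge.target, "(ambiguous)" if edge.ambiguous else "")
```

Because matching relies on names alone, an unrelated module-level function called `turn` would be linked as well, which is exactly the kind of false positive described above.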
At the moment this post-processing is implemented within Sourcetrail, but we are already working on moving it into SourcetrailPythonIndexer and making it take more information into account to reduce false positives.
So what does Sourcetrail offer? As a quick teaser, I outline the most important features below so you get an idea of what to expect. All screenshots are taken from our tictactoe_py sample project, which comes with each download. If you would rather try Sourcetrail on a larger project, feel free to download the pre-indexed mailpile sample.
Activate any class to get a visual overview of all its relationships to other symbols. All its members are displayed within the class node. Symbols it depends on and symbols that depend on it are arranged around it using our “Plus-Layout”: base classes at the top, derived classes at the bottom, referenced symbols on the right and dependent symbols on the left.
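As a rough illustration of the Plus-Layout rules, the hypothetical `plus_layout()` function below sorts the neighbours of the active class onto the four sides. The edge format (`"inherits"` meaning the source derives from the target, `"uses"` meaning the source references the target) and all class names are assumptions made for this sketch; a real layout would also need to position the nodes and route the edges, which is not shown.

```python
from collections import defaultdict


def plus_layout(active, edges):
    """Assign each neighbour of `active` to a side of the Plus-Layout.

    `edges` holds (kind, source, target) tuples: "inherits" means source derives
    from target, "uses" means source references target.
    """
    sides = defaultdict(list)
    for kind, source, target in edges:
        if kind == "inherits" and source == active:
            sides["top"].append(target)        # base classes
        elif kind == "inherits" and target == active:
            sides["bottom"].append(source)     # derived classes
        elif kind == "uses" and source == active:
            sides["right"].append(target)      # symbols the active class references
        elif kind == "uses" and target == active:
            sides["left"].append(source)       # symbols that depend on the active class
    return dict(sides)


edges = [
    ("inherits", "HumanPlayer", "Player"),
    ("inherits", "ComputerPlayer", "Player"),
    ("uses", "Player", "Board"),
    ("uses", "Game", "Player"),
]
print(plus_layout("Player", edges))
# {'bottom': ['HumanPlayer', 'ComputerPlayer'], 'right': ['Board'], 'left': ['Game']}
```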
Anytime you activate a symbol, the code view will show you its definition in the source code. This also works the other way round, so clicking a symbol in the code view shows the respective node with all connecting edges in the graph.
Member Function References
Sourcetrail’s dependency graph notation uses a simple color scheme: types are grey, functions are yellow and variables are blue. Activating a member function shows all its references to other symbols. The edges between nodes follow the same coloring rules: function calls are yellow and variable accesses are blue.
You can also activate edges with a click and the code view will show you the location of that reference within the code.
Clicking on a grey aggregation edge between two classes displays all dependencies between them. This is very useful when refactoring a class and trying to figure out how other classes use it.
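Conceptually, an aggregation edge bundles all member-level references between two classes. The following sketch assumes hypothetical `"Class.member"` names and simply groups references by the classes that own their source and target; references within a single class are left out because they do not connect two different classes.

```python
from collections import defaultdict


def owner(qualified_name):
    """Return the enclosing class of a member, e.g. 'Board.cells' -> 'Board'."""
    return qualified_name.rsplit(".", 1)[0]


def aggregate(references):
    """Collapse (source_member, target_member) pairs into class-to-class edges.

    Returns {(source_class, target_class): [the individual references]}.
    """
    edges = defaultdict(list)
    for source, target in references:
        source_class, target_class = owner(source), owner(target)
        if source_class != target_class:  # skip references inside one class
            edges[(source_class, target_class)].append((source, target))
    return dict(edges)


refs = [
    ("Game.run", "Board.reset"),
    ("Game.run", "Board.cells"),
    ("Game.show", "Board.cells"),
    ("Board.reset", "Board.cells"),
]
for (source_class, target_class), details in aggregate(refs).items():
    print(f"{source_class} -> {target_class}: {len(details)} references")
# Game -> Board: 3 references
```

Keeping the individual references in each bundle is what makes it possible to list all of them when the aggregated edge is selected.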
Call & Inheritance Graph
By using the depth graph navigation in the top right of the graph view, you can display full class inheritance hierarchies and call graphs. This is especially useful when trying to understand the overall architecture of a certain feature or code path.
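Depth-limited exploration like this can be approximated by a breadth-first search that stops expanding at a given depth. In the sketch below, the `calls` dictionary, the function names and `call_graph()` itself are hypothetical. Feeding in a map from each function to its callers instead would produce the caller graph, and a map from each class to its base classes would yield an inheritance hierarchy in the same way.

```python
from collections import deque


def call_graph(calls, root, max_depth):
    """Collect call edges reachable from `root`, expanding at most `max_depth` levels.

    `calls` maps each function name to the names of the functions it calls.
    """
    edges = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        func, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand this node further
        for callee in calls.get(func, []):
            edges.append((func, callee))
            if callee not in seen:  # guards against cycles such as recursion
                seen.add(callee)
                queue.append((callee, depth + 1))
    return edges


calls = {
    "main": ["Game.run"],
    "Game.run": ["Board.reset", "Player.turn"],
    "Player.turn": ["Board.free_cells"],
}
print(call_graph(calls, "main", max_depth=2))
# [('main', 'Game.run'), ('Game.run', 'Board.reset'), ('Game.run', 'Player.turn')]
```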
When you activate a module, Sourcetrail provides a concise list of all its members. That way you can easily get an overview of a module and drill deeper by activating one of the displayed symbols.
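To give an idea of what such a member list contains, here is a rough approximation using Python’s standard `ast` module. It is an assumption-laden sketch and is not meant to reflect how SourcetrailPythonIndexer collects module members: the hypothetical `module_members()` only looks at top-level class and function definitions and simple variable assignments, and it ignores imported names.

```python
import ast


def module_members(source):
    """List the top-level classes, functions and variables defined in module source."""
    members = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            members.append(("class", node.name))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            members.append(("function", node.name))
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):  # skip tuple unpacking, attributes, etc.
                    members.append(("variable", target.id))
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            members.append(("variable", node.target.id))
    return members


source = '''
BOARD_SIZE = 3

class Board:
    pass

def main():
    pass
'''
print(module_members(source))
# [('variable', 'BOARD_SIZE'), ('class', 'Board'), ('function', 'main')]
```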
How to set up a Python project
Setting up Sourcetrail for Python is quite easy: just download and install our latest release. Next, create a new project by supplying a name and location. Sourcetrail projects are organized into one or more Source Groups. To index Python code, choose Empty Python Source Group in the Source Group selection dialog.
Sourcetrail allows you to specify a Python environment that is used when looking for dependencies, but you can also skip this step and Sourcetrail will use your default environment. Next, specify which Python source files to index by adding file paths or whole directories to Files & Directories to Index. If needed, you can also exclude certain files with Exclude Paths or add more file extensions in Source File Extensions.
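The three file settings interact roughly as in the following sketch: listed files and directories are expanded, only files whose suffix is among the source file extensions are kept, and anything matching an exclude pattern is dropped. The function name `files_to_index()` is hypothetical, and the shell-style wildcards from Python’s `fnmatch` module are an assumption; Sourcetrail’s own Exclude Paths syntax may differ, so check the documentation.

```python
import fnmatch
import tempfile
from pathlib import Path


def files_to_index(paths, exclude_patterns=(), extensions=(".py",)):
    """Expand files and directories into a sorted list of source files to index.

    `extensions` must include the leading dot; `exclude_patterns` are shell-style
    wildcards matched against the forward-slash form of each path.
    """
    selected = set()
    for entry in map(Path, paths):
        # Directories are searched recursively; single files are taken as-is.
        candidates = entry.rglob("*") if entry.is_dir() else [entry]
        for path in candidates:
            if not path.is_file() or path.suffix not in extensions:
                continue
            if any(fnmatch.fnmatch(path.as_posix(), pattern) for pattern in exclude_patterns):
                continue
            selected.add(path)
    return sorted(selected)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "game").mkdir()
    for name in ("game/board.py", "game/test_board.py", "README.md"):
        (root / name).write_text("")
    found = files_to_index([root], exclude_patterns=["*/test_*.py"])
    print([p.relative_to(root).as_posix() for p in found])  # ['game/board.py']
```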
After the Python Source Group is added to the project, you can just click Create in the lower right of the dialog and Sourcetrail will save the project file and show you the Start Indexing dialog.
Once the indexing is done, your project is ready to be explored. For more information on Sourcetrail’s project setup, please take a look at our documentation. Download the pre-indexed mailpile sample to explore a larger Python codebase.