Sourcetrail 2018.4 features big improvements in indexing performance and reduced memory consumption. The new tab bar in the application window allows for opening multiple symbols simultaneously, just like in a web browser. In addition, new type-use edges were added to the graph visual to make exploring types easier when templates/generics are involved.
Sourcetrail Slack Channel
First, we want to announce that we just started testing a public Sourcetrail Slack channel. It will help you get in touch with both us and other users. We also hope to find out more about how Sourcetrail is used and to discuss new features. Beyond that, how the channel is really used will largely depend on the users who join.
So feel free to join:
We plan to test the channel for a few months and then evaluate whether it makes sense to keep it. As for time zones, please be aware that we are located in Salzburg, Austria (GMT+1, no kangaroos).
New in this Release:
- Faster indexing: Up to 66% speed-up! (depending on your hardware)
- Lower memory consumption: During indexing and project loading
- Tab bar: At the top of the main window
- C++/Java: More type-use edges for easy templates/generics navigation
- C/C++: Updated to LLVM/Clang 7.0.0
- News Box: Located on the start screen
Faster Indexing and Lower Memory Consumption
Since our first official Sourcetrail release in summer 2017, we have spent most of our time improving usability, adding new UI features and extending project setup. That work was well received and we see a growing number of users.
However, many users started complaining that our indexing speed was pretty bad, far worse than that of other static analysis tools. We therefore started looking into this problem over the last months, and we are happy with the speed-up we achieved. We released the first improvements with the maintenance version 2018.3.55. With this new release we managed to get about the same speed-up on top of that!
I also want to explain a little bit what we did and give you some rough numbers. I will mainly focus on our C/C++ indexer here, but most improvements also apply to the Java indexer.
Clang LibTooling AST building
As you can see in the chart above, the largest part of indexing time is spent on Clang LibTooling AST building. This is the step where all parsing, tokenization and the assembly of the Abstract Syntax Tree (AST) is done by Clang LibTooling. For you this should be close to the time that your compiler takes to build your project. If your old Sourcetrail version took much longer to process your project than your compiler, chances are that your blue bars are much larger and the improvement is even more significant for you. However, the time it takes Clang LibTooling to build the AST is somewhat out of our control.
But this is the place where your project configuration matters. As a user you can optimize this step by reducing the number of Header Search Paths, reordering them or removing unnecessary flags. This can have a big influence on indexing time.
AST traversal and recording
In this step we visit parts of the AST and record all symbols, references and source locations we need for exploring C++ source code. This part of our codebase has been under constant development for the last 4 years. As a result, some things, like caching, were done in multiple places, and we did not always use the most efficient C++ constructs. By refactoring and modernizing this code we achieved big performance improvements. These changes include:
- Improved caching of symbol names and file paths.
- Used integral identifiers to reference file paths and symbols.
- Reduced number of copies and allocations when passing data.
- Reduced accesses to the file system.
Most of these changes were easy to implement. Some needed a little refactoring. The hardest part was figuring out how data was recorded and passed through our application, which was not too hard because… you know, we have Sourcetrail to figure that out 😉.
The chart above shows how much we improved on AST traversal and data recording over the last versions.
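To make the first two items in the list more concrete, here is a minimal sketch of how cached names and integral identifiers can work together. It is not Sourcetrail's actual code: the class NameInterner, the struct RecordedReference and all member names are made up for illustration. Each distinct symbol name or file path is stored once, and everything else passes around a small integer instead of a string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not Sourcetrail's real class): maps every distinct
// symbol name or file path to a small integral id exactly once.
class NameInterner
{
public:
    using Id = std::uint32_t;

    // Returns the id for `name`. A repeated name is a cache hit and
    // causes no new allocation; a new name is stored and gets the next id.
    Id intern(const std::string& name)
    {
        const auto it = m_ids.find(name);
        if (it != m_ids.end())
        {
            return it->second;
        }
        const Id id = static_cast<Id>(m_names.size());
        m_names.push_back(name);
        m_ids.emplace(name, id);
        return id;
    }

    // Reverse lookup. The returned reference may be invalidated by a
    // later call to intern(), so copy it if it has to outlive that call.
    const std::string& name(Id id) const
    {
        assert(id < m_names.size());
        return m_names[id];
    }

    std::size_t size() const { return m_names.size(); }

private:
    std::unordered_map<std::string, Id> m_ids;
    std::vector<std::string> m_names;
};

// A recorded reference then carries integers instead of two strings,
// which makes it cheap to copy and to pass between components.
struct RecordedReference
{
    NameInterner::Id symbolId;
    NameInterner::Id fileId;
    std::uint32_t line;
};
```

With something like this in place, a call such as interner.intern("src/main.cpp") returns the same id every time, so comparing two file paths becomes an integer comparison.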
The steps discussed above, AST building and traversal, are done within indexer processes/threads that run in parallel. They are running autonomously and do not influence each other, so none of them ever has to wait for any other.
But when an indexer is done with a translation unit, it passes all the recorded data back to the main process for storing in our SQLite database. Because merging the data records from all indexers runs in a single thread, it is a well-known bottleneck. Especially when running on a high-end CPU with 12+ parallel indexers, the indexers may produce data much faster than the database synchronization can keep up with. If too much data piles up in memory, the indexers have to stop and wait. For that reason one of our main concerns was improving database insertion speed.
The chart above shows how we improved insertion times over the last versions. If you have a lot of CPU cores and database insertion was the main bottleneck on your machine, then chances are you see an overall indexing speed improvement according to this chart.
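The bottleneck described above can be sketched as a bounded producer/consumer queue. This is only an illustration under simplifying assumptions: the indexers are plain threads inside one process, the "database" is a running total, and the names IndexerResult, BoundedResultQueue and runIndexing are invented for this example. The point is that push() blocks once the queue is full, which is exactly the moment where indexers have to stop and wait for the single merging thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the data an indexer records per translation unit.
struct IndexerResult
{
    int indexerId;
    int recordCount;
};

// Bounded queue: push() blocks while the queue is full (back-pressure on
// the indexers), pop() blocks while it is empty.
class BoundedResultQueue
{
public:
    explicit BoundedResultQueue(std::size_t capacity) : m_capacity(capacity)
    {
        assert(capacity > 0);
    }

    void push(IndexerResult result)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this] { return m_queue.size() < m_capacity; });
        m_queue.push(result);
        m_notEmpty.notify_one();
    }

    IndexerResult pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return !m_queue.empty(); });
        const IndexerResult result = m_queue.front();
        m_queue.pop();
        m_notFull.notify_one();
        return result;
    }

private:
    const std::size_t m_capacity;
    std::mutex m_mutex;
    std::condition_variable m_notFull;
    std::condition_variable m_notEmpty;
    std::queue<IndexerResult> m_queue;
};

// Starts `indexerCount` producer threads and merges their results on the
// calling thread, which plays the role of the single database thread.
// Returns the total number of records merged.
int runIndexing(int indexerCount, int unitsPerIndexer, std::size_t queueCapacity)
{
    BoundedResultQueue queue(queueCapacity);
    std::vector<std::thread> indexers;
    for (int id = 0; id < indexerCount; ++id)
    {
        indexers.emplace_back([&queue, id, unitsPerIndexer] {
            for (int unit = 0; unit < unitsPerIndexer; ++unit)
            {
                // Pretend every translation unit yields 10 records.
                queue.push(IndexerResult{id, 10});
            }
        });
    }

    int totalRecords = 0;
    for (int i = 0; i < indexerCount * unitsPerIndexer; ++i)
    {
        totalRecords += queue.pop().recordCount;  // stands in for the insert
    }

    for (std::thread& indexer : indexers)
    {
        indexer.join();
    }
    return totalRecords;
}
```

In such a setup the time spent per pop() on the merging thread decides how often the producers stall, which is why faster insertion pays off most on machines with many cores.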
Tab Bar
Over the months there have been lots of requests for multi-window or tab support, to allow looking at multiple symbols or even projects simultaneously. We made the first step with our new tab bar, located at the top of the window.
The user interaction and shortcuts are the same as you are used to from your web browser, with the only exception that tabs cannot be detached into separate windows yet. We also added Open in New Tab context menu actions to the graph and code view. Alternatively, the middle mouse button can be used for that.
Type-use edges to template/generics argument type from parent context
Sourcetrail makes it possible to inspect how template/generics types are composed, by showing you all types that are used as template/generics arguments. But sometimes it can be cumbersome to see which types really depend on which other types.
Let’s take a look at an example:
    class Object;
    template <typename T> class SharedPointer;
    SharedPointer<Object> object;
In Sourcetrail we display this type relationship as seen in the image below. The variable object is of type SharedPointer<Object>, which is based on the template type SharedPointer and uses the type Object as template argument type.
This looks fine the way it is. But in case you are looking at the variable object or at the class Object individually, the dependency between them is hidden. If you look at object, you don't easily see from the graph visual that it is using the type Object. When looking at the class Object, you don't see right away that there is a variable object using this type.
With this new release we added type-use edges from the parent context to the type used as template/generic argument. That way it is easier to spot how types are used in your codebase when templates/generics are involved.
In combination with our node bundling, this also has a nice side effect. If we have an expression in C++ of the form std::vector<std::shared_ptr<Object>> objects, then it is very easy to see that this is a container of Object instances now, with all the standard library classes being hidden away in the Non-Indexed Symbols bundle.
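The relationship that the new edge expresses can also be written down as a small compile-time check. The following sketch is only an illustration and has nothing to do with how Sourcetrail computes its edges: UsesType is a made-up trait (it needs C++17 for std::disjunction) that is true when a type appears anywhere among the template type arguments of another type, no matter how deeply it is nested.

```cpp
#include <memory>
#include <type_traits>
#include <vector>

class Object;
template <typename T> class SharedPointer;

// Hypothetical trait: UsesType<U, T>::value is true if T is U itself or
// occurs, at any nesting depth, among the template type arguments of U.

// Base case: U is not a class template specialization with type arguments.
template <typename U, typename T>
struct UsesType : std::is_same<U, T> {};

// Recursive case: U is Template<Args...>, so look into every argument.
// Default arguments (such as the allocator of std::vector) are part of
// Args... and are searched as well.
template <template <typename...> class Template, typename... Args, typename T>
struct UsesType<Template<Args...>, T>
    : std::disjunction<std::is_same<Template<Args...>, T>, UsesType<Args, T>...>
{
};

// The two declarations discussed in the text.
using ObjectPointer = SharedPointer<Object>;
using Objects = std::vector<std::shared_ptr<Object>>;

static_assert(UsesType<ObjectPointer, Object>::value,
              "SharedPointer<Object> uses Object");
static_assert(UsesType<Objects, Object>::value,
              "the container uses Object through std::shared_ptr");
static_assert(!UsesType<std::vector<int>, Object>::value,
              "std::vector<int> does not use Object");
```

The second assertion corresponds to the case described above: even though Object is wrapped in two standard library templates, the dependency on it is still there and can be made visible directly.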