Booster Static Analysis
I’ve been spending a lot of time lately optimizing booster-task-analyser – both refining features and improving performance. Previously, static analysis was handled by booster-transform-lint. Although that module was open-sourced early on, I was never satisfied with the analysis results. Factoring in other concerns, I decided to rewrite it from scratch – hence booster-task-analyser, designed to replace booster-transform-lint.
The redesign was driven by several considerations:
- Static analysis isn’t run as frequently as builds, so a Task is a better fit than a Transformer;
- CHA (Class Hierarchy Analysis) requires all class information upfront, while a Transformer processes classes in a pipeline – not ideal;
- Static analysis can be slow; as a Transformer, it would severely impact build performance, and the build doesn’t depend on analysis output anyway;
So booster-task-analyser is built as a Task:
Classpath
Before performing static analysis, we need all the classes to analyze. These come from two sources:
System classes
The APIs provided by the Android SDK – specifically, ${ANDROID_HOME}/platforms/android-xx/android.jar. During the build, these can be obtained via
BaseExtension.getBootClasspath().APP classes
App classes are trickier. There are two approaches:
- Analyze project dependencies to discover them;
- Obtain them through Transform;
The current implementation uses option 2 – relying on Transform – mainly because it’s easier to implement. Taking a shortcut here :P
I expected loading all classes (parsing class files into ASM
ClassNodeobjects) to be time-consuming, but nearly 100,000 classes loaded in about 15 seconds. Machine specs:
- Processor: 2.5GHz Intel Core i7
- Memory: 16GB 2400 MHz DDR4
- Storage: SSD
Entry Point
Every static analysis needs entry points. For a regular program, it’s typically the main method. For Android apps, the entry points are Application, the four major components, XML layouts, and so on. The first task is finding them all.
The Four Major Components
Application and the four major components are declared in AndroidManifest.xml. The merged manifest can be obtained via mergedManifests.
Custom Views
The most direct way to find custom Views is to parse Layout XML files. These can be obtained through mergeRes, though they come as AAPT2 artifacts – flat files. This is why the booster-aapt2 module exists.
Through testing, I found that parsing flat files is actually slower than parsing raw XML source files. So the final implementation only parses the flat file header to locate the source file path.
Thread-Annotated Methods and Classes
Android provides Thread Annotations to help compilers and static analysis tools improve code checking accuracy. Any class or method annotated with Thread Annotations can be treated as a thread entry point.
Since popular application frameworks also have thread annotations, Analyser supports Event Bus as well – methods annotated with @Subscribe(threadMode = MAIN) are identified as main thread entry methods.
Reflection
In Android apps, reflection is sometimes used to break reference chains between Application and other classes – for hot-fix support or to keep the Main Dex from growing too large. See: Working as an Architect at DiDi (Part 1). How do we connect reflective calls to their actual target methods? Stack frame analysis. That’s a topic for another day.
Main Looper
In Android, the main thread’s Looper and Handler also qualify as Entry Points. Determining whether a Handler or Looper runs on the main thread is genuinely challenging. For example:
1 | public class HomeActivity extends Activit { |
The Call Graph looks like this:
"HomeActivity.onResume()" [color="#33330000",fillcolor="#F18F01"];
"Activity.onResume()" [color="#33666600",fillcolor="#006E90"];
"Helper.handleEvent(Handler)" [color="#33666600",fillcolor="#99C24D"];
"Handler.post(Runnable)" [color="#33666600",fillcolor="#B5248F"];
"HomeActivity.onResume()" -> "Activity.onResume()" [color="#555555"];
"HomeActivity.onResume()" -> "Helper.handleEvent(Handler)" [color="#555555"];
"Helper.handleEvent(Handler)" -> "Handler.post(Runnable)" [color="#555555"];
}
In this example, Helper.handleEvent(Handler) calls a Handler, but that Handler comes from another method. To trace the handler parameter back to its origin, we need to analyze the Call Graph in reverse:
- Find the calling node(s) above the current one;
- Locate the INVOKE instruction corresponding to that node;
- Walk back from the instruction’s stack frame to determine the source of handler;
- handler comes from a
GETFIELDinstruction – it’s a member ofHomeActivity; - Search all methods for SETFIELD instructions that assign to handler, ultimately determining it was initialized in the constructor;
This shows how critical the Call Graph is for static analysis. This example is relatively simple; if the Handler is passed as a parameter through multiple methods, the algorithmic complexity increases significantly.
ClassSet
ClassSet serves several purposes:
- Fast lookup of
ClassNodeby class name; - Fast lookup of which classpath (directory/JAR) a class belongs to;
For performance and reusability, ClassSet is designed as follows:
Why does CompositeClassSet exist? Because when Analyser analyzes an application, it needs access to these ClassSets:
- A ClassSet containing only System classes
For quickly finding Application and the four major components.
- A ClassSet containing only App classes
For building the application’s Call Graph.
- A ClassSet containing all classes – the union of System classes and App classes
For Class Hierarchy Analysis (CHA).
To ensure lookup efficiency, all ClassSet implementations use caching.
Class Hierarchy Analysis
Class hierarchy analysis is critical for static analysis – it determines both the accuracy and completeness of results. Previously, CHA was implemented using ClassLoader, which was relatively straightforward. See: KlassPool & Klass. The main goal was determining whether two types have an inheritance relationship. Why abandon the old approach and redesign CHA? Several reasons:
- When ClassLoader loads a Class, it verifies the bytecode even without initialization, potentially throwing VerifyError and failing the entire analysis;
- Performance – ClassLoader‘s class loading performance is far worse than ASM;
- Beyond inheritance, we also need to analyze fields, methods, and annotations – the information available through Class reflection is limited;
There’s another crucial requirement during class hierarchy analysis – polymorphism detection. For example:
1 | public interface ProfileView { |
If we represent ProfileActivity.onResume() as a Call Graph:
"ProfileActivity.onResume()" [color="#33330000",fillcolor="#F18F01"];
"Activity.onResume()" [color="#33666600",fillcolor="#006E90"];
"ProfilePresenter.refreshUserInfo()" [color="#33666600",fillcolor="#99C24D"];
"ProfilePresenterImpl.refreshUserInfo()" [color="#33666600",fillcolor="#CCCCCC",fontcolor="#999999"];
"UserService.getUerInfo(Callback)" [color="#33666600",fillcolor="#CCCCCC",fontcolor="#999999"];
"ProfileActivity.onResume()" -> "Activity.onResume()" [color="#555555"];
"ProfileActivity.onResume()" -> "ProfilePresenter.refreshUserInfo()" [color="#555555"];
"ProfilePresenter.refreshUserInfo()" -> "ProfilePresenterImpl.refreshUserInfo()" [color="#555555",style="dotted"];
"ProfilePresenterImpl.refreshUserInfo()" -> "UserService.getUerInfo(Callback)" [color="#555555"];
}
Here’s the problem: in the Call Graph above, ProfileActivity.onResume() cannot reach ProfilePresenterImpl.updateUserInfo(), because the reference type calling updateUserInfo() in ProfileActivity is ProfilePresenter, not ProfilePresenterImpl. Each node in the Call Graph needs an attribute indicating whether it’s a virtual method. If it is, the virtual method must be linked to all override methods in derived classes. This gives rise to several requirements:
- Fast lookup of derived classes by class name;
- Fast lookup of subclasses that override a specific method, given a class name and method name;
Summary
As the examples above illustrate, static analysis involves quite complex data structures and algorithms, including:
- Link List
- Map
- Set
- Digraph
- Binary Tree
- ……
At sufficient scale, performance becomes a major challenge – a single line of code can make a dramatic difference.
When the first version of Analyser was completed, analyzing 100,000 classes took over half an hour. After continuous optimization, it now consistently finishes in about 1 minute with peak memory around 3GB. The optimization techniques involved are yet another topic for another day.
- Blog Link: https://johnsonlee.io/2020/03/21/booster-task-analyser.en/
- Copyright Declaration: 著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
