Static analyzer debugging and quality assurance approaches

Maxim Menshikov 28.05.2020

St.Petersburg State University

About the author & the project

Maxim Menshikov —

- PhD student at St.Petersburg State University.
- Software engineer.
- (ex. security analyst; participated in commercial debugger project).

Equid<sup>1</sup> — a static analyzer for C/C++/RuC based on Model Checking and Abstract Interpretation. It verifies contracts and finds common defects.

<sup>&</sup>lt;sup>1</sup>Engine for performing queries on unified intermediate representations of program and domain models

What's special in analyzers?

Not much.

- Many equivalent transformations: input format ≠ intermediate format ≠ output format.
- Intermediate representations are mostly internal.
- The code is usually consistent and has high integrity, but there are logical mistakes, unprocessed parts → the biggest defects are logical.

# The problem

- There are many debugging & quality assurance methods.
- None of them are specialized enough for static analysis.
- Every project brings its own set of hardly formalized methods.

What if we find a right specialization of the methods to the static analysis field?

# The paper's goal



# Defect sources (observations)

- Missing support for the specific syntax/intermediate representation (IR) construction in submodules.
- Small differences in implementations for repeating parts (classes).
- Transformation and ordering issues.

## Defect reasons (observations)

- Low visuality of the transformation passes and the development process.
- Unattainable cross-dependencies between modules.
- Low quality of tests.

# **Proposed solutions**

Proposing the solutions of these three groups:

Code generation:

Generated code usage verification.

• Testing:

Goal-driven random test case generation.

• Logging:

Log fusion and visual representation.

# Code generation

- One model, several interpretations, many output source files.
- Perform a simple integrity check.



# Code generation: enumeration example



Goal-driven random test case generation

The idea is: generate input programs with an integrated verification goal (assertion).



Goal-driven random test case generation

- 1. The tool generates a random goal and asserts it  $\rightarrow$  an expression.
- The expression is repeatedly rolled into if/switch/for/while/... random blocks → a block.
- 3. The meaningful blocks are shuffled using equivalent transformations.

Goal-driven random test case generation

In result we get:

- 1. A completely random program.
- 2. A set of shuffled random programs.

By that, it is possible to verify:

- 1. Logical issues in transformations.
- 2. Ordering issues.
- 3. Runtime failures.

# Log fusion

Fuse separate logs, set up cross-references, so the final log is a technical documentation of the run. Allows for easy navigation.



# Log fusion: reverse recording assistance

The Log Fusion also helps break right after the specific log line using reversible debugger like RR<sup>2</sup>, UndoDB, etc. That is achieved using logging engine traps and GDB scripts.



<sup>2</sup>https://rr-project.org

# Visual representation: steps

Visualize steps — present all transformations in one window, allow to debug specific transformations.

| Log     | Performance Transformation                                                                                                        | Logs                                                                                                                                                                                                        |  |
|---------|-----------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Source: | y = 3;                                                                                                                            | Resources<br>Containers                                                                                                                                                                                     |  |
| VM IR:  | assign (y:int res) = 3                                                                                                            | Name assign (Lnopint(fres) = 0 branch ((Lnopint(fres) = 1 assign (Lnopint(fres) = 1 assign (Lnopint(fres) = 1 assign (Lnopint(fres) = 2 end branch branch ((Lnopint(fres) = 1) branch ((Lnopint(fres) = 1)) |  |
| CVC4:   | (var_1_6 == 1    var_1_6 == 0 &&<br>var_1_2 == 5) => (var_1_4 = 3 &&<br>addrof(var_1_4) = var_345 &&<br>deref(var 345) = var 1 4) |                                                                                                                                                                                                             |  |
| AI:     | var_1_4 = [3, 3]                                                                                                                  | Fragments Stages                                                                                                                                                                                            |  |

# Visual representation: log health

Log health — visualize the time allocation for different modules. In result, it is possible to determine whether the specific part is unintentially skipped.

| aisolver         | 9%  | =    |
|------------------|-----|------|
| cvc4solver       | 86% | 8_   |
| cvc4solverType   | 9%  |      |
| debug            | 86% |      |
| diagmodel        | 9%  |      |
| executionmodel   | 0%  |      |
| flattening       | 0%  |      |
| fragmentvisitor  | 12% |      |
| hypervisor       | 0%  |      |
| multisolver      | 9%  |      |
| output           | 0%  |      |
| parsing          | 13% | ₩_₩= |
| resourcevisitor  | 12% |      |
| stpsolver        | 0%  |      |
| tagvisitor       | 11% |      |
| virtualprocessor | 0%  |      |

# Random test case generation: discovered issues & their severity

The method detected many performance, ordering, logical issues, and even runtime failures.

| Defect type     | Number of issues | Severity |
|-----------------|------------------|----------|
| Performance     | 3                | Medium   |
| Ordering        | 5                | High     |
| Runtime failure | 1                | High     |
| Logical issues  | 1                | Medium   |

# Log fusion: (rough) time to resolve the issues

#### Average improvement rate: 2.8.

| Defect type     | Time to resolve before (h) | Time to resolve after (h) |
|-----------------|----------------------------|---------------------------|
| Performance     | 25                         | 13                        |
| Ordering        | 5                          | 1                         |
| Runtime failure | 1                          | 0.3                       |
| Logical issues  | 1                          | 1                         |

Results: code generation and visual representation

The improvement is hard to examine.

In our experiment, developing the same feature twice took 7 times less time than on previous iteration - thanks to code generation.

Visual representation allowed to discover at least 2 performance issues, and overall provided an enormous help during defect resolution.

### Conclusion

- The specialization of the proposed methods helps find real issues in the static analyzer.
- The combination of approaches dramatically decreases the defect resolution time.