Clang Plugins
The Clang compiler can be extended with a plugin that provides additional compile-time warnings and errors, or that modifies source code.

Using Clang plugin in LibreOffice
The sources for the Clang plugin are in compilerplugins/clang in the LibreOffice sources.

The Clang plugin is enabled by --enable-dbgutil if the Clang/LLVM development headers are found, or it can be forced with --enable-compiler-plugins.

There are two kinds of actions a plugin can do:


 * providing additional functionality during compilation - such as compiler warnings. These are enabled automatically if the Clang plugin is enabled.


 * source rewriting - These modify source files and are used to transform source code. Such an action must be explicitly invoked by starting a compilation and passing COMPILER_PLUGIN_TOOL= to make. See below for further details.

Selecting actions for Clang plugin
The Clang plugin automatically includes any actions for which source files are present in the compilerplugins/clang/ directory (but not its subdirectories). Actions can be enabled or disabled simply by moving source files into or out of the directory and rebuilding the plugin.

Source rewriting
Source to be modified should first be built normally. After that, invoke make again, but pass COMPILER_PLUGIN_TOOL= to make. The build will rebuild all relevant source files, but instead of generating code it will run the source modifications. Any changes the plugin makes are written directly to the source files, so they show up in 'git diff' and similar tools.

Note that it is possible to build only a selected part of LibreOffice source, either by invoking make in the selected module, or by passing a gbuild target to make (e.g. 'make Library_sw').

Developing Clang plugin
Create (or modify) files in the compilerplugins/clang/ directory to extend/modify functionality of the LibreOffice Clang plugin.

Clang API documentation
See the Clang development documentation, especially Introduction to the Clang AST, Using RecursiveASTVisitor and Internals.

The Clang API documentation is also available online. The most commonly used classes are those that have Decl, Expr or Stmt as their base class, see also Introduction to the Clang AST. It may also be convenient to view Clang classes and their documentation directly in the header files (mostly Decl*.h, Expr*.h and Stmt*.h files in the clang/include/clang/AST/ directory).

It is also possible to find several tutorials on writing Clang plugins, either on Clang pages, or elsewhere. Note however that these mostly show how to create a plugin itself, which has been already done for LibreOffice. Developing a Clang plugin for LibreOffice means extending the one existing plugin, which requires using the Clang API.

Preparation
 * Install Clang/LLVM, including development files (packages llvm-clang, llvm-clang-devel or similar; on Fedora the packages are named llvm and llvm-devel).
 * Build LibreOffice sources with Clang, remembering to pass --enable-dbgutil --enable-compiler-plugins to configure.
 * Check that the plugin works properly. For example, add 'OUString foobar;' to any function in a source file, e.g. the first function in starmath/source/accessibility.cxx. When rebuilding the starmath module with the changed source file, there should be a warning about the unused variable. Remove the variable again.
 * The LibreOffice build should now be ready for use with the Clang plugin.

Part 1
This part shows how to add a warning whenever a 'return false;' statement is encountered in a source file.

Compilers, when they process source code, build an internal representation of the source. Clang's representation, called the AST (abstract syntax tree), uses C++ classes for each node of the tree. By examining the AST, the source code can be analysed.

Note: The AST represents the actual C++ code, after preprocessing. It is therefore not an exact representation of what is written in the source file, but of the actual C++ syntax, after all macros and other preprocessor directives have been executed. Clang's internal representation however includes information about the preprocessing step, so it is possible to find out if something is a result of a macro expansion or what the source file itself exactly looks like.


 * Go to compilerplugins/clang/store/tutorial and view file tutorial1_example.cxx.
 * Use -Xclang -ast-dump Clang options to have Clang dump its AST, to see how Clang sees the particular sources:

clang++ -fsyntax-only -Xclang -ast-dump tutorial1_example.cxx


 * You can see that for e.g. function g

bool g()
{
    return false;
}

the AST is

bool g() (CompoundStmt 0x314ae18
  (ReturnStmt 0x314adf8
    (CXXBoolLiteralExpr 0x314ade0 '_Bool' false)))

This means that the body of function g is a compound statement (the enclosing {}), which contains a return statement, whose return value expression is a bool literal with the value false. Note that the names of the nodes usually match the names of the Clang C++ classes representing them. Warning about 'return false;' in sources therefore means finding a ReturnStmt instance in the AST that contains a CXXBoolLiteralExpr.
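To make the shape of that search concrete, here is a minimal sketch that models the tree for function g with toy stand-in types. The struct names only mirror the node names from the dump; they are not the Clang classes, and makeBody and countReturnFalse are hypothetical helpers introduced for illustration.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for AST node classes; these are NOT the real Clang classes.
struct Stmt {
    virtual ~Stmt() = default;
};

struct BoolLiteral : Stmt {              // plays the role of CXXBoolLiteralExpr
    explicit BoolLiteral(bool v) : value(v) {}
    bool value;
};

struct ReturnStmt : Stmt {
    explicit ReturnStmt(std::unique_ptr<Stmt> v) : retValue(std::move(v)) {}
    std::unique_ptr<Stmt> retValue;      // null for a plain 'return;'
};

struct CompoundStmt : Stmt {             // the enclosing {}
    std::vector<std::unique_ptr<Stmt>> body;
};

// True if 'stmt' is a return statement whose value is the literal 'false'.
bool isReturnFalse(const Stmt* stmt) {
    auto ret = dynamic_cast<const ReturnStmt*>(stmt);
    if (!ret || !ret->retValue)
        return false;
    auto literal = dynamic_cast<const BoolLiteral*>(ret->retValue.get());
    return literal && !literal->value;
}

// Counts matching statements directly inside a function body.
int countReturnFalse(const CompoundStmt& body) {
    int count = 0;
    for (const auto& stmt : body.body)
        if (isReturnFalse(stmt.get()))
            ++count;
    return count;
}

// Hypothetical helper: builds the body '{ return <literal>; }'.
CompoundStmt makeBody(bool literal) {
    CompoundStmt body;
    body.body.push_back(
        std::make_unique<ReturnStmt>(std::make_unique<BoolLiteral>(literal)));
    return body;
}
```

In this model, countReturnFalse(makeBody(false)) finds one match and countReturnFalse(makeBody(true)) finds none. In the real plugin the walk over the tree is done by Clang's RecursiveASTVisitor rather than by a hand-written loop.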
 * Move tutorial1.cxx and tutorial1.hxx to compilerplugins/clang/.
 * Build Clang plugin with the Tutorial1 action. At the toplevel, run:

make compilerplugins


 * Now compile sources, for example the sot module (cleaning it first, so that it is actually rebuilt).

make sot.clean
make sot

This will show warnings such as:

/home/llunak/build/src/l1/sot/source/sdstor/stgstrms.hxx:93:47: warning: returning false [loplugin]
    virtual bool IsSmallStrm() const { return false; }
                                       ~~~~~~~^~~~~


 * View tutorial1.hxx header file:

There is class Tutorial1 declared, which performs the check. This class inherits from RecursiveASTVisitor (Clang class for traversing the AST) and Plugin (base class for LibreOffice plugin actions).


 * View tutorial1.cxx source file:


 * At the top, there is a comment saying whether it's a compile check or a rewriter and a description of what it does. All source files should have such description.
 * Tutorial1::run is the function that will be called to perform the action. It calls TraverseDecl from the RecursiveASTVisitor class, which will traverse the entire AST for the source file being compiled, and will call callbacks about the structure. In this case, it will call Tutorial1::VisitReturnStmt (such functions must be of the form VisitXXX( XXX* ), where XXX is the C++ class for which a callback is wanted).
 * The function will check if it is a return statement of the expected form (containing CXXBoolLiteralExpr), if yes, it will print a warning using the LO report helper function for Clang diagnostics.
   * The first argument is the type of report (warning, error, fatal error, note). A note is additional information for a previous message (e.g. showing where a variable involved in the message is declared).
   * The second argument is the message. It can also contain placeholders, such as %0, or even select one item from a list (e.g. '%select{if|while|for}0').
   * The third, optional argument is the location for the message. Clang points to the exact location in its messages, so this should point to the AST node where the problem is (use getLocStart for the location; newer Clang versions name this getBeginLoc).
   * Additionally, operator << can be used to provide further data for the message, such as a source range to highlight a whole expression (use getSourceRange on a node), or values for placeholders.


 * Finally, each Plugin instance must register itself with the LO plugin handler using Plugin::Registration. The string is the name of the action, for invoking with COMPILER_PLUGIN_TOOL=.

That's it. Remove the tutorial1.cxx/hxx files to disable the warning again.
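The message placeholders mentioned for the report helper can be illustrated with a toy formatter. This is a simplified model written for this page, not Clang's diagnostic engine: formatMessage is a hypothetical function, it only supports single-digit argument numbers, and it takes all arguments as strings (with the %select argument holding a zero-based index).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of message placeholders; this is NOT Clang's diagnostic formatter.
// "%N" inserts argument N; "%select{a|b|c}N" uses argument N as a zero-based
// index that picks one of the alternatives. N is a single digit here.
std::string formatMessage(const std::string& fmt,
                          const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < fmt.size()) {
        if (fmt[i] != '%') {
            out += fmt[i++];
            continue;
        }
        ++i; // skip '%'
        if (fmt.compare(i, 7, "select{") == 0) {
            const std::size_t close = fmt.find('}', i);
            if (close == std::string::npos)
                throw std::invalid_argument("unterminated %select");
            // Split the text between the braces on '|'.
            std::vector<std::string> options;
            std::string current;
            for (std::size_t k = i + 7; k < close; ++k) {
                if (fmt[k] == '|') {
                    options.push_back(current);
                    current.clear();
                } else {
                    current += fmt[k];
                }
            }
            options.push_back(current);
            const std::size_t argIndex = fmt.at(close + 1) - '0';
            out += options.at(std::stoul(args.at(argIndex)));
            i = close + 2;
        } else {
            const std::size_t argIndex = fmt.at(i) - '0';
            out += args.at(argIndex);
            ++i;
        }
    }
    return out;
}
```

With this model, formatMessage("returning %0", {"false"}) yields "returning false", and formatMessage("%select{if|while|for}0 statement", {"1"}) yields "while statement". In a real plugin the values are supplied with operator << on the object returned by the report helper, as described above.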

Part 2
Finding each 'return false;' in the source can be easily done even using the grep command. This part will extend the example to warn about such return statements only if they are the body of an if statement that has equality comparison as the condition. Since C++ source does not have strict formatting, this would be much harder to do using grep (and probably impossible to do reliably, especially when considering macros).


 * Go to compilerplugins/clang/store/tutorial, view file tutorial2_example.cxx and view its AST representation:

clang++ -fsyntax-only -Xclang -ast-dump tutorial2_example.cxx

You can see that e.g.

if( 1 == 2 )
{
    return false;
}

can be represented as

(IfStmt 0x43eae40
  (<<<NULL>>>)
  (BinaryOperator 0x43eadc0 '_Bool' '=='
    (IntegerLiteral 0x43ead80 'int' 1)
    (IntegerLiteral 0x43eada0 'int' 2))
  (CompoundStmt 0x43eae20
    (ReturnStmt 0x43eae00
      (CXXBoolLiteralExpr 0x43eade8 '_Bool' false)))
  (<<<NULL>>>))

The IfStmt node has several subnodes: the first <<<NULL>>> is a variable declaration inside the condition, which is not used in this case, and the second <<<NULL>>> is the else part, which is not present either. Also note that CompoundStmt is not present if the body is not inside {}. Finding such source code means searching the AST for IfStmt and checking whether its subnodes are BinaryOperator, ReturnStmt and CXXBoolLiteralExpr.
 * Move tutorial2.cxx and tutorial2.hxx to compilerplugins/clang/ and again rebuild the plugin and some source (for example the sot module):

make compilerplugins sot.clean sot

This will again show warnings, but this time only a few of them, and they will also show the if statement:

/home/llunak/build/src/l1/sot/source/sdstor/stgstrms.cxx:501:20: warning: returning false after if with equality comparison [loplugin]
            return false;
            ~~~~~~~^~~~~
/home/llunak/build/src/l1/sot/source/sdstor/stgstrms.cxx:500:9: note: the if statement is here [loplugin]
        if( nBgn == STG_EOF )
        ^


 * View tutorial2.hxx header file:

This is very similar to the Tutorial1 class, except that this time IfStmt nodes will be visited.


 * View tutorial2.cxx source file:

 * Again, a comment describing the plugin action.
 * Since this time if statements are checked, the callback is Tutorial2::VisitIfStmt. There does not seem to be a way to go up in the Clang AST, so it is necessary to start as high in the tree as needed and go down.
 * The function again checks for the wanted AST node structure and warns if it is found. Note that the function handles both the case with and the case without a compound statement: although the two are semantically the same here, the source code, and thus also the AST representation, is different.
 * Compound statements can contain any number of statements; the CompoundStmt class provides an iterator to check them.

That's it. Remove the tutorial2.cxx/hxx files to disable the warning again.
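The top-down check described for Part 2 can be sketched with toy stand-in types. As before, these structs only mirror the node names from the AST dump and are not the Clang classes; matchesIfEqualsReturnFalse and makeIf are hypothetical names, and the real tutorial2.cxx may structure the check differently.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for AST node classes; these are NOT the real Clang classes.
struct Stmt { virtual ~Stmt() = default; };
using StmtPtr = std::unique_ptr<Stmt>;

struct BoolLiteral : Stmt {
    explicit BoolLiteral(bool v) : value(v) {}
    bool value;
};
struct IntLiteral : Stmt {
    explicit IntLiteral(int v) : value(v) {}
    int value;
};
struct BinaryOperator : Stmt {
    BinaryOperator(std::string o, StmtPtr l, StmtPtr r)
        : op(std::move(o)), lhs(std::move(l)), rhs(std::move(r)) {}
    std::string op;
    StmtPtr lhs, rhs;
};
struct ReturnStmt : Stmt {
    explicit ReturnStmt(StmtPtr v) : retValue(std::move(v)) {}
    StmtPtr retValue;
};
struct CompoundStmt : Stmt { std::vector<StmtPtr> body; };
struct IfStmt : Stmt {
    IfStmt(StmtPtr c, StmtPtr t, StmtPtr e = nullptr)
        : cond(std::move(c)), thenBody(std::move(t)), elseBody(std::move(e)) {}
    StmtPtr cond, thenBody, elseBody;   // elseBody is null if there is no else
};

bool isReturnFalse(const Stmt* stmt) {
    auto ret = dynamic_cast<const ReturnStmt*>(stmt);
    if (!ret) return false;
    auto literal = dynamic_cast<const BoolLiteral*>(ret->retValue.get());
    return literal && !literal->value;
}

// Starts at the if statement and goes down: the condition must be an equality
// comparison and the body must be (or contain) 'return false;'.
bool matchesIfEqualsReturnFalse(const IfStmt& ifStmt) {
    auto cond = dynamic_cast<const BinaryOperator*>(ifStmt.cond.get());
    if (!cond || cond->op != "==") return false;
    const Stmt* body = ifStmt.thenBody.get();
    if (isReturnFalse(body)) return true;           // body without {}
    if (auto compound = dynamic_cast<const CompoundStmt*>(body))
        for (const auto& stmt : compound->body)     // body inside {}
            if (isReturnFalse(stmt.get())) return true;
    return false;
}

// Hypothetical helper: builds 'if( 1 <op> 2 ) return false;', with or without {}.
std::unique_ptr<IfStmt> makeIf(const std::string& op, bool withBraces) {
    StmtPtr cond = std::make_unique<BinaryOperator>(
        op, std::make_unique<IntLiteral>(1), std::make_unique<IntLiteral>(2));
    StmtPtr body = std::make_unique<ReturnStmt>(std::make_unique<BoolLiteral>(false));
    if (withBraces) {
        auto compound = std::make_unique<CompoundStmt>();
        compound->body.push_back(std::move(body));
        body = std::move(compound);
    }
    return std::make_unique<IfStmt>(std::move(cond), std::move(body));
}
```

In this model both makeIf("==", true) and makeIf("==", false) match, while makeIf("!=", true) does not, which is the distinction a textual search would struggle to make reliably.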

Part 3
The previous part showed how to analyse the AST. This part further extends the example to modify the found return statement from 'return false;' to 'return maybereturntrue;'. Making such a change manually over a large number of files would be tedious, error-prone and slow.


 * Move tutorial3.cxx and tutorial3.hxx to compilerplugins/clang/ and rebuild the plugin (make compilerplugins, as in Part 1).


 * First build the source that will later be modified in the normal way (this is because source transformations using the Clang plugin do not actually compile the source, and there may be build errors because of object files that would not be generated).


 * Now build the source again, this time passing make the name of the action to run in COMPILER_PLUGIN_TOOL (here tutorial3), and FORCE_COMPILE_ALL to ensure it is run on all source files, even those that are up to date.


 * After the run finishes, the sources have been modified. The changes can be viewed e.g. using 'git diff'.


 * By default only .c/.cxx files are modified. It is possible to pass UPDATE_FILES= to make to specify which files should be modified:
   * mainfile - only the .c/.cxx file is modified (default)
   * all - all files are modified, including header files in other modules
   * a module name (LO toplevel directory) - only files there will be modified
 * Treat the changes as if you have made them manually (i.e. verify them, further modify if necessary, etc.).


 * View tutorial3.hxx header file:

This is very similar to Tutorial2, but this time the base class is RewritePlugin, the LO base class for plugins that rewrite sources.


 * View tutorial3.cxx source file:


 * Again, a comment describing the plugin action, this time also saying that it is a rewriter.
 * No warning is needed; instead, the function modifies the source. This is done using helper functions such as replaceText.
   * The first argument is the source range to be replaced. In this case, passing boolliteral->getSourceRange() gives the whole source range where the 'false' literal is.
   * The second argument is the new text to be placed at the given source range.

Clang internally will adjust to such modifications, which means it will still give proper original source locations, even if such a modification might have moved them. It is also possible to do modifications repeatedly.

Note that it is not possible to do such modifications if the source to be modified is a result of a macro expansion (since, at the place where the actual code is, there is just the macro invocation, the code will be there only after preprocessing). The LO helper functions will warn when such a modification is not possible.


 * Last in the source file is again the registration using Plugin::Registration. The "tutorial3" name passed to it is the name that was passed to COMPILER_PLUGIN_TOOL when invoking make.

That's it. Remove the tutorial3.cxx/hxx files to disable the rewriter, and also revert the changes that have been done by running it.
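The remark in Part 3 that original source locations stay valid after a modification can be illustrated with a toy rewriter. This is a sketch of the idea only, not Clang's Rewriter class or the LO replaceText helper: ToyRewriter and rewriteTwoReturns are hypothetical names, positions are plain character offsets instead of source ranges, and edits are assumed not to overlap.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy model of source rewriting; this is NOT Clang's Rewriter or the LO helper.
class ToyRewriter {
public:
    explicit ToyRewriter(std::string source) : text_(std::move(source)) {}

    // Replaces 'length' characters starting at 'originalOffset', an offset
    // into the ORIGINAL text, with 'replacement'. Edits must not overlap.
    void replaceText(std::size_t originalOffset, std::size_t length,
                     const std::string& replacement) {
        text_.replace(mapOffset(originalOffset), length, replacement);
        deltas_[originalOffset] += static_cast<std::ptrdiff_t>(replacement.size())
                                 - static_cast<std::ptrdiff_t>(length);
    }

    const std::string& text() const { return text_; }

private:
    // Translates an original offset into the current buffer by adding up the
    // size changes of all edits made before that position.
    std::size_t mapOffset(std::size_t originalOffset) const {
        std::ptrdiff_t shift = 0;
        for (const auto& entry : deltas_)
            if (entry.first < originalOffset)
                shift += entry.second;
        return static_cast<std::size_t>(
            static_cast<std::ptrdiff_t>(originalOffset) + shift);
    }

    std::string text_;
    std::map<std::size_t, std::ptrdiff_t> deltas_; // original offset -> size change
};

// Rewrites both 'false' literals using offsets taken from the original text.
std::string rewriteTwoReturns() {
    const std::string source =
        "if( a == b ) return false; if( c == d ) return false;";
    // A real plugin gets source ranges from AST nodes; the text search here
    // only stands in for that.
    const std::size_t first = source.find("false");
    const std::size_t second = source.find("false", first + 1);
    ToyRewriter rewriter(source);
    rewriter.replaceText(first, 5, "maybereturntrue");
    rewriter.replaceText(second, 5, "maybereturntrue");
    return rewriter.text();
}
```

The second replaceText call still uses the offset from the unmodified text, even though the first replacement has made the buffer longer; the rewriter does the adjustment itself, which is the behaviour the tutorial describes.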

The directory compilerplugins/clang contains a number of plugin actions that can be used as a base for developing new actions. A number of source files are also kept under the store/ subdirectory; these are not used, but are kept either as examples or so that they can be moved back to the clang directory to be enabled again.

Slides
“Plug Yourself In: Learn how to write a Clang compiler plugin” has the slides of a talk on the subject from LibreOffice Conference 2015.