LLVM-Toolset



  • DevynCJohnson
    Keymaster
    • Topics - 437
    • @devyncjohnson

    Just as the GNU Toolchain and Binutils contain many helpful development tools, LLVM (originally short for "Low Level Virtual Machine") has its own set of tools. The LLVM toolset is a comparable alternative to the GNU Toolchain. Some developers prefer LLVM's tools over GNU's utilities, so it is worth learning LLVM's commands.

    NOTE: This is the same LLVM that is commonly used with Clang.

    Before we discuss LLVM's tools, we should understand a few things about LLVM itself. LLVM is commonly used as the backend for Clang, and many other compiler frontends can also use LLVM as their backend (often optionally). A frontend (like Clang) parses the source code and emits some form of LLVM code, which LLVM then optimizes and compiles. LLVM code may be LLVM Bitcode (*.bc) or LLVM Assembly (*.ll); both are forms of the LLVM Intermediate Representation (IR). LLVM Intermediate Language is another name for LLVM Assembly. The two forms are equivalent and can be converted from one to the other without loss. Command-line usage may look like "clang -S -emit-llvm CODE.c -o CODE.ll" (yes, the single hyphen/dash works with "-emit-llvm"), which converts C to LLVM Assembly. Once in LLVM Assembly, many manipulations can be performed on the low-level code before compiling (or cross-compiling) to machine code or converting to another language (if supported). Alternatively, the bitcode can be executed by the LLVM Interpreter, much as Java VMs execute Java bytecode.
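
    As a rough sketch of that conversion (the file name "demo.c" and its contents are made up for illustration, and the script assumes clang is on the PATH; some distributions add a version suffix such as "clang-14"), the commands below write a tiny C program and produce both IR forms from it:

```shell
# Write a tiny C program (hypothetical example input).
cat > demo.c <<'EOF'
int add(int a, int b) { return a + b; }
int main(void) { return add(2, 3) - 5; }
EOF

if command -v clang >/dev/null 2>&1; then
    # C => LLVM Assembly (text form of the IR)
    clang -S -emit-llvm demo.c -o demo.ll
    # C => LLVM Bitcode (binary form of the IR); -c stops before linking
    clang -c -emit-llvm demo.c -o demo.bc
    ls -l demo.ll demo.bc
else
    echo "clang not found; skipping"
fi
```

    Opening demo.ll in a text editor is a quick way to see what LLVM Assembly looks like.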

    NOTE: Bytecode and bitcode are similar, but they are not quite the same. There are many differences, but the simplest explanation is that bitcode is encoded as a stream of bits, so its fields need not line up on byte boundaries, while bytecode is a stream of whole bytes. Remember, one byte is 8 bits.

    Now, we can discuss the tools. The LLVM Assembler (llvm-as) is the program that converts LLVM Assembly to LLVM Bitcode. LLVM Assembly files have the *.ll file-extension and LLVM Bitcode files use *.bc. To use the command, type something like "llvm-as prog.ll -o prog.bc". Alternatively, typing "llvm-as -" means that the assembly code is read from standard input (stdin), in which case the bitcode is written to stdout. If an input file is specified without "-o", the output is a file with the same name as the input file, but with a ".bc" file-extension.

    The LLVM Disassembler (llvm-dis) is just like llvm-as except bitcode is converted to assembly.
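
    A small round trip may make the two commands clearer. This is only a sketch: the file names and the hand-written "add" function are invented for the example, and the tools are assumed to be installed under their plain names.

```shell
# Hand-written LLVM Assembly for a function that adds two integers.
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-dis >/dev/null 2>&1; then
    llvm-as add.ll -o add.bc           # assembly => bitcode
    llvm-dis add.bc -o roundtrip.ll    # bitcode => assembly
    # stdin/stdout form: "-" reads from stdin, so the result goes to stdout.
    llvm-as - < add.ll | llvm-dis - | grep "define"
else
    echo "llvm-as or llvm-dis not found; skipping"
fi
```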

    The optimizer (opt) optimizes or analyzes LLVM code (both assembly and bitcode). For optimizations, the output is an optimized bitcode file. To perform an analysis, use the "--analyze" parameter; the results are typically printed to stdout. Like llvm-as and llvm-dis, opt accepts an output file (via "-o") and reads from stdin when given "-". When optimizing, many parameters are available: optimization levels such as "-O2" select which passes run, and "-S" produces LLVM Assembly output rather than bitcode.

    Example (opt): opt -O2 -S slow-code.ll -o faster.ll

    NOTE: For all of the tools in this article, many other parameters exist.

    The LLVM Static Compiler (llc) compiles IR to native assembly for a target CPU (*.s) or to native object code (*.o). This compiler accepts input and output files the same way as the previously mentioned commands. By default, llc emits native assembly. The output type can be chosen with the "--filetype" parameter: use "--filetype=obj" for object files, "--filetype=asm" for assembly files, and "--filetype=null" to emit nothing, which is meant for performance testing.

    • Example (llc - assembly): llc input-file.ll --march=x86-64 --filetype=asm -o output-file.s
    • Example (llc - object file): llc input-file.ll --march=x86-64 --filetype=obj -o output-file.o

    The LLVM Interpreter (lli) directly executes LLVM Bitcode. By default, lli runs the code with the LLVM Just-In-Time (JIT) compiler. With the "-force-interpreter=true" parameter, the interpreter runs the code instead; "-force-interpreter=false" (the default) selects the JIT compiler. Users have a lot of control over the interpreter and JIT compiler through the many available parameters.
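
    To try both execution modes, something like the following sketch should work. The module "answer.ll" is a made-up example whose main function simply returns 0, and the script assumes llvm-as and lli are installed.

```shell
# A minimal module whose main() returns 0 (hypothetical example input).
cat > answer.ll <<'EOF'
define i32 @main() {
entry:
  ret i32 0
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v lli >/dev/null 2>&1; then
    llvm-as answer.ll -o answer.bc
    # Default: the JIT compiler runs the code.
    lli answer.bc && echo "JIT run succeeded"
    # Force the interpreter instead of the JIT.
    lli -force-interpreter=true answer.bc && echo "interpreter run succeeded"
else
    echo "llvm-as or lli not found; skipping"
fi
```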

    The LLVM Linker (llvm-link) links the bitcode files together into one bitcode file. With the "-S" parameter, the bitcode files will become a single LLVM Assembly file.
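
    Here is a sketch of linking two modules. The two files and the functions in them are invented for the example: "main.ll" only declares "add", and "add.ll" defines it.

```shell
# Module 1 calls add() but does not define it.
cat > main.ll <<'EOF'
declare i32 @add(i32, i32)

define i32 @main() {
entry:
  %r = call i32 @add(i32 2, i32 3)
  ret i32 %r
}
EOF

# Module 2 provides the definition.
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-link >/dev/null 2>&1; then
    llvm-as main.ll -o main.bc
    llvm-as add.ll -o add.bc
    llvm-link main.bc add.bc -o program.bc      # many *.bc => one *.bc
    llvm-link -S main.bc add.bc -o program.ll   # many *.bc => one *.ll
    grep "define" program.ll
else
    echo "llvm-as or llvm-link not found; skipping"
fi
```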

    Just as the archiver in the GNU Toolchain archives multiple library files into one library archive, the LLVM Archiver (llvm-ar) archives bitcode into a single LLVM library archive. However, any file-type can be included in an archive. During compile-time, the archive files need to be included or linked when creating the final product (unless the archive is the desired product).

    To list the symbol names of various LLVM-related files, use the LLVM Name utility (llvm-nm). This command lists the symbols found in the file, whether that file is LLVM Bitcode, an object file, or an archive containing such files. In the LLVM release this article was written against, llvm-nm cannot demangle C++ symbols, unlike GNU's nm utility; newer releases accept "--demangle" ("-C") for this.
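
    The archiver and llvm-nm can be tried together. In this sketch the archive name "libdemo.a" and the "add" function are made up, and the "rcs" letters are the usual ar operations (replace, create, write a symbol index).

```shell
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-ar >/dev/null 2>&1 \
   && command -v llvm-nm >/dev/null 2>&1; then
    llvm-as add.ll -o add.bc
    llvm-ar rcs libdemo.a add.bc   # put the bitcode file into an archive
    llvm-ar t libdemo.a            # list the archive's members
    llvm-nm libdemo.a              # list the symbols inside the archive
else
    echo "llvm-as, llvm-ar, or llvm-nm not found; skipping"
fi
```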

    The llvm-config command outputs flags, paths, and other information related to the given parameter. For instance, typing "llvm-config --cxxflags" outputs the C++ compiler flags needed to use LLVM headers. On the command line, typing "g++ `llvm-config --cxxflags` -o project.o -c project.cpp" places the llvm-config command among the compiler's parameters and so adds the flags needed for the LLVM headers. Notice the grave accents (backticks) around the llvm-config command. These are special shell characters that indicate that the enclosed command is to be executed first and its output substituted in the command's place. For illustration, line one would become line two, as seen below. However, the output differs between systems.

    1. g++ `llvm-config --cxxflags` -o MathJIT.o -c MathJIT.cpp

    2. g++ -I/usr/lib/llvm-3.4/include -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -g -O2 -fomit-frame-pointer -fvisibility-inlines-hidden -fno-exceptions -fPIC -Woverloaded-virtual -Wcast-qual -o MathJIT.o -c MathJIT.cpp
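
    Most shells also accept the $(...) spelling of command substitution, which nests more easily than grave accents. The sketch below only prints the expanded compiler command instead of running it; the flag string in the first half is a stand-in, not real llvm-config output.

```shell
# Backticks and $(...) are two spellings of command substitution:
# the inner command runs first and its output replaces it.
old_style=`echo "-DNDEBUG -fPIC"`
new_style=$(echo "-DNDEBUG -fPIC")
[ "$old_style" = "$new_style" ] && echo "both forms give the same text"
echo "g++ $new_style -o project.o -c project.cpp"

if command -v llvm-config >/dev/null 2>&1; then
    # Show the real expansion on this system (printed, not executed).
    echo "g++ $(llvm-config --cxxflags) -o project.o -c project.cpp"
else
    echo "llvm-config not found; skipping"
fi
```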

    "llvm-diff" is a special diff utility for LLVM code. Two files are compared and differences are reported. However, minor and insignificant changes are disregarded. Assume a programmer switches two lines; one states that X equals "3" and the other states that Y equals "7". Such a difference does not change the way the program/code behaves. Only changes/differences that influence the work-flow are reported up to the first instance. This means llvm-diff stops searching for differences when the first relevant variation is found.

    To collect code coverage data (definition explained in a moment), use the llvm-cov tool together with *.gcno and *.gcda files. The code to be tested must first be compiled with extra parameters that add coverage-collection code; to do so, add the "--coverage" flag when compiling. Compiling produces a *.gcno file for each generated object file. Then, run the compiled program; each run writes a *.gcda file for each object file and places it in the same directory as the matching *.gcno file. Next, run "llvm-cov FILE" (spelled "llvm-cov gcov FILE" in newer LLVM releases) for each source file that was used to make the *.gcno files, in the same directory as the *.gcno and *.gcda files. A *.gcov file is written for each source file tested and for every file that is included/imported by those source files.
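
    The whole coverage workflow might look like the sketch below. It assumes a recent LLVM where the command is spelled "llvm-cov gcov", a clang and llvm-cov from the same release, and an installed profile runtime; the program "cov_demo.c" is made up for the example.

```shell
cat > cov_demo.c <<'EOF'
int classify(int n) {
    if (n < 0)
        return -1;   /* never reached by main below */
    return 1;
}
int main(void) { return classify(5) == 1 ? 0 : 1; }
EOF

if command -v clang >/dev/null 2>&1 && command -v llvm-cov >/dev/null 2>&1; then
    # Each step depends on the previous one, so they are chained with &&.
    # Compiling writes cov_demo.gcno, running the program writes
    # cov_demo.gcda, and llvm-cov writes the cov_demo.c.gcov report.
    clang --coverage -c cov_demo.c -o cov_demo.o &&
    clang --coverage cov_demo.o -o cov_demo &&
    ./cov_demo &&
    llvm-cov gcov cov_demo.c &&
    cat cov_demo.c.gcov ||
    echo "a coverage step failed (is the profile runtime installed?)"
else
    echo "clang or llvm-cov not found; skipping"
fi
```

    In the report, the line that returns -1 should be flagged as never executed, since main only passes a positive number.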

    However, what are "code coverage tests"? Such tests watch the code as it executes. The coverage-tester keeps track of the activity of each part of the code. The tester may also record which boolean sub-expressions (individual conditions in IF statements) evaluate to true or false, and whether all functions and classes are used/called. Each coverage tester is different and watches certain activities. After execution, the tester reports how often certain parts of the code were executed, and any code that never executed is reported. This does not mean such code should be removed. Perhaps the code never executed because certain conditions were never met; for instance, an error-cleanup function is not executed if no errors occur. The developer should review unused code and decide whether it is needed and why it was not used. In instances like this, developers can use "error injectors" to trigger a fake error so the program will exercise such code. Running coverage-testers helps rule out many simple software bugs. Nonetheless, some bugs may persist in the code.

    "llvm-stress" is a tool that generates *.ll files for the purpose of testing LLVM itself.

    To find the location in the source code of a symbol with only the address known, use the LLVM Symbolizer (llvm-symbolizer). The llvm-symbolizer command reads addresses from stdin and returns the function name, filename, and line number pertaining to each address. For instance, in a plain text-file, place the address or addresses each on their own line. Then type "llvm-symbolizer --obj=a.out < addresses.txt". The addresses will be read from "addresses.txt" and llvm-symbolizer will look them up in the object file or executable given by "--obj" (here "a.out"), which must contain debugging information for file names and line numbers to be reported. Alternatively, users can list multiple object files in the text file along with the addresses. To do so, put the object file's name and location (absolute or relative) on the same line as the address, like below, and run a command like this: "llvm-symbolizer < ADDRESS.txt".

    CWD-FILE.so 0xADDRESS
    /path/to/app.so 0xADDRESS
    /another/path/a.out 0xADDRESS
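
    Both input styles can be tried with the sketch below. It assumes a Linux-style (ELF) system with clang, llvm-nm, and llvm-symbolizer installed; the program "sym_demo.c" and its "helper" function are invented for the example, and the address is taken from the symbol table rather than from a real crash.

```shell
cat > sym_demo.c <<'EOF'
int helper(int x) { return x + 1; }
int main(void) { return helper(-1); }
EOF

if command -v clang >/dev/null 2>&1 && command -v llvm-nm >/dev/null 2>&1 \
   && command -v llvm-symbolizer >/dev/null 2>&1; then
    # -g embeds the debug info that maps addresses to files and lines.
    if clang -g sym_demo.c -o sym_demo; then
        # The first column of llvm-nm output is the symbol's address.
        addr=$(llvm-nm sym_demo | awk '$3 == "helper" { print $1 }')
        # Style 1: one address per line, object file named with --obj.
        echo "0x$addr" > addresses.txt
        llvm-symbolizer --obj=sym_demo < addresses.txt
        # Style 2: object file and address on the same line.
        echo "sym_demo 0x$addr" | llvm-symbolizer
    fi
else
    echo "clang, llvm-nm, or llvm-symbolizer not found; skipping"
fi
```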

    "llvm-dwarfdump" is used to parse and print the DWARF sections in object files.

    To manage profile data, use the "llvm-profdata" command.

    Functions and global variables can be extracted from LLVM Bitcode via the "llvm-extract" command. To extract a function, type "llvm-extract --func NAME FILENAME" plus any needed parameters; the output is a module that contains only the named function from the given file, and the input file itself is left unchanged. Unreachable global variables, prototypes, and unused types are dropped from the output. By replacing "--func" with "--rfunc", programmers can specify one or more functions via regex. To extract global variables, use the "--glob" parameter to select one, or use "--rglob" to select many using regex. Any of these parameters may be used multiple times and in combination. To invert the selection, add "--delete"; the output then contains everything except the named items. This tool is useful when developers need to isolate functions or global variables for debugging purposes. The extracted bitcode is sent to stdout by default. With the "-S" parameter, the output is written as LLVM Assembly instead of bitcode. With the "-o FILE" parameter, the output is written to the specified file.
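
    A short sketch of llvm-extract in use follows. The module "two.ll" and its "add" and "sub" functions are made up for the example, and the output file names are arbitrary.

```shell
# A module with two independent functions (hypothetical example input).
cat > two.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}

define i32 @sub(i32 %a, i32 %b) {
entry:
  %diff = sub i32 %a, %b
  ret i32 %diff
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-extract >/dev/null 2>&1; then
    llvm-as two.ll -o two.bc
    # Keep only add(); -S writes LLVM Assembly instead of bitcode.
    llvm-extract --func=add two.bc -S -o add_only.ll
    # Inverse selection: write everything except add().
    llvm-extract --func=add --delete two.bc -S -o without_add.ll
    cat add_only.ll
else
    echo "llvm-as or llvm-extract not found; skipping"
fi
```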

    LLVM Bitcode Analyzer (llvm-bcanalyzer) reads bitcode and outputs various information concerning the code. With the "--dump" parameter, the code is converted to a human-readable form and sent to stdout with some information about the code. It is important to know that the converted code that is sent out is not the same as LLVM Assembly.
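
    For a quick look at both modes, a sketch like this can be used (the "add" function and file names are made up for the example):

```shell
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-bcanalyzer >/dev/null 2>&1; then
    llvm-as add.ll -o add.bc
    llvm-bcanalyzer add.bc                          # summary statistics
    llvm-bcanalyzer --dump add.bc > add-dump.txt    # low-level dump of the bitstream
    wc -l add-dump.txt
else
    echo "llvm-as or llvm-bcanalyzer not found; skipping"
fi
```

    Comparing add-dump.txt with add.ll shows that the dump describes blocks and records of the bitcode container rather than LLVM Assembly instructions.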

    LLVM Integrated Tester (lit) can perform various tests and report the results. "lit" reads "test suite" files that instruct lit on how to perform the tests.

    The LLVM Object Reader (llvm-readobj) allows the programmer to view information about an object file. With this tool, developers can view sections, symbols, ELF dynamic symbol tables, required libraries, etc. This tool has features that differ from llvm-dis.
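
    To have an object file to inspect, llc can produce one first. In this sketch the "add" function and file names are made up, and llc targets the host machine because no "--march" is given.

```shell
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llc >/dev/null 2>&1 && command -v llvm-readobj >/dev/null 2>&1; then
    llc --filetype=obj add.ll -o add.o   # IR => native object file
    llvm-readobj --sections add.o        # list the object's sections
    llvm-readobj --symbols add.o         # list its symbol table
else
    echo "llc or llvm-readobj not found; skipping"
fi
```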

    LLVM has its own form of a Makefile called a "llvmbuild file". With the "llvm-build" tool, programmers can generate, edit, and read llvmbuild files. Also, Makefile and CMake fragment files can be produced with this same tool.

    Many other parameters and commands exist, but these are the main tools that most programmers should understand.

    Summary

    • IR files = LLVM Bitcode & LLVM Assembly
    • File-extensions
      • *.ll = LLVM Assembly
      • *.bc = LLVM Bitcode
      • *.a = Archive (library)
      • *.s = Native Assembly Source Code
      • *.c = C
      • *.cpp = C++
    • COMMAND: INPUT => OUTPUT
      • clang -S -emit-llvm in.c -o out.ll: *.c => *.ll
      • clang -c -emit-llvm in.c -o out.bc: *.c => *.bc
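      • llc -march=cpp -o out.cpp in.ll: *.ll => *.cpp (older LLVM releases only; the C++ backend was later removed)
      • llc -march=c -o out.c in.ll: *.ll => *.c (older LLVM releases only; the C backend was later removed)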
      • llvm-as: *.ll => *.bc
      • llvm-dis: *.bc => *.ll
      • opt: *.(ll|bc) => *.bc
      • opt -S: *.(ll|bc) => *.ll
      • opt --analyze: *.(ll|bc) => analysis results (stdout)
      • llc: *.(ll|bc) => *.s (default)
      • llc --filetype=null: *.(ll|bc) => no output (for performance testing)
      • llc --filetype=asm: *.(ll|bc) => *.s
      • llc --filetype=obj: *.(ll|bc) => *.o
      • llvm-link: *.bc (many files) => *.bc (one file)
      • llvm-link -S: *.bc (many files) => *.ll (one file)
      • llvm-ar: *.bc => *.a
