GNU-Binutils


  • DevynCJohnson

    The GNU Binutils are a part of the GNU Toolchain and are commonly used with the GNU Compiler Collection (GCC). However, the GNU Binutils alone contain over a dozen tools that programmers can use. Since GNU is such an important component of Linux, it may help Linux users to know about this package. If you wish to know all of the parameters of the commands seen below, then view the official documentation (links near the bottom) or read the manual pages in a terminal (man COMMAND). Learning about these commands can help readers understand the GNU Toolchain and shows programmers some other tools they can use.

    as

    The GNU Assembler is the tool used to assemble low-level code (typically Assembly language) into machine language (the binary digits understood by the CPU). GNU Assembler is the usual backend for the GNU Compiler Collection. This assembler supports a variety of architecture types. GNU Assembler uses "assembler directives", which are also called "pseudo ops". Directives are special pieces of code that inform an assembler, interpreter, compiler, or preprocessor how to manage the code of the program. A "pseudo op" (op = operation) is just another name for a directive that is read by an assembler.

    This assembler can be used in the command-line through the "as" command. Use parameters to specify the target architecture (what type of CPU will the assembled code run on?) or special actions/behaviors for the assembler. For instance, "as -PARAMETERS FILE" is a valid format to use in the command-line. As long as no errors occur, the output is an object file (named "a.out" unless "-o" names another file, such as "FILE.o"). Alternatively, users can pass parameters to the assembler through the "gcc" command (the compiler). In the example "gcc -c -g -O -Wa,-alh,-L file.c", the "-Wa" indicates that the options that follow are meant for the assembler. Here, "-alh" and "-L" are those options, so "-Wa,-alh,-L" is the complete set of flags for the assembler. Remember, start with "-Wa" and then use commas (no spaces) to separate each parameter meant for the assembler.

    QUICK FACT: A macro is a form of directive.

    ld

    The GNU Linker is the tool used to connect (or link) object files, archive files (*.a), and/or libraries together to yield an executable or library. This is the last step in the compilation process. "ld" links the object files generated by "as", or some other assembler.

    To use "ld" in a command-line, use a format like this "ld -PARAMETERS OBJ_FILES". As for a more specific example, "ld -o ./MyApp /lib/MYLIB.o THE_CORE.o -lmath" will link "/lib/MYLIB.o" (a needed library) and "THE_CORE.o" (the program that was written). The "-lmath" means a library named "libmath" with a file extension of "*.a" or "*.so" will be found and linked in the process. The "-l" indicates that the linker will search for the file. If the format "-l:CODE.X" is used, then a file with the exact name "CODE.X" will be found and added/linked. Back to the first specific example (ld -o ./MyApp /lib/MYLIB.o THE_CORE.o -lmath), the "-o ./MyApp" indicates the output file will be named "MyApp" and be placed in the current directory. Remember, "*.a" are archive files and "*.so" are shared libraries.

    NOTE: ld and many other Binutils use BFD (Binary File Descriptor library) to support many binary/executable formats and perform the low-level tasks. The opcodes library is used to provide the ability to assemble/disassemble machine instructions. gold is an exception; it only handles ELF and does not use BFD.

    gold

    Gold is another GNU linker, but this one only works with ELF files. ELF stands for "Executable and Linkable Format" and is a specific format for executable files. Every ELF file begins with the four bytes 0x7F 0x45 0x4C 0x46, which are the byte 0x7F followed by the ASCII codes for "ELF". These bytes are a "magic number", meaning that the file format can be identified by them. Any file beginning with that magic number is treated as an ELF file. ELF files have a header that provides various information about the file. A 32-bit ELF file has a 32-bit header, and a 64-bit file has a 64-bit header.

    Using "gold" in the command-line is exactly like using ld. However, there are different ways of utilizing gold depending on your system. First of all, you must install gold because not all systems have it. After installation, for some users, just replace "ld" with "ld.gold". For other users, installing gold replaces "ld" with gold named as "ld". If you want to keep the real "ld", then install gold in another place (like /opt/gold). Next, use gold through gcc like this "gcc -B/opt/gold PARAMETERS AND FILES". This will make gcc use gold instead of ld when it is time to link the object files. Newer versions of gcc also accept "-fuse-ld=gold" for the same purpose. In general, it is best to stick with ld because gold can cause some problems and it does not search for libraries as thoroughly as ld. If you think you need to use gold, be sure to research the advantages and disadvantages.
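    The ELF magic number is the byte 0x7F followed by the ASCII letters "E", "L", and "F". The sketch below checks for it using only standard shell utilities. The sample file and the helper name first4_hex are made up for the illustration.

```shell
# Build a 4-byte sample that starts like an ELF file: 0x7F 'E' 'L' 'F'.
# (\177 is the octal escape for 0x7F.)
printf '\177ELF' > magic_demo.bin

# Print the first four bytes of a file as one hex string.
first4_hex() {
    head -c 4 "$1" | od -An -tx1 | tr -d ' \n'
}

magic=$(first4_hex magic_demo.bin)
if [ "$magic" = "7f454c46" ]; then
    echo "magic_demo.bin starts with the ELF magic number"
else
    echo "magic_demo.bin is not an ELF file"
fi
```

    The same helper can be pointed at a real binary, for example "first4_hex /bin/ls" on a Linux system.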

    NOTE: This may help to quickly explain assembly and disassembly. *Assemble: code => machine instructions. *Disassemble: machine instructions => code.

    gprof

    The GNU Profiler is used to analyze a program's performance. gprof does not work with all programs. Programmers must compile and link the code with the "-pg" flag in gcc to make the program support gprof. Only use that flag when testing your program. Final compilations (or stable releases) should not use that flag, which avoids the extra size and runtime overhead of the profiling code. When a program with this feature exits normally after being executed, either a "gmon.out" or a "PROGRAM-NAME.gmon" file is made in the program's current working directory. In a command-line, type "gprof /path/to/PROGRAM /path/to/gmon.out" to have gprof analyze the data; gprof needs the executable as well as the profile data. This allows developers to see where their program could use a performance boost. Some ways to improve a program's performance include enabling multithreading, changing the algorithm, using different compilation flags, and many other methods.

    addr2line

    This command uses an address to find the matching line in the program's source code. This is useful when a program crashes and an error message displays the address containing the cause of the error. For example, if you have an Android program that crashed in native code, and you have traced the error to a specific address of a specific library, then you can easily find the error. In a command-line on a system that has the Android NDK installed, type "arm-linux-androideabi-addr2line -C -f -e obj/local/armeabi/libmath.so ADDRESS". This uses an Android/ARM compatible addr2line program. The "-C" demangles C++ names, "-f" lists the function that owns the address, and "-e" indicates the file's name and location/path (in this case, a math library). The address is listed last; several addresses can be given one after another, and "-a" prints each address in front of its result. As an example, the output may inform the programmer that the address points to the 32nd line which is under the cosine function. The file must contain debugging information (compiled with "-g") for line numbers to be reported. addr2line accepts the same parameters and uses the same syntax regardless of whether it is called using "i686-w64-mingw32-addr2line", "arm-linux-androideabi-addr2line", or some other tool. Just be sure to match the correct tool with the matching executable/library. Sometimes, addr2line may not output data because of incompatibilities with the executable. If such problems arise, try using eu-addr2line or gdb.

    ar

    Archiver was once used as a file archive before tar became popular. However, archiver still has a use; it creates and updates static libraries. It can also group libraries together to make it easier to link libraries or keep groups of libraries together. For illustration, assume a special set of physics classes depended on each other. Normally, when one library is updated, the others also get some kind of bug fix. Obviously, these libraries must be kept together to prevent issues. As a solution, the programmer types the following code in a command-line - "ar rcs libphysics.a libphys1.o libphys2.o libphys3.o". This combines the three physics object files (libphys*.o) into one archive named "libphysics.a". Remember, this is not linking; the files were just archived together like a tar file. Now, to compile or link the libraries with the main program, the developer would use "gcc -PARAMS science.c libphysics.a" or "ld -PARAMS science.o -l:libphysics.a", respectively (ld takes object files, not source code). This also makes it easier to type the commands. Otherwise, each library would need to be typed one at a time.

    c++filt

    This utility demangles the low-level names in compiled C++ and Java code and converts the low-level name to a user-friendly name. To better understand this, it helps to know that mangling is the opposite of demangling. For instance, a programmer may have a namespace named "tts" and under it is a function named "process" in the class called "english" (tts::english::process). If the programmer uses g++ v3.x or later to compile the code, the compiler will change the names (called mangling) to avoid various issues. So, the name "tts::english::process" is mangled as "_ZN3tts7english7processE" (this is a real format). "_Z" is the start of a mangled symbol (name of a function, class, etc.), and "N" represents nested names/symbols. Next is a number indicating how many characters make up the name ("tts" is three letters) and then the name itself. The end of the nested name is marked by "E". For a function, the parameter types are encoded after the "E"; for example, a "process" function that takes no arguments ends in "Ev". Demangling is the reverse process. Different compilers may mangle symbols in different ways, and the method may change between a new and old version of the same compiler. To use c++filt in a command-line, type "c++filt -PARAM MANGLED-SYMBOL". For Java symbols, use the "-j" parameter.
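    The length-prefix scheme described above can be imitated with a few lines of shell. This is only a sketch of the nested-name part of the scheme (no parameter types, templates, or other encodings), and the helper name mangle_nested is made up.

```shell
# Mangle a nested name such as tts::english::process:
# "_ZN", then <length><name> for every part, then "E".
mangle_nested() {
    out="_ZN"
    rest="$1"
    while [ -n "$rest" ]; do
        part=${rest%%::*}              # text before the first "::"
        out="${out}${#part}${part}"    # append length and name
        case "$rest" in
            *::*) rest=${rest#*::} ;;  # drop the part just handled
            *)    rest="" ;;           # that was the last part
        esac
    done
    printf '%sE\n' "$out"
}

mangle_nested tts::english::process
# prints _ZN3tts7english7processE
```

    Passing the result to c++filt (c++filt _ZN3tts7english7processE) should print tts::english::process again.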

    dlltool

    The dlltool is used when code needs to be compiled as a DLL. DLL files are used by Windows systems, and the files are in the PE format. The dlltool needs to produce an "export table" that will be placed in the generated DLL. Also, the dlltool must create a library file that programs communicate with to get the functions from the DLL.

    The dlltool can read either a "*.def" file, "*.o" file, or "*.a" file (the "*.a" or "*.o" file is planned to be the DLL) to generate the needed export table and the library interface for the DLL. To use "dlltool", see the example below. Notice that the name of the library and DLL file are the same except for the file-extension. Try to use this standard.

    gcc -c mydll.c
    # compile source code. mydll.c => mydll.o
    dlltool -e myexports.o -l mydll.lib mydll.o
    # generate an export file (-e myexports.o) and a library (-l mydll.lib)
    gcc mydll.o myexports.o -o mydll.dll
    # make the dll
    gcc myprogram.o mydll.lib -o program
    # add library to a program

    nlmconv

    This tool converts an object file to a NetWare Loadable Module, which is a binary file with the *.nlm file extension. These files could act as drivers, applications, or libraries. This command can be used in a command-line with a format like this "nlmconv INPUT-FILE OUTPUT-FILE". NetWare loadable modules only work on the NetWare operating system on an i386 architecture.

    nm

    The nm tool lists the symbols (the names of functions, variables, etc.) found in a binary file, which helps the programmer find symbol conflicts. To use "nm", type "nm FILE" in a command-line, where "FILE" is the compiled file.

    objcopy

    This command copies an object file's contents and pastes it to another object file. You may be wondering why the user does not copy the file in a file manager. Well, this tool does not just copy the contents. Users can use various parameters to convert the contents to another format. The basic format for usage in the command-line is "objcopy -PARAMS INFILE OUTFILE".

    objdump

    This is a disassembler that uses the BFD library. With this, programmers can view the assembly code of the program. This tool can also display a variety of other information about the binary file. Type "objdump -x BINARY" to see all of the file's headers, or "objdump -d BINARY" to see the disassembled code. It may be best to pipe the output to "less" so the user can scroll through the output.

    ranlib

    The ranlib tool can be used to generate an index of symbols within an archive. This index is then added to the archive. Doing so speeds up the linking process. Type "ranlib ARCHIVE" in a command-line to make the index.

    readelf

    Readelf is just like objdump, except it only reads ELF files and does not use the BFD library.

    size

    This utility displays the size of each section of the binary file, such as text (the code), data, and bss. Usage: "size BINARY"

    strings

    This utility is used to display printable strings found in the binary data. The basic usage is to list a file like this - "strings FILE". With Root privileges, users can safely read the strings from the BIOS code by running the command dd if=/dev/mem bs=1k skip=768 count=256 2>/dev/null | strings -n 8 | less.
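    The "-n 8" in that command sets the minimum length of a string to report. A harmless way to see the effect is to run strings on a small hand-made file; the file name and contents below are made up.

```shell
# Binary junk with one long and two short runs of printable text.
# (\000, \001, and \002 are octal escapes for non-printable bytes.)
printf 'ab\000hello binutils\000\001\002xyz\000' > strings_demo.bin

# Only runs of at least 8 printable characters are shown.
strings -n 8 strings_demo.bin
# prints hello binutils
```

    Lowering the limit (for example "strings -n 2") would also show the short runs "ab" and "xyz".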

    strip

    This command removes unneeded symbols from a binary. This reduces the size of the file at the cost of removing nearly all debugging features. In other words, the binary can no longer be debugged conveniently. Using the "-s" flag with the compiler achieves the same results. Usage: "strip BINARY"

    windmc

    When given an "*.mc" file, Windows message resources can be generated. The "*.mc" file contains message definitions.

    windres

    Windows resources can be modified with this tool. This tool can convert rc to res and res to coff. rc is a text format, while res and coff are binary formats. This command has many other features.

    If you want to learn more about each command's parameters, then visit https://sourceware.org/binutils/docs-2.24/binutils/index.html or https://sourceware.org/binutils/ or http://manned.org/browse/ubuntu-trusty/binutils/2.24-5ubuntu3 .

    To understand the difference between Assembly and machine code, see http://stackoverflow.com/questions/1253272/whats-the-relationship-between-assembly-language-and-machine-language
