Display Assembly Code of C Program

In my last article I wrote about inline assembly for your C program. It got me thinking about the compilation process: how a compiler breaks C code down into assembly language, the human-readable form of the machine's opcodes (operation codes), and what that looks like. Anyone who has done a computer science or computer engineering degree has had to suffer through writing a program in assembly; I got to practice on the good old MC68HC11. In fact, my professor for this class created his own assembly language with his own compiler and had us write our first assignment in BINARY. Yes, you read that correctly. It is actually somewhat neat to program at that low a level, and it can be very rewarding when your program works correctly, but I digress.

There are a few ways to investigate how your program has been boiled down to assembly. You can use the objdump utility in tandem with gcc -g, or you can compile your program with -S (capital S), which produces a .s file. Let's first look at compiling a program with the gcc -S option.

Compile Your Program With ‘gcc -S’

To produce a .s file, simply add the -S flag to your compilation step. Note the capital S: lowercase -s is a different option that strips symbol information from the output. The -S flag tells the compiler to stop after compilation proper. Quoted from the gcc manual page:

-S
 
Stop after the stage of compilation proper; do not assemble. The output is in
the form of an assembler code file for each non-assembler input file specified.
 
By default, the assembler file name for a source file is made by replacing the 
suffix .c, .i, etc., with .s.
 
Input files that don't require compilation are ignored.

Hey sweet! Let's check out the process and the file.

$ gcc -S zeus.c
$ ls -l
total 40
-rw-r--r-- 1 erik erik   335 2011-04-09 14:06 Makefile
-rwxr-xr-x 1 erik erik  9357 2011-04-09 14:06 zeus
-rw-r--r-- 1 erik erik   554 2008-11-13 21:06 zeus.c
-rw-r--r-- 1 erik erik  1079 2008-09-21 19:08 zeus.c~
-rw-r--r-- 1 erik erik  1200 2011-04-09 14:20 zeus.o
-rw-r--r-- 1 erik erik 11345 2011-04-09 14:06 zeus.s

Well look here, zeus.s! I wonder what could be in that file? Any bets? You guessed it, some cryptic information.

.LCFI5:
        .loc 1 24 0
        movl    stderr, %eax
        movl    $4, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    %eax, (%esp)
        call    fprintf
        movl    stderr, %eax
        movl    $24, 16(%esp)
        movl    $.LC1, 12(%esp)
        movl    $__FUNCTION__.3213, 8(%esp)
        movl    $.LC2, 4(%esp)
        movl    %eax, (%esp)
        call    fprintf
        movl    $.LC3, (%esp)
        call    perror
        movl    $1, (%esp)
        call    exit

While this information can be useful, it is somewhat hard to interpret. With a little study the assembly above starts to make sense: each call is set up by moving its arguments onto the stack through the stack pointer (%esp), with the ‘a’ register (%eax) holding stderr along the way. This should look familiar if you read my previous article. The objdump program makes this cryptic information a little easier to use.
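For a smaller file to experiment with, here is a minimal sketch. The file name add.c and both function names are made up for illustration and are not taken from zeus.c. Running gcc -S add.c should leave an add.s next to it; the exact instructions depend on your compiler version, target architecture and flags, so no expected output is shown.

```c
/* add.c (hypothetical file name): compile with `gcc -S add.c` and open add.s */
#include <stdio.h>

/* Three integer arguments. In the generated assembly, look at how they are
 * passed in: on a 32-bit x86 target they typically arrive on the stack,
 * while other targets may use registers instead. */
int add_three(int a, int b, int c)
{
    return a + b + c;
}

/* Calls fprintf with stderr, much like the calls in the listing above,
 * so the argument setup before `call fprintf` is easy to spot. */
void print_sum(void)
{
    fprintf(stderr, "sum = %d\n", add_three(1, 2, 3));
}
```

There is no main() here on purpose: -S stops before linking, so a file containing only functions is enough to produce a .s file.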

View Assembly With ‘objdump’

To fully utilize the objdump utility, compile your C program using the -g option. From the GCC manual page:

-g
 
Produce debugging information in the operating system's native format
(stabs, COFF , XCOFF , or DWARF 2). GDB can work with this debugging
information. On most systems that use stabs format, -g enables use of 
extra debugging information that only GDB can use; this extra information 
makes debugging work better in GDB but will probably make other debuggers
crash or refuse to read the program. If you want to control for certain
whether to generate the extra information, use -gstabs+, -gstabs, 
-gxcoff+, -gxcoff, or -gvms (see below).
 
GCC allows you to use -g with -O. The shortcuts taken by optimized code 
may occasionally produce surprising results: some variables you declared 
may not exist at all; flow of control may briefly move where you did not expect
it; some statements may not be executed because they compute constant results
or their values were already at hand; some statements may execute in different 
places because they were moved out of loops.
 
Nevertheless it proves possible to debug optimized output. This makes it 
reasonable to use the optimizer for programs that might have bugs.

Compiling with the -g option simply includes debugging information in the binary for objdump to interpret. objdump has many options and I haven’t delved too deeply into them, but the main ones I use are -a, -S and -t.
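The manual excerpt's remarks about combining -g with -O are easy to check for yourself. Below is a rough sketch; the file name scale.c and the function name are assumptions made up for this example. Compile it twice, for instance with gcc -g -O0 -S scale.c -o scale_O0.s and gcc -g -O2 -S scale.c -o scale_O2.s, and compare the two files. What the optimizer actually does will vary by compiler and version.

```c
/* scale.c (hypothetical file name) */

/* Sums v[0..n-1], each multiplied by a*b.
 * `scale` is recomputed on every pass even though it never changes inside
 * the loop. An optimizing compiler may move that multiplication out of the
 * loop, or keep `scale` only in a register, which is the kind of surprise
 * the manual page warns about when debugging optimized code. */
int sum_scaled(const int *v, int n, int a, int b)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++) {
        int scale = a * b;      /* does not depend on i */
        total += v[i] * scale;
    }
    return total;
}
```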

Use ‘objdump -t’ To Display Symbols

The symbol table is a data structure used by the compiler, where each identifier in the program’s source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location. Using objdump -t on our binary we can view the symbol table.

$ objdump -t get_test
 
get_test:     file format elf32-i386
 
SYMBOL TABLE:
08048114 l    d  .interp        00000000              .interp
08048128 l    d  .note.ABI-tag  00000000              .note.ABI-tag
08048148 l    d  .hash          00000000              .hash
08048178 l    d  .dynsym        00000000              .dynsym
080481e8 l    d  .dynstr        00000000              .dynstr
08048244 l    d  .gnu.version   00000000              .gnu.version
08048254 l    d  .gnu.version_r 00000000              .gnu.version_r
08048274 l    d  .rel.dyn       00000000              .rel.dyn
08048284 l    d  .rel.plt       00000000              .rel.plt
...

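The first column is the symbol's value (usually its address in memory). The second is a set of flag characters: here l marks a local symbol and d a debugging symbol. The third is the section the symbol belongs to, followed by its size (or alignment, for common symbols) and finally its name. While this is useful, I find the -S flag to be the most useful: it intermixes source code with disassembly.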
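To see how your own identifiers show up in that table, a small sketch like the following can help. The file name symbols.c and every identifier in it are hypothetical. Compile it with gcc -g -c symbols.c and run objdump -t symbols.o; the comments describe what I would expect at the default optimization level, not guaranteed output.

```c
/* symbols.c (hypothetical file name) */

/* External linkage: expected to appear with the g (global) flag and,
 * being a data object, the O flag. */
int call_count = 0;

/* Internal linkage because of `static`: expected to appear with the
 * l (local) flag and F (function). With optimization enabled the compiler
 * may inline it, in which case it can disappear from the table. */
static int clamp(int x, int lo, int hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* External linkage: expected to appear as a g F entry in .text. */
int clamp_percent(int x)
{
    call_count++;
    return clamp(x, 0, 100);
}
```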

Display Code Intermixed With Assembly Using ‘objdump -S’

This is by far my favourite output from objdump. For instance:

$ objdump -S get_test
int main()
{
 80483b4:       8d 4c 24 04             lea    0x4(%esp),%ecx
 80483b8:       83 e4 f0                and    $0xfffffff0,%esp
 80483bb:       ff 71 fc                pushl  0xfffffffc(%ecx)
 80483be:       55                      push   %ebp
 80483bf:       89 e5                   mov    %esp,%ebp
 80483c1:       51                      push   %ecx
 80483c2:       83 ec 24                sub    $0x24,%esp
        char c;
        for(;;){
                c=getchar();
 80483c5:       e8 02 ff ff ff          call   80482cc <getchar@plt>
 80483ca:       88 45 fb                mov    %al,0xfffffffb(%ebp)
                if(c == 'q')

As you can see in this example, the C code is intermixed with the assembly language used behind the scenes. I think this output is the most useful because it allows you to compare your C code with the assembly the compiler has produced. Hopefully this has shown you a few ways to retrieve the assembly code for your program.