In my last article I wrote about inline assembly for your C program. It got me thinking about the compilation process: how a compiler breaks the code down into assembly language and opcodes (operation codes), and what that looks like. Anyone who has done a computer science or computer engineering degree has had to suffer through writing a program in assembly; I got to practice on the good old MC68HC11. In fact, my professor for this class created his own assembly language with his own compiler and had us write our first assignment in BINARY. Yes, you read that correctly. It is actually somewhat neat to program at that low a level, and it can be very rewarding when your program works correctly, but I digress.
There are a few ways to investigate how your program has been boiled down to assembly. You can use the objdump utility in tandem with gcc -g, or you can compile your program with -S, which produces a .s file. Let's first look at compiling a program with the gcc -S option.
Compile Your Program With ‘gcc -S’
To produce a .s file, simply add the -S flag to your compilation step. Note the capital S: lowercase -s is a different option that strips symbol information from the output. -S tells the compiler to stop after the compilation stage. Quoting the gcc manual page:
-S
Stop after the stage of compilation proper; do not assemble. The output is in
the form of an assembler code file for each non-assembler input file specified.
By default, the assembler file name for a source file is made by replacing the
suffix .c, .i, etc., with .s.
Input files that don't require compilation are ignored.
Hey, sweet! Let's check out the process and the file.
$ gcc -S zeus.c
$ ls -l
total 40
-rw-r--r-- 1 erik erik 335 2011-04-09 14:06 Makefile
-rwxr-xr-x 1 erik erik 9357 2011-04-09 14:06 zeus
-rw-r--r-- 1 erik erik 554 2008-11-13 21:06 zeus.c
-rw-r--r-- 1 erik erik 1079 2008-09-21 19:08 zeus.c~
-rw-r--r-- 1 erik erik 1200 2011-04-09 14:20 zeus.o
-rw-r--r-- 1 erik erik 11345 2011-04-09 14:06 zeus.s
Well, look here: zeus.s! I wonder what could be in that file? Any bets? You guessed it, some cryptic information.
.LCFI5:
.loc 1 24 0
movl stderr, %eax
movl $4, 8(%esp)
movl $.LC0, 4(%esp)
movl %eax, (%esp)
call fprintf
movl stderr, %eax
movl $24, 16(%esp)
movl $.LC1, 12(%esp)
movl $__FUNCTION__.3213, 8(%esp)
movl $.LC2, 4(%esp)
movl %eax, (%esp)
call fprintf
movl $.LC3, (%esp)
call perror
movl $1, (%esp)
call exit
While this information can be useful, it is somewhat hard to interpret. The excerpt above does at least show the stack pointer (%esp) being used to place arguments before each call, and the ‘a’ register (%eax) holding the value of stderr. This should look familiar if you read my previous article. The objdump program makes this cryptic information a little easier to use.
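To connect the listing back to C, here is a minimal sketch of a function that could compile to a similar sequence of fprintf calls. It is not the actual zeus.c, which this article does not show: the function name report_failure, its parameters and the format strings are all assumptions made for illustration. The comments note where an unoptimized 32-bit x86 build like the one in the listing may place each argument relative to %esp.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the code behind the listing. The real zeus.c
 * is not shown, so the name, parameters and strings are assumptions.
 * Returns the total number of characters written, or -1 on error.
 */
int report_failure(FILE *out, int code)
{
    /* Three arguments: out -> (%esp), format -> 4(%esp), code -> 8(%esp). */
    int first = fprintf(out, "error code %d\n", code);
    if (first < 0)
        return -1;

    /* Five arguments fill the slots from (%esp) up to 16(%esp). */
    int second = fprintf(out, "%s (%s:%d)\n", __func__, __FILE__, __LINE__);
    if (second < 0)
        return -1;

    return first + second;
}
```

Saving this to a file and running gcc -S on it should give you a .s file to compare against the listing. Expect differences: on x86-64, for example, the arguments travel in registers rather than in stack slots, and the listing passes a constant where this sketch passes a parameter.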
View Assembly With ‘objdump’
To get the most out of objdump, compile your C program with the -g option. From the GCC manual page:
-g
Produce debugging information in the operating system's native format
(stabs, COFF, XCOFF, or DWARF 2). GDB can work with this debugging
information. On most systems that use stabs format, -g enables use of
extra debugging information that only GDB can use; this extra information
makes debugging work better in GDB but will probably make other debuggers
crash or refuse to read the program. If you want to control for certain
whether to generate the extra information, use -gstabs+, -gstabs,
-gxcoff+, -gxcoff, or -gvms (see below).
GCC allows you to use -g with -O. The shortcuts taken by optimized code
may occasionally produce surprising results: some variables you declared
may not exist at all; flow of control may briefly move where you did not expect
it; some statements may not be executed because they compute constant results
or their values were already at hand; some statements may execute in different
places because they were moved out of loops.
Nevertheless it proves possible to debug optimized output. This makes it
reasonable to use the optimizer for programs that might have bugs.
Compiling with the -g option simply includes debugging information that objdump can use, for example to match instructions to source lines. I haven’t delved too deeply into objdump’s options, but the main ones I use are -a, -S and -t.
Use ‘objdump -t’ To Display Symbols
A symbol table is a data structure in which each identifier in the program’s source code is associated with information about its declaration or appearance in the source, such as its type, scope level and sometimes its location. The compiler keeps one while it works, and a version of it is stored in the object file and executable for the linker and debugger. Using objdump -t on our binary, we can view that table.
$ objdump -t get_test
get_test: file format elf32-i386
SYMBOL TABLE:
08048114 l d .interp 00000000 .interp
08048128 l d .note.ABI-tag 00000000 .note.ABI-tag
08048148 l d .hash 00000000 .hash
08048178 l d .dynsym 00000000 .dynsym
080481e8 l d .dynstr 00000000 .dynstr
08048244 l d .gnu.version 00000000 .gnu.version
08048254 l d .gnu.version_r 00000000 .gnu.version_r
08048274 l d .rel.dyn 00000000 .rel.dyn
08048284 l d .rel.plt 00000000 .rel.plt
...
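Reading a row from left to right: the first column is the symbol’s address (its value), followed by a group of flag characters (here l for a local symbol and d for a debugging symbol), then the section the symbol belongs to, then its size or alignment, and finally the symbol’s name. While this is useful, I find the -S flag to be the most useful, since it intermixes source code with disassembly.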
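Going back to the symbol table for a moment, a small experiment can make the flag characters easier to recognise. The file below is a hypothetical sketch, and every name in it is made up for illustration. It mixes identifiers with internal linkage (static) and external linkage, which objdump -t should report with the l (local) and g (global) flags respectively, alongside F for functions and O for data objects.

```c
/* Hypothetical file for experimenting with objdump -t; all names are made up. */

static int call_count = 0;   /* internal linkage: expect an 'l' (local) flag */
int last_result = 0;         /* external linkage: expect a 'g' (global) flag */

/* Internal helper; if the compiler keeps it, it is listed as a local function. */
static int square(int x)
{
    return x * x;
}

/* Externally visible function: expect 'g' together with 'F' (function). */
int sum_of_squares(int a, int b)
{
    call_count++;
    last_result = square(a) + square(b);
    return last_result;
}

/* Reports how many times sum_of_squares has been called. */
int calls_so_far(void)
{
    return call_count;
}
```

Compiling this with gcc -c and running objdump -t on the resulting object file should list these names among the section symbols. With optimization turned on, the static helper may be inlined and vanish from the table entirely.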
Display Code Intermixed With Assembly Using ‘objdump -S’
This is by far my favourite output from objdump. For instance:
$ objdump -S get_test
int main()
{
80483b4: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483b8: 83 e4 f0 and $0xfffffff0,%esp
80483bb: ff 71 fc pushl 0xfffffffc(%ecx)
80483be: 55 push %ebp
80483bf: 89 e5 mov %esp,%ebp
80483c1: 51 push %ecx
80483c2: 83 ec 24 sub $0x24,%esp
char c;
for(;;){
c=getchar();
80483c5: e8 02 ff ff ff call 80482cc <getchar@plt>
80483ca: 88 45 fb mov %al,0xfffffffb(%ebp)
if(c == 'q')
As you can see in this example, the C code is intermixed with the assembly the compiler generated for it. I think this output is the most useful because it lets you compare your C code, line by line, with the assembly the compiler has produced. Hopefully this has given you a few examples of the ways to retrieve the assembly code for your program.
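One detail in that disassembly rewards a closer look: the instruction after the call to getchar stores only %al, the low byte of the return value, because c is declared as a char. getchar actually returns an int so that EOF can be told apart from every valid character. Below is a minimal sketch of a variant that keeps the result in an int. The excerpt stops at if(c == 'q'), so the rest of the loop, the function name count_until_q and the FILE * parameter are assumptions, not the original get_test source.

```c
#include <stdio.h>

/*
 * Hypothetical variant of the loop in the excerpt: reads from `in` until a
 * 'q' or end of input, and returns how many characters came before that.
 */
long count_until_q(FILE *in)
{
    long count = 0;
    int c;  /* int, not char, so EOF stays distinct from every valid byte */

    while ((c = getc(in)) != EOF) {
        if (c == 'q')
            break;
        count++;
    }
    return count;
}
```

If you compile this with gcc -g -c and run objdump -S on the object file, the store after the call to getc should now write a full register rather than just its low byte.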