Starting with the binary
When you open an unknown binary in a disassembler, it is tempting to start reading instructions immediately. Look at the metadata first.
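```bash
# Binary format
file target_binary
# Linked libraries (ldd can end up executing code from the binary; use it only on trusted files)
ldd target_binary
# Safer for untrusted binaries: list the NEEDED entries without running anything
readelf -d target_binary | grep NEEDED
# Embedded strings
strings target_binary | grep -i "error\|fail\|password\|key\|flag"
# Dynamic symbols (U = imported from a library, T = exported function)
nm -D target_binary
# Section headers
readelf -S target_binary
```

Strings alone can tell you a lot. Error messages reveal what the program does. Library function names tell you what APIs it uses. Format strings show you the shape of internal data.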
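If `strings` and `grep` aren't handy, or you want to script the search, the same idea fits in a few lines of Python. This is a minimal sketch that treats any run of printable ASCII bytes as a string; `extract_strings` and `interesting` are hypothetical helper names, and the keyword list is simply the one from the `grep` command above.

```python
import re

# Keywords from the grep command above; adjust them for your target.
KEYWORDS = ("error", "fail", "password", "key", "flag")


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long.

    Roughly what `strings` prints by default; the real tool also counts tabs
    as printable and can search other encodings.
    """
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def interesting(strings: list[str], keywords=KEYWORDS) -> list[str]:
    """Case-insensitive keyword filter, like the grep -i pipeline."""
    return [s for s in strings if any(k in s.lower() for k in keywords)]
```

To run it on a file, pass the bytes from `open(path, "rb").read()` to `extract_strings`, then filter the result with `interesting`.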
Recognizing function prologues
Most non-trivial functions begin with a prologue that sets up the stack frame:
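```asm
// Standard x86-64 prologue (frame pointer in RBP)
push rbp
mov rbp, rsp
sub rsp, 0x30 // reserve 48 bytes for local variables
// Frame pointer omitted (common at -O1 and above)
sub rsp, 0x18 // adjust RSP directly; locals are addressed relative to RSP
// Leaf functions (no calls to other functions) may skip even this and use the 128-byte red zone below RSP
// With canary (stack protection)
push rbp
mov rbp, rsp
sub rsp, 0x30
mov rax, fs:[0x28] // load stack canary from thread-local storage
mov [rbp-0x8], rax // store it on the stack, just below the saved RBP
```

The stack canary pattern (`fs:[0x28]`) strongly suggests the binary was compiled with `-fstack-protector` or one of its variants such as `-fstack-protector-strong`. Before the function returns, the epilogue compares the stored copy against `fs:[0x28]`. If a buffer overflow overwrote it, the function calls `__stack_chk_fail` and the program aborts.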
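Because these prologues compile to fixed byte sequences, a crude scan can locate likely function starts and canary loads in a raw code dump. The sketch below assumes GCC-style encodings (`55 48 89 e5` for `push rbp; mov rbp, rsp` and `64 48 8b 04 25 28 00 00 00` for `mov rax, fs:[0x28]`); the function names are hypothetical, and real disassemblers use far more robust heuristics.

```python
# Byte encodings of the prologue instructions above, as GCC typically emits them.
FRAME_SETUP = bytes.fromhex("55 48 89 e5")                 # push rbp; mov rbp, rsp
CANARY_LOAD = bytes.fromhex("64 48 8b 04 25 28 00 00 00")  # mov rax, fs:[0x28]


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = 0
    while (i := data.find(needle, start)) != -1:
        offsets.append(i)
        start = i + 1
    return offsets


def scan_prologues(code: bytes) -> dict[str, list[int]]:
    """Naive signature scan: offsets of frame setups and canary loads.

    False matches inside data are possible, and alternate encodings
    (e.g. 48 8b ec for mov rbp, rsp) are missed.
    """
    return {
        "frame_setup": find_all(code, FRAME_SETUP),
        "canary_load": find_all(code, CANARY_LOAD),
    }
```

Builds with Intel CET enabled often start functions with `endbr64` (`f3 0f 1e fa`), so the true function start may be four bytes before a `frame_setup` match.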
Calling conventions
On x86-64 Linux (System V ABI), the first few function arguments are passed in registers:
| Argument | Integer/Pointer | Float |
|---|---|---|
| 1st | RDI | XMM0 |
| 2nd | RSI | XMM1 |
| 3rd | RDX | XMM2 |
| 4th | RCX | XMM3 |
| 5th | R8 | XMM4 |
| 6th | R9 | XMM5 |
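| 7th | Stack | XMM6 |
| 8th | Stack | XMM7 |
| 9th+ | Stack | Stack |
| Return | RAX | XMM0 |

Integer and floating-point arguments are counted separately: in `f(int a, double b, int c)`, `a` goes in RDI, `b` in XMM0 and `c` in RSI.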
So when you see:
```asm
mov edi, 0x10 // first arg: 16 (writing EDI zero-extends into RDI)
call malloc // malloc(16)
mov [rbp-0x18], rax // store returned pointer (RAX) in a local variable
```

This allocates 16 bytes and stores the returned pointer in a stack variable at `rbp-0x18`.
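When checking register assignments by hand, it can help to model the allocation rule: integer and pointer arguments consume RDI, RSI, RDX, RCX, R8 and R9 in order, floating-point arguments consume XMM0 through XMM7, and each class spills to the stack independently. This is a simplified sketch that ignores structs, `long double` and variadic calls; `assign_args` is a hypothetical helper.

```python
# System V AMD64 argument registers, in allocation order.
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
FLOAT_REGS = [f"xmm{i}" for i in range(8)]  # xmm0-xmm7


def assign_args(kinds: list[str]) -> list[str]:
    """Return where each argument is passed: a register name or "stack".

    kinds holds "int" (integers and pointers) or "float" (float/double).
    Simplified: structs, long double and variadic calls are not modeled.
    """
    next_int = next_float = 0
    locations = []
    for kind in kinds:
        if kind == "int":
            if next_int < len(INT_REGS):
                locations.append(INT_REGS[next_int])
                next_int += 1
            else:
                locations.append("stack")
        elif kind == "float":
            if next_float < len(FLOAT_REGS):
                locations.append(FLOAT_REGS[next_float])
                next_float += 1
            else:
                locations.append("stack")
        else:
            raise ValueError(f"unknown argument kind: {kind!r}")
    return locations
```

For example, `assign_args(["int"])` gives `["rdi"]`, matching the `malloc(16)` call above, and `assign_args(["int", "float", "int"])` gives `["rdi", "xmm0", "rsi"]`.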
Common patterns
If-else:

```asm
cmp eax, 5
jne .else_branch // if (eax == 5): the jump tests the inverted condition
// then-block code
jmp .after_if
.else_branch:
// else-block code
.after_if:
```

For loop:
```asm
mov ecx, 0 // i = 0
.loop_start:
cmp ecx, 100 // i < 100
jge .loop_end
// loop body
inc ecx // i++
jmp .loop_start
.loop_end:
```

Switch statement (jump table):
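```asm
cmp eax, 7 // bounds check (after subtracting the lowest case value)
ja .default_case // unsigned compare: negative values also go to default
lea rdx, [rip + .jump_table] // base address of the table
movsxd rax, [rdx + rax*4] // load the 32-bit signed offset for this case
add rax, rdx // offsets are relative to the table base
jmp rax // indirect jump to the case body
```

Virtual function call (C++ vtable):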
```asm
mov rax, [rdi] // load vtable pointer from object (`this` arrives in RDI)
call [rax + 0x18] // 0x18 / 8 bytes per slot = index 3, the 4th virtual function
```

Working with stripped binaries
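Most release binaries are stripped: the static symbol table and debug information are removed, so internal functions show up only as addresses. Dynamic symbols needed for linking (imported library functions such as `printf`, plus anything the binary exports) survive, which is why `nm -D` still works. IDA Pro and Ghidra auto-detect function boundaries and let you rename functions as you figure out what they do.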
The approach:
- Find `main` by looking at the entry point: `_start` calls `__libc_start_main` with `main` as an argument (on x86-64, its address is loaded into RDI, the first argument register)
- Look for strings referenced by functions to guess their purpose
- Look for library calls (`printf`, `malloc`, `open`, `socket`) to understand the high-level behavior
- Name functions as you understand them: `check_password`, `parse_config`, `send_response`
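The address where `_start` lives is stored in the ELF header as `e_entry`, a 64-bit field at offset 0x18 in 64-bit files. `readelf -h target_binary` prints it as "Entry point address"; the sketch below reads the same field directly, assuming a 64-bit little-endian ELF. `elf_entry_point` is a hypothetical helper name.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return e_entry from a 64-bit little-endian ELF header."""
    if len(header) < 32:
        raise ValueError("header too short")
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] is EI_CLASS (2 = 64-bit), e_ident[5] is EI_DATA (1 = little-endian).
    if header[4] != 2 or header[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # After the 16-byte e_ident: e_type (u16), e_machine (u16), e_version (u32), e_entry (u64).
    _e_type, _e_machine, _e_version, e_entry = struct.unpack_from("<HHIQ", header, 16)
    return e_entry
```

For a position-independent executable (`e_type` 3, `ET_DYN`), the value is relative to wherever the loader maps the image, so add the load base when matching it against a running process.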
Dealing with optimization
Compiler optimizations can make disassembly harder to read. Common transformations:
- Strength reduction: multiplication by constants becomes shifts and adds (`x * 5` becomes `lea eax, [rax + rax*4]`)
- Inlining: small functions are copied into their callers, so you may not find them as separate functions
- Loop unrolling: the loop body is duplicated (often 2-4x) to reduce branch overhead
- Tail call optimization: a call in tail position becomes a `jmp`, so the calling function's frame is gone and it won't appear in a backtrace
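The strength-reduction bullet is easy to check numerically. The sketch below models two common idioms: `lea` computing `x * 3`, `x * 5` or `x * 9`, and the bias-then-shift sequence compilers use for signed division by a power of two (the exact instructions vary by compiler and flags). Both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lea_mul(x: int, scale: int) -> int:
    """Model `lea eax, [rax + rax*scale]`: x * (scale + 1), truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):  # the only index scales x86 addressing allows
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (x + x * scale) & MASK32


def sdiv_pow2(x: int, k: int) -> int:
    """Model the shift-based sequence for signed 32-bit x / 2**k.

    C division truncates toward zero, but an arithmetic right shift rounds
    toward negative infinity, so negative inputs get 2**k - 1 added first.
    """
    bias = (x >> 31) & ((1 << k) - 1)  # 2**k - 1 when x < 0, else 0
    return (x + bias) >> k
```

So a lone `lea eax, [rax + rax*4]` is `x * 5`, and a `sar` preceded by adding a sign-derived bias is usually signed division rather than a plain shift.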
At -O0, the disassembly closely matches the source code. At -O2 or -O3, the compiler rearranges code for performance and the mapping to source becomes less obvious. Start with -O0 builds when learning, then work up to optimized binaries.
Tools of the trade
- Ghidra: Free, open source, excellent decompiler. Start here.
- IDA Pro: Industry standard, expensive, best database format for collaborative RE (my preferred tool)
- Binary Ninja: Good middle ground, nice API for scripting analysis
- radare2/rizin: Command line, free, steep learning curve but powerful
- GDB/LLDB: Dynamic analysis, breakpoints, memory inspection at runtime
- strace/ltrace: Trace system calls and library calls without disassembling anything
- x64dbg: Open-source Windows debugger (x32dbg for 32-bit binaries). If IDA Pro isn't available to you, I'd use this.
