Stack-smashing-debugging-guide
This is a step-by-step guide to debugging stack smashing violations.
Symptoms
The stack corruption always looks the same:
user $
some-command
... *** stack smashing detected ***: terminated
TL;DR:
- Enable debugging symbols in /etc/portage/make.conf:
<syntaxhighlight lang="bash">CFLAGS="... -ggdb"
CXXFLAGS="... -ggdb"
FEATURES="... splitdebug"</syntaxhighlight>
- Disable position-independent executables (PIE) to make addresses reproducible, and rebuild the problematic package:
user $
LDFLAGS=-no-pie emerge -v1 foo-package
- Enable core dump generation and spot the function where the stack is corrupted:
user $
ulimit -c unlimited
- Find where the stack canary is stored on the stack.
- Add a gdb watchpoint and find out where the canary is overwritten.
Practical example
To get some hands-on experience let's explore a runnable toy example:
<syntaxhighlight lang="c">#include <stdio.h>

// $ gcc a.c -o a
// $ ./a 1 2 3 4 5 6 7 8
// *** stack smashing detected ***: terminated
int main(int argc, char * argv[]) {
    volatile long v[8];
    v[argc] = 42;

    printf("Hello! Is my stack OK?\n");

    return v[argc+1];
}</syntaxhighlight>
user $
gcc a.c -o a
user $
./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
To make addresses stable across invocations let's disable PIE. While at it let's also enable debugging info:
user $
gcc a.c -o a -no-pie -ggdb3
user $
./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
Now let's enable core dumps to get an approximate location of the crash:
user $
ulimit -c unlimited
user $
./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
Aborted (core dumped)
The failure is unchanged and the bug did not disappear, but now a core file is written. Good!
Now let's get a backtrace to see which function the failure happened in:
user $
gdb --quiet ./a core.1117780
Reading symbols from ./a...
[New LWP 1117780]
Core was generated by `./a 1 2 3 4 5 6 7 8'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fcbdeed6537 in __GI_abort () at abort.c:79
#2  0x00007fcbdef2f1d9 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fcbdf036c2f "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007fcbdefbd0a2 in __GI___fortify_fail (msg=msg@entry=0x7fcbdf036c17 "stack smashing detected") at fortify_fail.c:26
#4  0x00007fcbdefbd080 in __stack_chk_fail () at stack_chk_fail.c:24
#5  0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13
The interesting part here is the caller of __stack_chk_fail. In our case it's main.
Now the hardest part: we need to find in the assembly where the canary value is stored on and loaded from the stack. A canary is a value the compiler places on the stack at function entry and checks before return to detect whether anything has overwritten it. Don't panic: you don't need to know much assembly to find the canary. The compiler emits the canary when one of the -fstack-protector options is enabled. Gentoo builds gcc with the --enable-default-ssp configure option, which enables stack protection by default.
user $
gdb --quiet ./a core.1117780
(gdb) frame 5
#5  0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13
13      }
(gdb) disassemble
Dump of assembler code for function main:
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000401152 <+28>:    xor    %eax,%eax
   0x0000000000401154 <+30>:    mov    -0x54(%rbp),%eax
   0x0000000000401157 <+33>:    cltq
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)
   0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004
   0x0000000000401169 <+51>:    callq  0x401030 <puts@plt>
   0x000000000040116e <+56>:    mov    -0x54(%rbp),%eax
   0x0000000000401171 <+59>:    add    $0x1,%eax
   0x0000000000401174 <+62>:    cltq
   0x0000000000401176 <+64>:    mov    -0x50(%rbp,%rax,8),%rax
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx
   0x0000000000401188 <+82>:    je     0x40118f <main+89>
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
=> 0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq
End of assembler dump.
On amd64 the canary's reference value lives at %fs:0x28 (thread-local storage). We need to track where a copy of it is stored on the stack. The store always sits very close to the read of %fs:0x28 itself. In our case it is a sequence of 3 instructions:
<syntaxhighlight lang="asm">   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax       ; read value from TLS: rax = %fs:0x28
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)     ; store canary on stack: [%rbp - 8] = rax
   0x0000000000401152 <+28>:    xor    %eax,%eax           ; erase canary from registers: rax = 0
   ...
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx     ; load value from stack
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx       ; compare value with TLS value
   0x0000000000401188 <+82>:    je     0x40118f <main+89>  ; fail if values don't match
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
=> 0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq</syntaxhighlight>
The important bit here is the exact instruction where the canary is stored on the stack, 0x000000000040114e <+24>: mov %rax,-0x8(%rbp), and the one where it is erased from registers, 0x0000000000401152 <+28>: xor %eax,%eax. Our task is to stop right after the store instruction, set a watchpoint on the canary, and wait until it gets corrupted.
Here is the full session from start to finish to find our corruption:
user $
gdb --quiet --args ./a 1 2 3 4 5 6 7 8
Reading symbols from ./a...
(gdb) start
Temporary breakpoint 1 at 0x401145: file a.c, line 6.
Starting program: /tmp/a 1 2 3 4 5 6 7 8
Temporary breakpoint 1, main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6       int main(int argc, char * argv[]) {
(gdb) break *0x0000000000401152
Breakpoint 2 at 0x401152: file a.c, line 6.
(gdb) continue
Continuing.
Breakpoint 2, 0x0000000000401152 in main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6       int main(int argc, char * argv[]) {
(gdb) watch *(long*)($rbp-8)
Watchpoint 3: *(long*)($rbp-8)
(gdb) continue
Continuing.
Watchpoint 3: *(long*)($rbp-8)
Old value = -6583947134921550848
New value = 42
main (argc=9, argv=0x7fffffffd7f8) at a.c:10
10          printf("Hello! Is my stack OK?\n");
(gdb) list
5       // *** stack smashing detected ***: terminated
6       int main(int argc, char * argv[]) {
7           volatile long v[8];
8           v[argc] = 42;
9
10          printf("Hello! Is my stack OK?\n");
11
12          return v[argc+1];
13      }
(gdb) disassemble /s
Dump of assembler code for function main:
a.c:
6       int main(int argc, char * argv[]) {
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000401152 <+28>:    xor    %eax,%eax
7           volatile long v[8];
8           v[argc] = 42;
   0x0000000000401154 <+30>:    mov    -0x54(%rbp),%eax
   0x0000000000401157 <+33>:    cltq
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)
9
10          printf("Hello! Is my stack OK?\n");
=> 0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004
   0x0000000000401169 <+51>:    callq  0x401030 <puts@plt>
11
12          return v[argc+1];
   0x000000000040116e <+56>:    mov    -0x54(%rbp),%eax
   0x0000000000401171 <+59>:    add    $0x1,%eax
   0x0000000000401174 <+62>:    cltq
   0x0000000000401176 <+64>:    mov    -0x50(%rbp,%rax,8),%rax
13      }
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx
   0x0000000000401188 <+82>:    je     0x40118f <main+89>
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
   0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq
End of assembler dump.
The gdb commands used, in order:
- start: start the program and pause at the beginning of the main function.
- break: set the next stop at the instruction right after the canary store.
- continue: resume the program until it stops again. It should stop at our 0x0000000000401152 address.
- watch: watch for changes to memory at the specified address.
- continue: resume until the watchpoint triggers on a write.
- list: show the source around the current line.
- disassemble /s: get the disassembly interspersed with source code.
The instruction preceding our current instruction (marked as =>) is our offender:
<syntaxhighlight lang="asm">v[argc] = 42;
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)
=> 0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004</syntaxhighlight>
Thus v[argc] = 42; is our problematic source line (0x2a is 42).
Now you can add more debugging to understand the nature of the overflow.
Links
- ARM64 debugging example: https://bugs.gentoo.org/721570#c7
- MIPS debugging example: https://trofi.github.io/posts/205-stack-protection-on-mips64.html