Assembler Basics: Registers
When learning assembler and machine language, processor registers are central. Whenever you want to calculate something in assembler, you typically first load the values into registers and then perform the arithmetic on them. Let's start with a hello world program in C:
hello.c:
#include <stdio.h>
int main()
{
    printf("hello world");
}
Compile and link it with:
gcc hello.c
Have a look at the <main> function with the command:
objdump -d -Mintel a.out
Note that objdump -Mintel (Intel syntax) puts the target operand on the left and the source operand on the right, while objdump -Matt (AT&T syntax, the default) does it the other way round. Here is the <main> function:
1149: f3 0f 1e fa endbr64
114d: 55 push rbp
114e: 48 89 e5 mov rbp,rsp
1161: 48 8d 05 9c 0e 00 00 lea rax,[rip+0xe9c] # 2004 <_IO_stdin_used+0x4>
1168: 48 89 c7 mov rdi,rax
116b: b8 00 00 00 00 mov eax,0x0
1170: e8 db fe ff ff call 1050 <printf@plt>
1175: b8 00 00 00 00 mov eax,0x0
117a: c9 leave
117b: c3 ret
Here is an explanation of what the instructions do:
endbr64
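Marks a valid target for indirect branches. It is part of Intel's Control-flow Enforcement Technology (CET) and is here for security purposes; on CPUs without CET it behaves like a no-op.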
push rbp
Saves the base pointer register (RBP) on the stack. It can be restored later with the pop instruction.
mov rbp,rsp
Sets the register rbp (base pointer) to rsp (stack pointer), so the base pointer now equals the stack pointer. If the function had local variables, they would live on the stack and be addressed relative to rbp.
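lea rax,[rip+0xe9c]
Loads the effective address of the instruction pointer plus 0xe9c into register RAX. While this instruction executes, RIP already points at the next instruction (0x1168), so the result is 0x1168 + 0xe9c = 0x2004. This is exactly where "hello world" is in RAM.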
mov rdi,rax
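Moves register RAX (RAM address of "hello world") into the destination index register RDI. Under the x86-64 System V calling convention used on Linux, RDI carries the first argument of a function call.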
mov eax,0x0
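Sets EAX to 0. Before the call, this tells the variadic function printf (via AL) that no vector (XMM) registers carry arguments. The same instruction appears again after the call, where it sets the return value of main to 0.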
call 1050 <printf@plt>
Calls the printf@plt stub, which in turn calls glibc's printf function. printf then inspects the registers to find its arguments, here the address of the string in RDI, and prints the string up to the first 0 byte.
leave
Sets RSP = RBP and then restores the saved base pointer from the stack, just as a pop rbp would.
ret
Returns to the calling function.
Debugging it
Now let's debug it and run one machine language instruction after the other. We use the GNU Debugger:
thorsten@tweedleburg:~$ gdb a.out
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(No debugging symbols found in a.out)
We want the executable to be loaded and get a virtual address space, but not run to completion, so we set a breakpoint at the main routine and start running it:
(gdb) break main
Breakpoint 1 at 0x1151
(gdb) run
Starting program: /home/thorsten/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x0000555555555151 in main ()
Aha! The program has finally been loaded into RAM and our memory addresses are around 0x0000555555555151. Let's see the instructions that will be executed next by using the command disassemble:
(gdb) disassemble
Dump of assembler code for function main:
0x0000555555555149 <+0>: endbr64
0x000055555555514d <+4>: push %rbp
0x000055555555514e <+5>: mov %rsp,%rbp
=> 0x0000555555555151 <+8>: lea 0xeac(%rip),%rax # 0x555555556004
0x0000555555555158 <+15>: mov %rax,%rdi
0x000055555555515b <+18>: mov $0x0,%eax
0x0000555555555160 <+23>: call 0x555555555050 <printf@plt>
0x0000555555555165 <+28>: mov $0x0,%eax
0x000055555555516a <+33>: pop %rbp
0x000055555555516b <+34>: ret
End of assembler dump.
(gdb)
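The arrow ( => ) shows that we are about to execute the lea instruction (load effective address). Note that gdb prints AT&T syntax by default, so the operands appear in the opposite order compared to objdump -Mintel; the gdb command set disassembly-flavor intel switches to Intel syntax.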
See also
- https://stackoverflow.com/questions/71982459/why-does-printf-still-work-with-rax-lower-than-the-number-of-fp-args-in-xmm-regi
- https://cs.lmu.edu/~ray/notes/nasmtutorial/
- https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
- https://stackoverflow.com/questions/17777146/what-is-the-purpose-of-cs-and-ip-registers-in-intel-8086-assembly
- https://stackoverflow.com/questions/4560720/why-does-the-stack-address-grow-towards-decreasing-memory-addresses
- https://www.linuxintro.org/wiki/gdb