我作为软件工程师在美国工作了三年,期间接触和浸泡过不少的技术栈,罗列一番的话有 AWS cloud, frontend (React, TypeScript), backend (Django, Springboot, ECS, GraphQL), JVM (Java, Scala, Kotlin), data pipelines (Spark, Snowflake, Prefect, DBT), database (Clickhouse, MySQL, Postgres, DuckDB), devOps (Gitlab CI, Jenkins, Bazel, linter), Observability (Grafana, Splunk, Prometheus), Docker, Kubernetes等一大串内容,做做准备的话可以把自己营造成多面手的人设。但是我明白,绝大多数时候里我都只是在使用各类application罢了,虽然自诩代码品味不错,追求让人一目了然而简洁易懂的程序,但总归还是在浅水里扑腾。
趁着辞职后的充电期,我开始系统性地学习计算机的各类基础内容。虽然缺乏这些知识并没有怎么影响我的日常工作,但我仍然发自内心地对计算机这个事物感到好奇,希望能够对它有更多的了解。我决定写些文章来记录分享我在阅读大名鼎鼎的教材 Computer Systems: A Programmer’s Perspective 期间的学习感悟,并尽可能地用高质量的例子来巩固所学。
这篇文章会用程序员们喜闻乐见的翻转链表作为例子,介绍寄存器register, 汇编assembly, 以及栈帧stack frame等内容,希望能够给大家带来一种「串联起来了」的愉悦。由于我比较偏好阅读英文的学习材料(我没有读过计算机领域的中文教材),对于许多技术名词的中文翻译感到别扭(比如把pipeline叫做流水线,stack frame叫做栈帧),所以干脆用英文进行写作。
Reversing a linked-list is like the “Hello World” from programmers. In this article I’m gonna use it as an example (written below in C) to showcase how C code gets translated into assembly, with a specific focus on stack frames, register usage, memory layout, and the power of the popular debugging tool gdb
.
#include <stdio.h>
typedef struct Node {
int val;
int id;
struct Node* next;
} Node;
// Reverses a singly linked list and returns the new head
Node* reverse(Node* head) {
Node* prev = NULL;
Node* curr = head;
Node* next;
while (curr != NULL) {
next = curr->next;
curr->next = prev;
prev = curr;
curr = next;
}
return prev;
}
// Prints the list
void print_list(Node* head) {
while (head != NULL) {
printf("Node ID: %d, Value: %d\n", head->id, head->val);
head = head->next;
}
}
int main() {
// Declare four nodes on the stack and manually connect them
Node node_4 = {40, 4, NULL};
Node node_3 = {30, 3, &node_4};
Node node_2 = {20, 2, &node_3};
Node node_1 = {10, 1, &node_2};
printf("Original list:\n");
print_list(&node_1);
// Reverse the list
Node* new_head = reverse(&node_1);
printf("\nReversed list:\n");
print_list(new_head);
return 0;
}
The code above should be pretty straightforward. Before we dig into this piece of code, let’s refresh our mind with some basic knowledge on CPU registers, memory and assembly.
Registers and Memory in a 64-bit World
When writing low-level programs like C, the compiler transforms our code into assembly instructions that run on the CPU. Each CPU has a small set of registers, which are fast-access storage units. On a 64-bit machine, these registers are 64 bits wide — i.e., they can hold a number as large as 2⁶⁴.
Computer memory is essentially a giant array of bytes. Every byte in memory has a unique address (think of it like array index), which is also a 64-bit number on modern architectures. So, each register can natually point to a particular memory address.
Notice that the values in the registers don’t always have to be memory address. They can also simply be numbers like -5, 42, etc. The CPU doesn’t care about the meaning—it just sees binary values. It’s up to us (and the compiler) to interpret them correctly. This dual interpretation — value or address — is what makes assembly programming flexible yet confusing to beginners.
Essentially all data inside a machine is just bits. When we see two bytes 0x48 0x69, we can either intepret them as string “Hi”, or as decimal number 18537, and the computer itself wouldn’t know the difference. Note that the prefix ‘0x’ means this a hex number (0x10 is 16 in decimal). You’ll run into hex numbers all the time in the low-level world, because memory addresses are 64-bit and decimal is just too clumsy - you would definitely prefer using 0x401256
to 010 000 000 001 001 001 010 110
to denote an address. Hex is compact and maps neatly to binary.
There are 16 general-purpose registers in x86-64 systems:
%rax, %rbx, %rcx, %rdx,
%rsi, %rdi, %rbp, %rsp,
%r8 through %r15
Some registers have conventional purposes:
%rsp
: stack pointer, always points to the top of the stack%rax
: return value from a function%rdi
,%rsi
,%rdx
,%rcx
,%r8
,%r9
: used to pass the first 6 function arguments
Common Assembly Instructions
Assembly instructions manipulate the data between registers and memory. These instructions are where we begin to see how higher-level constructs like linked lists are implemented under the hood.
Here are a few of the most common instructions you’ll run into all the time. These might seem overwhelming for now, but don’t worry, we will see them work in action later.
mov
: Copy value from source to destination (register-to-register, memory-to-register, etc.). For example, say register %rax is now 0x402000mov $0x28, %rax
updates register %rax to value 0x28.mov $0x28, (%rax)
puts integer 0x28 into memory starting at address 0x402000
lea
: Load effective address. Usually used to perform simple arthmetic operations.- Say %rax is 0x402000, and %rbx is 0x2. Instruction
lea 0x1(%rax, %rbx, 4), %rdx
sets %rdx = 0x402000 + 2 * 4 + 0x1 = 0x402009.
- Say %rax is 0x402000, and %rbx is 0x2. Instruction
call
: Push return address onto stack and jump to the functionret
: Pop return address and jump back to where the function call was madepush
: Decrease%rsp
and store a value at the new top of the stackpop
: Opposite ofpush
: Load value from top of stack and increase%rsp
sub
,add
: Perform arithmetic (often used to grow/shrink stack frame)cmp
: Compare two valuesjmp
,je
,jne
,jg
, etc.: Control flow via unconditional or conditional jumps. Usually used together withcmp
instruction, for example,cmp $0x3, %rax
je
equal: %rax == 3jne
not equak: %rax != 3jg
greater: %rax > 3
Dissecting the main Function
I use the following code to compile the C file and then disassemble it:
gcc -g -no-pie -Og -o linked_list linked_list.c
objdump -d --no-show-raw-insn linked_list > linked_list.s
Let’s look at the disassembled version of our C program’s main
function. I’ve added lots of annotations to help reading through. At each line, the leftmost hex number is the address of the instruction.
00000000004011c5 <main>:
4011c5: endbr64 # Control-flow enforcement. Let's ignore it
4011c9: push %rbx # Register rbx will be modified in the process. We push it onto the stack so that we can retrieve later
4011ca: sub $0x50,%rsp # Move the top of the stack to allocate 80 bytes on stack(0x50 is 80 in decimal)
4011ce: mov %fs:0x28,%rax
4011d7: mov %rax,0x48(%rsp) # Put special value in the stack to detect stack buffer overflows. We can ignore for now.
4011dc: xor %eax,%eax # Clear %eax
# === Initialize 4 linked list nodes ===
4011de: movl $0x28,(%rsp) # node4.val = 40
4011e5: movl $0x4,0x4(%rsp) # node4.id = 4
4011ed: movq $0x0,0x8(%rsp) # node4.next = NULL
4011f6: movl $0x1e,0x10(%rsp) # node3.val = 30
4011fe: movl $0x3,0x14(%rsp) # node3.id = 3
401206: mov %rsp,%rax # address of node4
401209: mov %rax,0x18(%rsp) # node3.next = &node4
40120e: movl $0x14,0x20(%rsp) # node2.val = 20
401216: movl $0x2,0x24(%rsp) # node2.id = 2
40121e: lea 0x10(%rsp),%rax # address of node3
401223: mov %rax,0x28(%rsp) # node2.next = &node3
401228: movl $0xa,0x30(%rsp) # node1.val = 10
401230: movl $0x1,0x34(%rsp) # node1.id = 1
401238: lea 0x20(%rsp),%rax # address of node2
40123d: mov %rax,0x38(%rsp) # node1.next = &node2
# === Print Original List ===
401242: lea 0xdd3(%rip),%rdi # Load "Original list:" format string
401249: call 401060 <puts@plt> # Print string to our screen
40124e: lea 0x30(%rsp),%rbx # Set rbp to point to node1
401253: mov %rbx,%rdi # Set pointer to node1 as the first argument to function call `print_list` so that we're ready for the next line
401256: call 401195 <print_list>
# === Reverse the List ===
40125b: mov %rbx,%rdi # first arg: head node
40125e: call 401176 <reverse> # returns new head
401263: mov %rax,%rbx # save new head
# === Print Reversed List ===
401266: lea 0xdbe(%rip),%rdi # Load "Reversed list:" string
40126d: call 401060 <puts@plt>
401272: mov %rbx,%rdi
401275: call 401195 <print_list>
# === Stack Canary Check ===
40127a: mov 0x48(%rsp),%rax
40127f: sub %fs:0x28,%rax
401288: jne 401295 <main+0xd0> # if canary changed → stack overflow → crash
# === Return Normally ===
40128a: mov $0x0,%eax # 0 is the success return code
40128f: add $0x50,%rsp # Clean up stack frame
401293: pop %rbx # Retrieve the original value of register rbx
401294: ret # Return 0
401295: call 401070 <__stack_chk_fail@plt> # stack corrupted → crash
A very important concept is stack frame - a block of memory to use whenever a function is called. It stores everything that the function needs while it runs, such as return address (where to jump back after the function finishes), local variables, function arguments (if passed via the stack) and sometimes, saved registers (like %rbp, %rbx) to restore later. It’s created at the start of a function and destroyed at the end.
The register %rsp is the top of the stack frame, and is extremely important since it marks the boundary of the stack frame. When we say the stack frame is destroyed, we don’t mean physically destroying the hardware. Instead, it means the previous chunk of memory is no longered considered part of the current stack frame, and other programs are now allowed to write new values to this memory space to overwrite the existing ones.
Now let’s run GDB (GNU Debugger) to execute the program. add a breakpoint to address 401256 (you can find this number in the disassembled code above), right before calling print_list
function.
$ gdb linked_list
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
(gdb) break *0x401256
(gdb) run
Starting program: /home/ubuntu/linked_list
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Now let’s examine the stack frame. The GDB command to examine memory (a very powerful tool!) is x/<count><format><size> <address>
.
- codes for [format]: x = hex, d = decimal, u = unsigned, c = char, s = string
- codes for [size]: b = byte (1 byte), h = halfword (2 bytes), w = word (4 bytes), g = giant (8 bytes)
For example, the following code examines 20 words (4 bytes each) at the current stack pointer:
(gdb) x/20wx $rsp
0x7fffffffe350: 0x00000028 0x00000004 0x00000000 0x00000000
0x7fffffffe360: 0x0000001e 0x00000003 0xffffe350 0x00007fff
0x7fffffffe370: 0x00000014 0x00000002 0xffffe360 0x00007fff
0x7fffffffe380: 0x0000000a 0x00000001 0xffffe370 0x00007fff
0x7fffffffe390: 0x00000000 0x00000000 0xfbeca800 0x9629f2d4
The first 4 lines correspond to the 4 linked-list nodes we just created, and we can clearly see each node’s value, id and the pointer to next node. Each node consumes 16bytes (two integers, 4 bytes for each, and a 8-byte pointer), and that’s why their memory addresses all differ by 0x10.
Here’s how the stack looks during execution.
|------------------------| <- at 0x50(%rsp), 0x7fffffffe3a0
| stack overflow check |
|------------------------|
| |
|------------------------| <- at 0x40(%rsp), 0x7fffffffe390
| node1 (next) |
|------------------------| <- at 0x38(%rsp), 0x7fffffffe368
| node1 (val, id) |
|------------------------| <- at 0x30(%rsp), 0x7fffffffe370
| node2 (next) |
|------------------------| <- at 0x28(%rsp), 0x7fffffffe368
| node2 (val, id) |
|------------------------| <- at 0x20(%rsp), 0x7fffffffe370
| node3 (next) |
|------------------------| <- at 0x18(%rsp), 0x7fffffffe368
| node3 (val, id) |
|------------------------| <- at 0x10(%rsp), 0x7fffffffe360
| node4 (next) |
|------------------------| <- at 0x8(%rsp), 0x7fffffffe358
| node4 (val, id) |
|------------------------| <- %rsp, 0x7fffffffe350
The stack frame contains exactly 0x50 bytes of memory, as allocated by the instruction 4011ca: sub $0x50,%rsp
above. You may notice that the 8 bytes at address 0x40(%rsp) are simply unused, and you may wonder why did we bother to allocate space for it. On x86-64, the stack must be aligned to 16 bytes before a call instruction, and we need those “0x00” bytes for padding.
Dissecting the print_list Function
Now let’s resume from we set the breakpoint, and dig into the print_list
function.
0000000000401195 <print_list>:
401195: endbr64
# === Function Prologue ===
401199: push %rbx # Save %rbx
40119a: mov %rdi,%rbx # Move first arg (head pointer) into %rbx
# === Loop Condition ===
40119d: jmp 4011be # Jump to address 4011be for loop condition check
# === Loop Body ===
40119f: mov (%rbx),%ecx # Load val into %ecx
4011a1: mov 0x4(%rbx),%edx # Load id into %edx
4011a4: lea 0xe59(%rip),%rsi # Load the address of format string "Node ID: %d, Value: %d\n" into %rsi
4011ab: mov $0x1,%edi # First arg to printf (fd = stdout)
4011b0: mov $0x0,%eax # Clear %eax before variadic call
4011b5: call 401080 <__printf_chk@plt> # secure version of function printf
4011ba: mov 0x8(%rbx),%rbx # Move to next node (follow .next)
# === Loop Condition ===
4011be: test %rbx,%rbx # Check if current node is NULL
4011c1: jne 40119f # If not NULL, jump to address 40119f to continue loop
# === Function Epilogue ===
4011c3: pop %rbx # Restore %rbx
4011c4: ret # Return
We can add a second breakpoint to the function to examine the stack
(gdb) break *0x4011ba
Breakpoint 2 at 0x4011ba: file linked_list.c, line 30.
(gdb) continue
Continuing.
Node ID: 1, Value: 10
Breakpoint 2, print_list (head=head@entry=0x7fffffffe380) at linked_list.c:30
30 head = head->next;
(gdb) x/20wx $rsp
0x7fffffffe340: 0xffffe380 0x00007fff 0x0040125b 0x00000000
0x7fffffffe350: 0x00000028 0x00000004 0x00000000 0x00000000
0x7fffffffe360: 0x0000001e 0x00000003 0xffffe350 0x00007fff
We notice that the stack pointer register %rsp has been changed from 0x7fffffffe350
to 0x7fffffffe340
, even though the print_list
function doesn’t allocate any space to the stack frame. So, why did the stack frame grow by 0x10 (16bytes)?
- In the
main
function, when we execute instruction401256: call 401195 <print_list>
, we push the address of next instruction40125b: mov %rbx,%rdi
onto the stack, so that when function callprint_list
finishes, we can return to the correct location ofmain
function and continue executing. This return addreses takes 8 bytes. - Inside the
print_list
function , we run401199: push %rbx
to push the value of register %rbx onto the stack. That takes another 8 bytes.
So, even though register %rsp isn’t directly called, it was modified by instructions call
and push
. Notice that instructions ret
and pop
are counterparts of them, and will do the opposite - pop data out of the stack.
Dissecting the reverse Function
Now let’s move on to the last function reverse
.
0000000000401176 <reverse>:
401176: endbr64
# === Function Prologue ===
40117a: mov $0x0,%eax # Set return register %rax = NULL (this will track the new head)
# === Loop Condition (initial jump) ===
40117f: jmp 40118f # Jump to loop condition before first iteration
# === Loop Body ===
401181: mov 0x8(%rdi),%rdx # Load next node: %rdx = current->next
401185: mov %rax,0x8(%rdi) # Reverse the link: current->next = previous
401189: mov %rdi,%rax # Update previous: previous = current
40118c: mov %rdx,%rdi # Move to next: current = next
# === Loop Condition ===
40118f: test %rdi,%rdi # Check if current (now in %rdi) is NULL
401192: jne 401181 # If not, continue the loop
# === Function Epilogue ===
401194: ret # Return; %rax holds new head of reversed list
This piece of assembly should be rather straightforward now.
One thing interesting here is that in my C function reverse
, I declare 3 local variables prev
, curr
and next
, but in the assembly code there isn’t any stack space allocated for them. It’s because the compiler tries to optimize away unnecessary memory operations, and it’s smart enough to realize that we don’t actually need any additional memory - simply using registers to update pointers is enough. And this process is just beautiful.
Thanks
I just started learning fundamentals about machines recently, although I’ve been a developer for many years. This article is my personal attempt to internalize my learnings and share them with others. If you’re just getting into the low-level world, I hope you’re just as excited as I’m.
One last note: I’m using my M2 Macbook Air for daily task and it doesn’t work well with GDB and all those x86-64 stuff. I ended up getting an AWS ECS Ubuntu t2.micro machine.
$ uname -a
Linux ip-172-31-18-227 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0