Operating Systems: From 0 to 1

80483e1: mov eax,ds:0x804a018

int j = i;

80483e6: mov DWORD PTR [ebp-0x8],eax

int k = 0xabcdef;

80483e9: mov DWORD PTR [ebp-0x4],0xabcdef

General data movement is performed with the mov instruction. Note that despite its name, mov does not move data: it copies data from a source to a destination, leaving the source unchanged.

The red instruction copies data from the register esp to the register ebp. This mov instruction moves data between registers and is assigned the opcode 89.

The blue instructions copy data from one memory location (the i variable) to another (the j variable). x86 has no mov form that copies directly from memory to memory, so this takes two mov instructions: one copies the data from the source memory location into a register, and the other copies it from the register to the destination memory location.

The pink instruction copies an immediate value into memory. Finally, the green instruction copies immediate data into a register.

Expressions

Source

int expr(int i, int j)
{
    int add            = i + j;
    int sub            = i - j;
    int mul            = i * j;
    int div            = i / j;
    int mod            = i % j;
    int neg            = -i;
    int and            = i & j;
    int or             = i | j;
    int xor            = i ^ j;
    int not            = ~i;
    int shl            = i << 8;
    int shr            = i >> 8;
    char equal1        = (i == j);
    int equal2         = (i == j);
    char greater       = (i > j);
    char less          = (i < j);
    char greater_equal = (i >= j);
    char less_equal    = (i <= j);
    int logical_and    = i && j;
    int logical_or     = i || j;
    ++i;
    --i;
    int i1             = i++;
    int i2             = ++i;
    int i3             = i--;
    int i4             = --i;

    return 0;
}

int main(int argc, char *argv[]) {
    return 0;
}

Assembly

The full assembly listing is long, so we examine it expression by expression.

80483e1: mov edx,DWORD PTR [ebp+0x8]

int add = i + j;

80483e4: mov eax,DWORD PTR [ebp+0xc]

80483e7: add eax,edx

80483e9: mov DWORD PTR [ebp-0x34],eax

The assembly code is straightforward: variables i and j are loaded into edx and eax respectively, then added together with the add instruction, which leaves the result in eax. The result is then saved into the local variable add, which is at the location [ebp-0x34].

80483ec: mov eax,DWORD PTR [ebp+0x8]

int sub = i - j;

80483ef: sub eax,DWORD PTR [ebp+0xc]

80483f2: mov DWORD PTR [ebp-0x30],eax

80483f5: mov eax,DWORD PTR [ebp+0x8]

int mul = i * j;

80483f8: imul eax,DWORD PTR [ebp+0xc]

80483fc: mov DWORD PTR [ebp-0x34],eax

Signed multiplication is performed by the imul instruction; unsigned multiplication uses the mul instruction. eax is first loaded with i, then multiplied by j with the result stored back into eax, and finally stored into the variable mul at location [ebp-0x34].

80483ff: mov eax,DWORD PTR [ebp+0x8]

int div = i / j;

8048402: cdq

8048403: idiv DWORD PTR [ebp+0xc]

8048406: mov DWORD PTR [ebp-0x30],eax

First, i is reloaded into eax.
Then, cdq converts the doubleword value in eax into a quadword value stored in the register pair edx:eax, by copying the sign bit (bit 31) of eax into every bit position of edx. The pair edx:eax is the dividend, which is the variable i, and the operand of idiv is the divisor, which is the variable j.
After the division, the result is stored in the register pair edx:eax, with the quotient in eax and the remainder in edx. The quotient is stored in the variable div, at location [ebp-0x30].

8048409: mov eax,DWORD PTR [ebp+0x8]

int mod = i % j;

804840c: cdq

804840d: idiv DWORD PTR [ebp+0xc]

8048410: mov DWORD PTR [ebp-0x2c],edx

The same idiv instruction also performs the modulo operation, since it computes the remainder as well; this time the remainder in edx is stored in the variable mod, at location [ebp-0x2c].

8048413: mov eax,DWORD PTR [ebp+0x8]

int neg = -i;

8048416: neg eax

8048418: mov DWORD PTR [ebp-0x28],eax

neg replaces the value of its operand (the destination operand) with its two's complement; this is equivalent to subtracting the operand from 0. In this example, the value i in eax is replaced with -i by the neg instruction. Then, the new value is stored in the variable neg at [ebp-0x28].

804841b: mov eax,DWORD PTR [ebp+0x8]

int and = i & j;

804841e: and eax,DWORD PTR [ebp+0xc]

8048421: mov DWORD PTR [ebp-0x24],eax

and performs a bitwise AND operation on its two operands and stores the result in the destination operand, eax. The result is then stored in the variable and at [ebp-0x24].

8048424: mov eax,DWORD PTR [ebp+0x8]

int or = i | j;

8048427: or eax,DWORD PTR [ebp+0xc]

804842a: mov DWORD PTR [ebp-0x20],eax

804842d: mov eax,DWORD PTR [ebp+0x8]

int xor = i ^ j;

8048430: xor eax,DWORD PTR [ebp+0xc]

8048433: mov DWORD PTR [ebp-0x1c],eax

8048436: mov eax,DWORD PTR [ebp+0x8]

int not = ~i;

8048439: not eax

804843b: mov DWORD PTR [ebp-0x18],eax

not performs a bitwise NOT operation on the destination operand (each 1 is set to 0, and each 0 is set to 1) and stores the result back in the destination operand, eax. The result is then stored in the variable not at [ebp-0x18].

804843e: mov eax,DWORD PTR [ebp+0x8]

int shl = i << 8;

8048441: shl eax,0x8

8048444: mov DWORD PTR [ebp-0x14],eax

shl (shift logical left) shifts the bits in the destination operand to the left by the number of bits specified in the source operand. In this case, eax stores i and shl shifts eax by 8 bits to the left. A different name for shl is sal (shift arithmetic left); the two mnemonics are synonyms for the same instruction. Finally, the result is stored in the variable shl at [ebp-0x14].

Here is a visual demonstration of shl/sal and shr instructions:

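When shifting to the left, the bit shifted out of the most significant position goes into the Carry Flag (CF) of the EFLAGS register, and the vacated least significant bits are filled with 0.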

8048447: mov eax,DWORD PTR [ebp+0x8]

int shr = i >> 8;

804844a: sar eax,0x8

804844d: mov DWORD PTR [ebp-0x10],eax

In the figure (b), which shows shr (shift logical right), notice that although the sign bit is initially 1, after the 1-bit and 10-bit shifts the vacated high-order bits are filled with zeros.

Figure 0.14: SAR Instruction Operation (Source: Figure 7-8, Volume 1)

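With sar (shift arithmetic right), the sign bit (the most significant bit) is preserved: if the sign bit is 0, the vacated bits are filled with 0; if the sign bit is 1, they are filled with 1. This is why gcc compiles i >> 8 into sar here: i is a signed int, and an arithmetic shift keeps a negative value negative.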

8048450: mov eax,DWORD PTR [ebp+0x8]

char equal1 = (i == j);

8048453: cmp eax,DWORD PTR [ebp+0xc]

8048456: sete al

8048459: mov BYTE PTR [ebp-0x41],al

cmp and the variants of the set instructions make up all the logical comparisons. In this expression, cmp compares variables i and j; then sete stores 1 in the al register if the preceding cmp found them equal, or 0 otherwise. The general name for the variants of the set instruction is SETcc, where the suffix cc denotes the condition being tested in the EFLAGS register. Appendix B in volume 1, “EFLAGS Condition Codes”, lists the conditions it is possible to test for with this instruction. Finally, the result is stored in the variable equal1 at [ebp-0x41].

804845c: mov eax,DWORD PTR [ebp+0x8]

int equal2 = (i == j);

804845f: cmp eax,DWORD PTR [ebp+0xc]

8048462: sete al

8048465: movzx eax,al

8048468: mov DWORD PTR [ebp-0xc],eax

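Figure 0.15: movzx instruction (sub-figure a: eax before movzx; sub-figure b: eax after movzx eax, al)

Because equal2 is an int rather than a char, the byte produced by sete must be widened to 32 bits before it is stored: movzx eax, al copies al into eax and fills the upper 24 bits of eax with zeros. The result is then stored in the variable equal2 at [ebp-0xc].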

804846b: mov eax,DWORD PTR [ebp+0x8]

char greater = (i > j);

804846e: cmp eax,DWORD PTR [ebp+0xc]

8048471: setg al

8048474: mov BYTE PTR [ebp-0x40],al

8048477: mov eax,DWORD PTR [ebp+0x8]

char less = (i < j);

804847a: cmp eax,DWORD PTR [ebp+0xc]

804847d: setl al

8048480: mov BYTE PTR [ebp-0x3f],al

setl is used for the less-than comparison.

char greater_equal = (i >= j);

8048483: mov eax,DWORD PTR [ebp+0x8]

8048486: cmp eax,DWORD PTR [ebp+0xc]

8048489: setge al

804848c: mov BYTE PTR [ebp-0x3e],al

setge is used for the greater-than-or-equal comparison.

char less_equal = (i <= j);

804848f: mov eax,DWORD PTR [ebp+0x8]

8048492: cmp eax,DWORD PTR [ebp+0xc]

8048495: setle al

8048498: mov BYTE PTR [ebp-0x3d],al

setle is used for the less-than-or-equal comparison.

int logical_and = (i && j);

804849b: cmp DWORD PTR [ebp+0x8],0x0

804849f: je 80484ae <expr+0xd3>

80484a1: cmp DWORD PTR [ebp+0xc],0x0

80484a5: je 80484ae <expr+0xd3>

80484a7: mov eax,0x1

80484ac: jmp 80484b3 <expr+0xd8>

80484ae: mov eax,0x0

80484b3: mov DWORD PTR [ebp-0x8],eax

The logical AND operator && is one of the constructs implemented entirely in software with simpler instructions; that is, there is no equivalent assembly instruction implemented in hardware. The algorithm in the assembly code is simple:

First, check if i is 0 with the instruction at 0x804849b.
1. If true, jump to 0x80484ae and set eax to 0.
2. Set the variable logical_and to 0 with the instruction at 0x80484b3, which follows 0x80484ae. j is never examined in this case.
If i is not 0, check if j is 0 with the instruction at 0x80484a1.
1. If true, jump to 0x80484ae and set eax to 0.
2. Set the variable logical_and to 0 with the instruction at 0x80484b3, which follows 0x80484ae.
If neither i nor j is 0, the result is 1, or true.
1. Set eax to 1 with the instruction at 0x80484a7.
2. Then jump to the instruction at 0x80484b3 to set the variable logical_and at [ebp-0x8] to 1.

int logical_or = (i || j);

80484b6: cmp DWORD PTR [ebp+0x8],0x0

80484ba: jne 80484c2 <expr+0xe7>

80484bc: cmp DWORD PTR [ebp+0xc],0x0

80484c0: je 80484c9 <expr+0xee>

80484c2: mov eax,0x1

80484c7: jmp 80484ce <expr+0xf3>

80484c9: mov eax,0x0

80484ce: mov DWORD PTR [ebp-0x4],eax

The logical OR operator || is implemented similarly to logical AND above. Understanding the algorithm is left as an exercise for the reader.

++i; and --i; (or i++ and i--)

80484d1: add DWORD PTR [ebp+0x8],0x1

80484d5: sub DWORD PTR [ebp+0x8],0x1

The increment and decrement syntax is similar to logical AND and logical OR in that it is made from existing instructions, namely add and sub. The difference is that the CPU actually does have built-in instructions for it, inc and dec, but gcc decided not to use them because inc and dec cause a partial flag register stall, which occurs when an instruction modifies only part of the flag register (inc and dec leave CF unchanged) and a following instruction depends on the outcome of the flags (section 3.5.2.6, Intel Optimization Manual, 2016). The manual even suggests that inc and dec should be replaced with add and sub instructions (section 3.5.1.1, Intel Optimization Manual, 2016).

Expression:

int i1 = i++;

80484d9: mov eax,DWORD PTR [ebp+0x8]

80484dc: lea edx,[eax+0x1]

80484df: mov DWORD PTR [ebp+0x8],edx

80484e2: mov DWORD PTR [ebp-0x10],eax

First, i is copied into eax at 80484d9. Then, at 80484dc, the value eax + 0x1 is computed as an effective address and copied into edx. The lea (load effective address) instruction computes the address specified by its memory operand and stores that address in a register, without accessing memory. According to Volume 2, the source operand is a memory address specified with one of the processor's addressing modes; that is, it must be expressible with the addressing modes defined in the 16-bit/32-bit ModR/M byte tables. Here lea is used purely for arithmetic: it computes eax + 1 into edx without modifying the flags.

After loading the incremented value into edx, i is increased by 1 at 80484df by storing edx back into i. Finally, the previous value of i, still in eax, is stored into i1 at [ebp-0x10] by the instruction at 80484e2.

Expression:

int i2 = ++i;

80484e5: add DWORD PTR [ebp+0x8],0x1

80484e9: mov eax,DWORD PTR [ebp+0x8]

80484ec: mov DWORD PTR [ebp-0xc],eax

The primary differences between this increment syntax and the previous one are:

add is used instead of lea to increase i directly.
the newly incremented i is stored into i2 instead of the old value.
the expression only costs 3 instructions instead of 4.

In this unoptimized build, the prefix-increment form takes one fewer instruction than the postfix form used previously. The difference hardly matters if the increment runs only once or a few hundred times in a small loop, but it can add up when a loop runs millions of times or more. (With optimization enabled, gcc generally emits the same code for both forms when the value of the expression is not used.) Also, depending on the circumstances, one form can be more convenient than the other; e.g., if i is an index into an array, the postfix form yields the old value for accessing the previous element, while the prefix form yields the newly incremented value for accessing the current element.

Expression:

int i3 = i--;

80484ef: mov eax,DWORD PTR [ebp+0x8]

80484f2: lea edx,[eax-0x1]

80484f5: mov DWORD PTR [ebp+0x8],edx

80484f8: mov DWORD PTR [ebp-0x8],eax

This is similar to the i++ case and is left as an exercise for the reader.

Expression:

int i4 = --i;

80484fb: sub DWORD PTR [ebp+0x8],0x1

80484ff: mov eax,DWORD PTR [ebp+0x8]

8048502: mov DWORD PTR [ebp-0x4],eax

This is similar to the ++i case and is left as an exercise for the reader.

Exercise 0.7. Read section 3.5.2.4, “Partial Register Stalls” to understand register stalls in general.

Exercise 0.8. Read the sections from 7.3.1 to 7.3.7 in volume 1.

Stack

A stack is a contiguous array of memory locations that holds a collection of discrete data. When a new element is added, the stack grows down in memory toward lower addresses, and it shrinks up toward higher addresses when an element is removed. x86 uses the esp register to point to the top of the stack, that is, the newest element. A stack can be placed anywhere in main memory, as esp can be set to any memory address. x86 provides two operations for manipulating stacks:

push and its variants add a new element on top of the stack.
pop and its variants remove the topmost element from the stack.

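Before push:

0x10000	00
0x10001	00
0x10002	00
0x10003	00
0x10004	12	← esp

After pushing the 16-bit value 0x5678 (esp moves down by 2, and the low byte 78 is stored at the lower address):

0x10000	00
0x10001	00
0x10002	78	← esp
0x10003	56
0x10004	12

After pop (esp moves back up by 2; the popped bytes are shown as 00, although pop only moves esp and does not actually erase them):

0x10000	00
0x10001	00
0x10002	00
0x10003	00
0x10004	12	← esp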

Automatic variables

Local variables are variables that exist within a scope. A scope is delimited by a pair of braces: {...}. The most common scope to define local variables in is function scope. However, a scope can also be unnamed, and variables created inside an unnamed scope do not exist outside of that scope and its inner scopes.

Example 0.17. Function scope:

void foo() {
    int a;
    int b;
}

a and b are variables local to the function foo.

Example 0.18. Unnamed scope:

int foo() {
    int i;

    {
        int a = 1;
        int b = 2;
        {
            return i = a + b;
        }
    }
}

a and b are local to the scope where they are defined, and are also visible in its inner child scope that contains return i = a + b. However, they do not exist at the function scope where i is declared.

Conceptually, when a local variable is created, it is pushed on the stack; when a local variable goes out of scope, it is popped off the stack and thus destroyed. When an argument is passed from a caller to a callee, it is pushed on the stack; when the callee returns to the caller, the arguments are popped off the stack. Local variables and arguments are automatically allocated upon entering a function and destroyed after exiting it; that's why they are called automatic variables.

Whenever a function is called, it is allocated its own dedicated storage on the stack, called a stack frame. A stack frame is where all the local variables and arguments of a function are placed on the stack. The base frame pointer, kept in the ebp register, points to the start of the current function's frame.

Only data is allocated in a stack frame; no code resides there.

When a function needs a local variable or an argument, it uses ebp to access it:

All local variables are allocated below ebp (at lower addresses). Thus, to access a local variable, a number is subtracted from ebp to reach the location of the variable.
All arguments are allocated above ebp (at higher addresses). To access an argument, a number is added to ebp to reach the location of the argument.
ebp itself points to the saved ebp of the caller (the old ebp); the return address to the caller sits just above it, at ebp+0x4.

Figure: layout of the previous and current stack frames, from higher to lower addresses: function arguments (A), the return address, the old ebp (where ebp points), and local variables (L).

Here is an example to make it more concrete:

Source

int add(int a, int b) {
    int i = a + b;

    return i;
}

Assembly

080483db <add>:

#include <stdint.h>

int add(int a, int b) {

80483db: push ebp

80483dc: mov ebp,esp

80483de: sub esp,0x10

int i = a + b;

80483e1: mov edx,DWORD PTR [ebp+0x8]

80483e4: mov eax,DWORD PTR [ebp+0xc]

80483e7: add eax,edx

80483e9: mov DWORD PTR [ebp-0x4],eax

return i;

80483ec: mov eax,DWORD PTR [ebp-0x4]

}

80483ef: leave

80483f0: ret

In the assembly listing, [ebp-0x4] is the local variable i, since it is located just below ebp and is 4 bytes long (an int). On the other hand, a and b are arguments and are accessed at positive offsets from ebp:

[ebp+0x8] accesses a.
[ebp+0xc] accesses b.

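For accessing arguments, the rule is that the closer an argument is to ebp on the stack, the closer it is to the function name in the parameter list: the first argument a is at [ebp+0x8], and the second argument b is at [ebp+0xc].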

Figure: stack frame of add, from higher to lower addresses: b at ebp+0xc, a at ebp+0x8, the return address at ebp+0x4, and the old ebp at ebp; local variables follow below ebp (N marks where the next local variable starts).

From the figure, we can see that a and b are laid out in memory in exactly the order written in C, going upward from the return address.

Function Call and Return

Source

#include <stdio.h>

int add(int a, int b) {
    int local = 0x12345;

    return a + b;
}

int main(int argc, char *argv[]) {
    add(1,1);

    return 0;
}

Assembly

For every function call, gcc pushes the arguments on the stack with push instructions in reverse order, that is, the reverse of the order written in the C code. This ensures that the arguments end up in the relative order seen in the previous section on how function arguments and local variables are laid out. Then, gcc generates a call instruction, which implicitly pushes a return address before transferring control to the add function:

080483f2 <main>:

80483f9: call 80483db <add>

Upon returning from add, the stack is restored by adding 0x8 to the stack pointer esp (which is equivalent to two pop instructions, discarding the two arguments). Finally, main executes a leave instruction and returns with its own ret instruction. A ret instruction transfers program execution back to the caller, to the instruction right after the call instruction, which here is the add instruction that restores esp. ret can return to that location because the return address implicitly pushed by the call instruction is the address of the instruction right after the call; whenever the CPU executes a ret instruction, it pops that return address, which was pushed right after all the arguments on the stack:

At the end of a function, gcc places a leave instruction to clean up all the space allocated for local variables and restore ebp to the caller's frame pointer; leave is equivalent to mov esp,ebp followed by pop ebp.

080483db <add>:

#include <stdio.h>

int add(int a, int b) {

80483e1: mov DWORD PTR [ebp-0x4],0x12345

int local = 0x12345;

return a + b;

80483e8: mov edx,DWORD PTR [ebp+0x8]

80483eb: mov eax,DWORD PTR [ebp+0xc]

Exercise 0.9. The code above that gcc generated for the function call actually follows the standard procedure-call mechanism defined by x86. Read chapter 6, “Procedure Calls, Interrupts, and Exceptions”, Intel manual volume 1.

Loop

A loop simply resets the instruction pointer to an already executed instruction and starts from there all over again. A loop is just one application of the jmp instruction. However, because looping is a pervasive pattern, it earned its own syntax in C.

Source

#include <stdio.h>

int main(int argc, char *argv[]) {
    for (int i = 0; i < 10; i++) {
    }

    return 0;
}

Assembly

080483db <main>:

#include <stdio.h>

for (int i = 0; i < 10; i++) {

80483e1: mov DWORD PTR [ebp-0x4],0x0

80483e8: jmp 80483ee <main+0x13>

80483ea: add DWORD PTR [ebp-0x4],0x1

80483ee: cmp DWORD PTR [ebp-0x4],0x9

80483f2: jle 80483ea <main+0xf>

80483f4: b8 00 00 00 00 mov eax,0x0

return 0;

80483f9: c9 leave

80483fa: c3 ret

80483fb: 66 90 xchg ax,ax

80483fd: 66 90 xchg ax,ax

80483ff: 90 nop

The colors mark the correspondence between the high-level code and the assembly code:

The red instruction (80483e1) initializes i to 0.
The green instructions (80483ee and 80483f2) test the condition i < 10 by comparing i with 9 and using jle: for integers, i < 10 is the same as i <= 9. If the condition holds, jle jumps to 80483ea for another iteration.
The blue instruction (80483ea) increases i by 1, which lets the loop terminate once the termination condition is satisfied.

Exercise 0.10. Why does the increment instruction (the blue instruction) appear before the compare instructions (the green instructions)?

Exercise 0.11. What assembly code can be generated for while and do...while?

Conditional

Again, the if...else construct in C is just another application of the jmp instruction under the hood. It is also a pervasive pattern that earned its own syntax in C.

Source

#include <stdio.h>

int main(int argc, char *argv[]) {
    int i = 0;

    if (argc) {
        i = 1;
    } else {
        i = 0;
    }

    return 0;
}

Assembly

80483e1: mov DWORD PTR [ebp-0x4],0x0

int i = 0;

if (argc) {

80483e8: cmp DWORD PTR [ebp+0x8],0x0

80483ec: je 80483f7 <main+0x1c>

i = 1;

80483ee: mov DWORD PTR [ebp-0x4],0x1

80483f5: jmp 80483fe <main+0x23>

} else {

i = 0;

80483f7: mov DWORD PTR [ebp-0x4],0x0

The generated assembly code follows the same order as the corresponding high-level syntax:

The red instructions (80483e8 to 80483f5) represent the if branch.
The blue instruction (80483f7) represents the else branch.
The green instruction (at 80483fe, not shown in the listing) is the exit point for both the if and the else branch.

The if branch first uses cmp to check whether argc is false (equal to 0). If it is, je jumps to the else branch at 80483f7. Otherwise, the if branch continues with its own code, the next instruction at 80483ee, which copies 1 to i. Finally, it skips over the else branch with jmp and proceeds to 80483fe, the first instruction past the if...else construct.

The else branch is entered when the cmp in the if branch finds argc equal to 0. It starts at 80483f7, the first instruction of the else branch, which copies 0 to i and then falls through to the next instruction past the if...else construct without any jump.

The Anatomy of a Program

Every program consists of code and data, and only those two components make up a program. However, if a program consisted purely of its own code and data, the operating system (as well as a human) would have no way to know which block of binary is code and which is just raw data, where in the program to start execution, and which regions of memory should be protected and which are free to modify. For that reason, each program carries extra metadata that tells the operating system how to handle the program.

When a source file is compiled, the generated machine code is stored into an object file, which is just a block of binary. One or more object files can be combined to produce an executable binary, which is a complete program runnable in an operating system.

readelf is a program that recognizes and displays the ELF metadata of a binary file, be it an object file or an executable binary. ELF, or Executable and Linkable Format, is a file format whose headers, beginning at the very start of the file, give an operating system the information it needs to load an executable into main memory and run it. ELF can be thought of as similar to the table of contents of a book. In a book, a table of contents lists the page numbers of the main sections, subsections, and sometimes even figures and tables for easy lookup. Similarly, ELF lists the various sections used for code and data, the memory addresses of each symbol, and other information.

An ELF binary is composed of:

ELF header: the very first part of an executable, which describes the file's organization.
Program header table: an array of fixed-size structures that describes the segments of an executable.
Section header table: an array of fixed-size structures that describes the sections of an executable.
Segments and sections: the main content of an ELF binary, namely the code and data, divided into chunks with different purposes.

A segment is a composition of zero or more sections and is directly loaded by an operating system at runtime.

A section is a block of binary that is either:
- actual program code and data that is available in memory when a program runs, or
- metadata about other sections, used only in the linking process, that disappears from the final executable.
The linker uses sections to build segments.

Figure 0.16: ELF - Linking View vs Executable View (Source: Wikipedia)

Later, we will compile our kernel as an ELF executable with GCC and explicitly specify how segments are created and where they are loaded in memory through the use of a linker script, a text file that instructs the linker how to generate a binary. For now, we will examine the anatomy of an ELF executable in detail.

Reference documents:

The ELF specification is bundled as a man page in Linux:

$ man elf

It is a useful resource for understanding and implementing ELF. However, it will be much easier to use after you finish this chapter, as the specification mixes in implementation details.

The default specification is a generic one, which every ELF implementation follows. However, each platform provides extra features unique to it. The ELF specification for x86 is currently maintained on GitHub by H.J. Lu: https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.

Platform-dependent details are referred to as “processor specific” in the generic ELF specification. We will not explore these details, but study the generic details, which are enough for crafting an ELF binary image for our operating system.

ELF header

To see the information of an ELF header:

$ readelf -h hello

The output:

Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00

Class: ELF64

Data: 2's complement, little endian

Version: 1 (current)

OS/ABI: UNIX - System V

ABI Version: 0

Type: EXEC (Executable file)

Machine: Advanced Micro Devices X86-64

Version: 0x1

Entry point address: 0x400430

Start of program headers: 64 (bytes into file)

Start of section headers: 6648 (bytes into file)

Flags: 0x0

Size of this header: 64 (bytes)

Size of program headers: 56 (bytes)

Number of program headers: 9

Size of section headers: 64 (bytes)

Number of section headers: 31

Section header string table index: 28

Let's go through each field:

Magic

Displays the raw identification bytes that mark the file as an ELF binary. Each byte encodes a small piece of information.

In the example, we have the following magic bytes:

Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00

Examine byte by byte:

Byte	Description
`7f 45 4c 46`	Predefined values. The first byte is always `7F`; the remaining 3 bytes represent the string `ELF`.
`02`	See `Class` field below.
`01`	See `Data` field below.
`01`	See `Version` field below.
`00`	See `OS/ABI` field below.
`00`	ABI version, shown as `ABI Version` in the readelf output.
`00 00 00 00 00 00 00`	Padding bytes. These bytes are unused and are always set to 0. They are added for proper alignment and are reserved for future use when more information is needed.

Class

A byte in Magic field. It specifies the class or capacity of a file.

Possible values:

Value	Description
`0`	Invalid class
`1`	32-bit objects
`2`	64-bit objects

Data

A byte in Magic field. It specifies the data encoding of the processor-specific data in the object file.

Possible values:

Value	Description
`0`	Invalid data encoding
`1`	Little endian, 2's complement
`2`	Big endian, 2's complement

Version

A byte in Magic. It specifies the ELF header version number.

Possible values:

Value	Description
`0`	Invalid version
`1`	Current version

OS/ABI

A byte in Magic field. It specifies the target operating system ABI. Originally, it was a padding byte.

Possible values: Refer to the latest ABI document, as it is a long list of different operating systems.

Type

Identifies the object file type.

Value	Description
`0`	No file type
`1`	Relocatable file
`2`	Executable file
`3`	Shared object file
`4`	Core file
`0xff00`	Processor specific, lower bound
`0xffff`	Processor specific, upper bound

The values from 0xff00 to 0xffff are reserved for a processor to define additional file types meaningful to it.

Machine

Specifies the required architecture value for an ELF file e.g. x86_64, MIPS, SPARC, etc. In the example, the machine is of x86_64 architecture.

Possible values: Please refer to the latest ABI document, as it is a long list of different architectures.

Version

Specifies the version number of the current object file (not the version of the ELF header, as the above Version field specified).

Entry point address

Specifies the memory address of the very first instruction to be executed. In a normal application program built with gcc, this is not main but the C runtime's startup routine (usually named _start), which eventually calls main; a different entry function can be chosen by explicitly specifying its name to gcc at link time (e.g., with -e). For the operating system we are going to write, this is the single most important field that we need to retrieve to bootstrap our kernel, and everything else can be ignored.

Start of program headers

The offset of the program header table, in bytes. In the example, this number is 64 bytes, which means the 65th byte, or <start address> + 64, is the start address of the program header table. That is, if a program is loaded at address 0x10000 in memory, then the start address is 0x10000 (the very first byte of Magic field, where the value 0x7f resides) and the start address of program header table is 0x10000 + 0x40 = 0x10040.

Start of section headers

The offset of the section header table in bytes, similar to the start of program headers. In the example, it is 6648 bytes into file.

Flags

Holds processor-specific flags associated with the file. x86 and x86-64 define no such flags, so in the example the value is 0x0. Despite the name, this field is unrelated to the EFLAGS register.

Size of this header

Specifies the size of the ELF header in bytes. In the example, it is 64 bytes, which equals the Start of program headers value. Note that these two numbers are not necessarily equal, as the program header table might be placed far away from the ELF header. The only fixed component in an ELF executable binary is the ELF header, which appears at the very beginning of the file.

Size of program headers

Specifies the size of each program header in bytes. In the example, it is 56 bytes.

Number of program headers

Specifies the total number of program headers. In the example, the file has a total of 9 program headers.

Size of section headers

Specifies the size of each section header in bytes. In the example, it is 64 bytes.

Number of section headers

Specifies the total number of section headers. In the example, the file has a total of 31 section headers. In a section header table, the first entry in the table is always an empty section.

Section header string table index

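Specifies the index of the entry in the section header table that describes the section holding the null-terminated names of all sections (the section header string table). In the example, the index is 28; since indices count from 0, this is entry [28] in the readelf -S listing, i.e. the 29th entry of the table.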

Section header table

As we know already, code and data compose a program. However, not all types of code and data have the same purpose. For that reason, instead of a big chunk of code and data, they are divided into smaller chunks, and each chunk must satisfy these conditions (according to gABI):

Every section in an object file has exactly one section header describing it. However, section headers may exist that do not have a section.
Each section occupies one contiguous (possibly empty) sequence of bytes within a file; a section is never split into two separate regions of bytes.
Sections in a file may not overlap. No byte in a file resides in more than one section.
An object file may have inactive space. The various headers and the sections might not “cover” every byte in an object file. The contents of the inactive data are unspecified.

To list all the section headers of an executable binary, e.g. hello, use the following command:

$ readelf -S hello

Here is a sample output (do not worry if you don't understand the output. Just skim to get your eyes familiar with it. We will dissect it soon enough):

There are 31 section headers, starting at offset 0x19c8:

Section Headers:

[Nr] Name Type Address Offset

Size EntSize Flags Link Info Align

[ 0] NULL 0000000000000000 00000000

0000000000000000 0000000000000000 0 0 0

[ 1] .interp PROGBITS 0000000000400238 00000238

000000000000001c 0000000000000000 A 0 0 1

[ 2] .note.ABI-tag NOTE 0000000000400254 00000254

0000000000000020 0000000000000000 A 0 0 4

[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274

0000000000000024 0000000000000000 A 0 0 4

[ 4] .gnu.hash GNU_HASH 0000000000400298 00000298

000000000000001c 0000000000000000 A 5 0 8

[ 5] .dynsym DYNSYM 00000000004002b8 000002b8

0000000000000048 0000000000000018 A 6 1 8

[ 6] .dynstr STRTAB 0000000000400300 00000300

0000000000000038 0000000000000000 A 0 0 1

[ 7] .gnu.version VERSYM 0000000000400338 00000338

0000000000000006 0000000000000002 A 5 0 2

[ 8] .gnu.version_r VERNEED 0000000000400340 00000340

0000000000000020 0000000000000000 A 6 1 8

[ 9] .rela.dyn RELA 0000000000400360 00000360

0000000000000018 0000000000000018 A 5 0 8

[10] .rela.plt RELA 0000000000400378 00000378

0000000000000018 0000000000000018 AI 5 24 8

[11] .init PROGBITS 0000000000400390 00000390

000000000000001a 0000000000000000 AX 0 0 4

[12] .plt PROGBITS 00000000004003b0 000003b0

0000000000000020 0000000000000010 AX 0 0 16

[13] .plt.got PROGBITS 00000000004003d0 000003d0

0000000000000008 0000000000000000 AX 0 0 8

[14] .text PROGBITS 00000000004003e0 000003e0

0000000000000192 0000000000000000 AX 0 0 16

[15] .fini PROGBITS 0000000000400574 00000574

0000000000000009 0000000000000000 AX 0 0 4

[16] .rodata PROGBITS 0000000000400580 00000580

0000000000000004 0000000000000004 AM 0 0 4

[17] .eh_frame_hdr PROGBITS 0000000000400584 00000584

000000000000003c 0000000000000000 A 0 0 4

[18] .eh_frame PROGBITS 00000000004005c0 000005c0

0000000000000114 0000000000000000 A 0 0 8

[19] .init_array INIT_ARRAY 0000000000600e10 00000e10

0000000000000008 0000000000000000 WA 0 0 8

[20] .fini_array FINI_ARRAY 0000000000600e18 00000e18

0000000000000008 0000000000000000 WA 0 0 8

[21] .jcr PROGBITS 0000000000600e20 00000e20

0000000000000008 0000000000000000 WA 0 0 8

[22] .dynamic DYNAMIC 0000000000600e28 00000e28

00000000000001d0 0000000000000010 WA 6 0 8

[23] .got PROGBITS 0000000000600ff8 00000ff8

0000000000000008 0000000000000008 WA 0 0 8

[24] .got.plt PROGBITS 0000000000601000 00001000

0000000000000020 0000000000000008 WA 0 0 8

[25] .data PROGBITS 0000000000601020 00001020

0000000000000010 0000000000000000 WA 0 0 8

[26] .bss NOBITS 0000000000601030 00001030

0000000000000008 0000000000000000 WA 0 0 1

[27] .comment PROGBITS 0000000000000000 00001030

0000000000000034 0000000000000001 MS 0 0 1

[28] .shstrtab STRTAB 0000000000000000 000018b6

000000000000010c 0000000000000000 0 0 1

[29] .symtab SYMTAB 0000000000000000 00001068

0000000000000648 0000000000000018 30 47 8

[30] .strtab STRTAB 0000000000000000 000016b0

0000000000000206 0000000000000000 0 0 1

W (write), A (alloc), X (execute), M (merge), S (strings), l (large)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

The first line:

There are 31 section headers, starting at offset 0x19c8

summarizes the total number of section headers in the file and the file offset where the section header table starts. Then comes the listing, section by section. Each section has two lines with the following fields:

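Nr: The index of each section.
Name: The name of each section.
Type: This field identifies the type of each section. Types are used to classify sections.
Address: The starting virtual address of each section. Note that addresses are virtual only when a program runs in an OS with virtual memory enabled. In our OS, which runs on bare metal, the addresses will all be physical.
Offset: The offset in bytes from the beginning of the file to the first byte of the section.
Size: The size of the section in bytes.
EntSize: Some sections, such as a symbol table, hold a table of fixed-size entries. For such a section, this field gives the size in bytes of each entry; it is 0 if the section does not hold a table of fixed-size entries.
Flags: The attributes of a section, described in Table 4.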

Table 4: Section Flags

Flag	Descriptions
W	Bytes in this section are writable during execution.
A	Memory is allocated for this section during process execution. Some control sections do not reside in the memory image of an object file; this attribute is off for those sections.
X	The section contains executable instructions.
M	The data in the section may be merged to eliminate duplication. Each element in the section is compared against other elements in sections with the same name, type and flags. Elements that would have identical values at program run-time may be merged.
S	The data elements in the section consist of null-terminated character strings. The size of each character is specified in the section header's *EntSize* field.
l	A large section, specific to the x86_64 architecture: a section without this flag may not hold more than 2 GB. This flag is not specified in the Generic ABI but in the x86_64 ABI.
I	The Info field of this section header holds a section header index. Without this flag, the Info field holds some other kind of value, depending on the section type.
L	Preserve section ordering when linking. If this section is combined with other sections in the output file, it must appear in the same relative order with respect to those sections as the linked-to section appears with respect to the sections it is combined with. This flag applies when the Link field of this section's header references another section (the linked-to section).
G	This section is a member (perhaps the only one) of a section group.
T	This section holds Thread-Local Storage, meaning that each thread has its own distinct instance of this data. A thread is a distinct execution flow of code. A program can have multiple threads that run different pieces of code concurrently. We will learn more about threads when writing our kernel.
E	The link editor excludes this section from the executable or shared library it builds when those objects are not to be further relocated.
x	A flag unknown to readelf. This happens because linking can be controlled manually with a linker like GNU ld (we will learn about it later). That is, section flags can be specified manually, and some flags belong to a customized ELF format that the open-source readelf does not know about.
O	This section requires special OS-specific processing (beyond the standard linking rules) to avoid incorrect behavior. If the section has Type or Flags values in the OS-specific ranges and a link editor does not recognize them, the link editor should reject the object file containing this section with an error.
o	All bits included in this flag are reserved for operating system-specific semantics.
p	All bits included in this flag are reserved for processor-specific semantics. If meanings are specified, the processor supplement explains them.

Later when writing our OS, we will handcraft the kernel image by explicitly linking the object files (produced by gcc) through a linker script. We will specify the memory layout of sections by specifying at what addresses they will appear in the final image. But we will not assign any section flag and will let the linker take care of it. Nevertheless, knowing which flag does what is useful.

Link and Info: numbers that reference the indexes of sections, symbol table entries, or hash table entries. The Link field only holds the index of a section, while the Info field holds the index of a section, a symbol table entry, or a hash table entry, depending on the type of the section.

Align: a value that the starting address of a section must be divisible by. Only 0 and positive integral powers of two are allowed. Values 0 and 1 mean the section has no alignment constraint.
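To connect these fields to the underlying data structures, here is a sketch that locates a section by name, the way readelf does. It assumes a 64-bit ELF file on Linux with glibc's <elf.h>. The helpers read_at and find_section are hypothetical. Each Elf64_Shdr stores Name as an offset (sh_name) into the section-name string table, which is the section whose index is e_shstrndx in the ELF header.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads `size` bytes at file offset `off` into a new heap buffer (or NULL). */
static void *read_at(FILE *f, long off, size_t size)
{
    void *buf = malloc(size ? size : 1);
    if (buf && (fseek(f, off, SEEK_SET) != 0 || fread(buf, 1, size, f) != size)) {
        free(buf);
        buf = NULL;
    }
    return buf;
}

/* Finds the section called `name` in the 64-bit ELF file `path` and copies
   its header into *out. Returns 0 if found, -1 otherwise. Sketch only: it
   assumes e_shnum and e_shstrndx need no SHN_XINDEX escape. */
int find_section(const char *path, const char *name, Elf64_Shdr *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int found = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;

    if (fread(&eh, sizeof eh, 1, f) == 1 && eh.e_shentsize == sizeof(Elf64_Shdr)
        && eh.e_shstrndx < eh.e_shnum) {
        /* Section header table: e_shnum entries starting at offset e_shoff. */
        sh = read_at(f, (long)eh.e_shoff, (size_t)eh.e_shnum * sizeof *sh);
        /* Section names live in the string table at index e_shstrndx. */
        if (sh)
            names = read_at(f, (long)sh[eh.e_shstrndx].sh_offset,
                            sh[eh.e_shstrndx].sh_size);
    }

    for (int i = 0; sh && names && i < eh.e_shnum; i++) {
        /* sh_name is a byte offset into the section-name string table. */
        if (sh[i].sh_name < sh[eh.e_shstrndx].sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            *out = sh[i];
            found = 0;
            break;
        }
    }

    free(names);
    free(sh);
    fclose(f);
    return found;
}
```

Run on the example binary, find_section("hello", ".text", &sh) should yield the Address, Offset, Size and Align values discussed for .text in Example 0.20 (in sh_addr, sh_offset, sh_size and sh_addralign).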

Example 0.19. Output of .interp section:

[ 1] .interp PROGBITS 0000000000400238 00000238

000000000000001c 0000000000000000 A 0 0 1

Nr is 1.

Type is PROGBITS, which means this section is part of the program.

Address is 0x0000000000400238, which means this section is loaded at this virtual memory address at runtime.

Offset is 0x00000238 bytes into the file.

Size is 0x000000000000001c (28) bytes.

EntSize is 0, which means this section does not have any fixed-size entry.

Flags are A (Allocatable), which means this section consumes memory at runtime.

Link and Info are both 0, which means this section links to no section or entry in any table.

Align is 1, which means there is no alignment constraint.

Example 0.20. Output of the .text section:

[14] .text PROGBITS 00000000004003e0 000003e0

0000000000000192 0000000000000000 AX 0 0 16

Nr is 14.

Type is PROGBITS, which means this section is part of the program.

Address is 0x00000000004003e0, which means this section is loaded at this virtual memory address at runtime.

Offset is 0x000003e0 bytes into the file.

Size is 0x0000000000000192 (402) bytes.

EntSize is 0, which means this section does not have any fixed-size entry.

Flags are A (Allocatable) and X (Executable), which means this section consumes memory and can be executed as code at runtime.

Link and Info are both 0, which means this section links to no section or entry in any table.

Align is 16, which means the starting address of the section should be divisible by 16, or 0x10. Indeed, it is:

$\mathtt{0x3e0} / \mathtt{0x10} = \mathtt{0x3e}$
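Because Align is always 0, 1 or a power of two, the divisibility check can be done with a bit mask instead of a division. The following is a small sketch; is_section_aligned is a hypothetical helper name.

```c
#include <stdint.h>

/* Returns 1 if addr satisfies a section's Align value. Align 0 or 1 means
   "no constraint"; any other legal value is a power of two, so
   addr % align == 0 is equivalent to (addr & (align - 1)) == 0. */
int is_section_aligned(uint64_t addr, uint64_t align)
{
    if (align <= 1)
        return 1;
    return (addr & (align - 1)) == 0;
}
```

With it, .text at 0x4003e0 passes an alignment of 16. The address of .interp, 0x400238, would not: it passes only because .interp's Align is 1.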

Understand Section in-depth

In this section, we will learn the details of section types and the purposes of special sections, e.g. .bss, .text, .data, etc., by looking at each section one by one. We will also examine the content of each section as a hexdump with the command:

$ readelf -x <section name|section number> <file>

For example, if you want to examine the content of the section with index 25 (the .data section in the sample output) in the file hello:

$ readelf -x 25 hello

Equivalently, using name instead of index works:

$ readelf -x .data hello

If a section contains strings e.g. string symbol table, the flag -x can be replaced with -p.

NULL

marks a section header as inactive; such a header does not have an associated section. The NULL section header is always the first entry of the section header table, so any useful section starts from index 1.

Example 0.21. The sample output of NULL section:

[ 0] NULL 0000000000000000 00000000

0000000000000000 0000000000000000 0 0 0

Examining the content, the section is empty:

Section '' has no data to dump.

NOTE

marks a section holding special information, supplied by a vendor or system builder, that other programs check for conformance, compatibility, etc.

Example 0.22. In the sample output, we have 2 NOTE sections:

[ 2] .note.ABI-tag NOTE 0000000000400254 00000254

0000000000000020 0000000000000000 A 0 0 4

[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274

0000000000000024 0000000000000000 A 0 0 4

Examine section 2 with the command:

$ readelf -x 2 hello

we have:

Hex dump of section '.note.ABI-tag':

0x00400254 04000000 10000000 01000000 474e5500 ............GNU.

0x00400264 00000000 02000000 06000000 20000000 ............ ...
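The dump follows the generic note layout. Three 4-byte words give the name size, the descriptor size and the note type. The name ("GNU" plus its null terminator) follows, padded to 4 bytes, and then the descriptor. For .note.ABI-tag, the descriptor words are the OS (0 means Linux) and the earliest compatible kernel version. Here is a sketch that decodes the exact bytes shown above, assuming little-endian x86/x86_64 byte order; struct abi_note and decode_abi_tag are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* The 32 bytes of .note.ABI-tag from the hex dump above, in file order. */
static const uint8_t abi_tag_bytes[32] = {
    0x04, 0x00, 0x00, 0x00,  0x10, 0x00, 0x00, 0x00,  /* namesz = 4, descsz = 16 */
    0x01, 0x00, 0x00, 0x00,  0x47, 0x4e, 0x55, 0x00,  /* type = 1, name = "GNU"  */
    0x00, 0x00, 0x00, 0x00,  0x02, 0x00, 0x00, 0x00,  /* OS = 0 (Linux), 2        */
    0x06, 0x00, 0x00, 0x00,  0x20, 0x00, 0x00, 0x00,  /* 6, 32                    */
};

/* Reads a 32-bit little-endian word (the byte order of x86 and x86_64). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct abi_note {
    uint32_t namesz, descsz, type;     /* note header: three 4-byte words         */
    char name[4];                      /* "GNU" plus its null terminator          */
    uint32_t os, major, minor, patch;  /* descriptor: OS, earliest kernel version */
};

/* Decodes header, 4-byte-aligned name, then descriptor. This sketch
   hard-codes the sizes of the ABI tag (namesz 4, descsz 16). */
void decode_abi_tag(const uint8_t *p, struct abi_note *n)
{
    n->namesz = le32(p);
    n->descsz = le32(p + 4);
    n->type   = le32(p + 8);
    memcpy(n->name, p + 12, sizeof n->name);
    n->os     = le32(p + 16);
    n->major  = le32(p + 20);
    n->minor  = le32(p + 24);
    n->patch  = le32(p + 28);
}
```

Decoding the dump gives type 1 (NT_GNU_ABI_TAG in <elf.h>), OS 0 (Linux) and version 2.6.32. In other words, the binary declares that it needs at least Linux 2.6.32.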

PROGBITS

indicates a section holding the main content of a program, either code or data.

Example 0.23. There are many PROGBITS sections:

[ 1] .interp PROGBITS 0000000000400238 00000238

000000000000001c 0000000000000000 A 0 0 1

...

[11] .init PROGBITS 0000000000400390 00000390

000000000000001a 0000000000000000 AX 0 0 4

[12] .plt PROGBITS 00000000004003b0 000003b0

0000000000000020 0000000000000010 AX 0 0 16

[13] .plt.got PROGBITS 00000000004003d0 000003d0

0000000000000008 0000000000000000 AX 0 0 8

[14] .text PROGBITS 00000000004003e0 000003e0

0000000000000192 0000000000000000 AX 0 0 16

[15] .fini PROGBITS 0000000000400574 00000574

0000000000000009 0000000000000000 AX 0 0 4

[16] .rodata PROGBITS 0000000000400580 00000580

0000000000000004 0000000000000004 AM 0 0 4

[17] .eh_frame_hdr PROGBITS 0000000000400584 00000584

000000000000003c 0000000000000000 A 0 0 4

[18] .eh_frame PROGBITS 00000000004005c0 000005c0

0000000000000114 0000000000000000 A 0 0 8

...

[23] .got PROGBITS 0000000000600ff8 00000ff8

0000000000000008 0000000000000008 WA 0 0 8

[24] .got.plt PROGBITS 0000000000601000 00001000

0000000000000020 0000000000000008 WA 0 0 8

[25] .data PROGBITS 0000000000601020 00001020

0000000000000010 0000000000000000 WA 0 0 8

[27] .comment PROGBITS 0000000000000000 00001030

0000000000000034 0000000000000001 MS 0 0 1

For our operating system, we only need the following sections:

.text: This section holds all the compiled code of a program.
.data: This section holds the initialized data of a program. Since the data are initialized with actual values, gcc allocates the section with actual bytes in the executable binary.
.rodata: This section holds read-only data, such as constant strings in a program, e.g. “Hello World”, and others.
.bss: This section, short for Block Started by Symbol, holds uninitialized data of a program, which is zero-filled when the program starts. Unlike other sections, no space is allocated for this section in the image of the executable binary on disk. The section is allocated only when the program is loaded into main memory.
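The following is a sketch of where gcc typically places each kind of C data. It assumes a GNU/Linux toolchain whose default linker script defines __bss_start and _end around the .bss area; both symbols appear in the .symtab output later in this section. The variable names and in_bss are hypothetical.

```c
#include <stdint.h>

/* Assumption: the default GNU ld linker script defines these symbols at the
   start of .bss and at the end of the program's data. */
extern char __bss_start, _end;

int initialized_global = 42;            /* has an initial value: .data          */
int uninitialized_global;               /* no initial value: .bss (zero-filled) */
const char greeting[] = "Hello World";  /* read-only data: .rodata              */

/* Returns 1 if the object at p lies within [__bss_start, _end). */
int in_bss(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&__bss_start && a < (uintptr_t)&_end;
}
```

Compiling this file and running readelf -s on the result should list initialized_global in the .data section, uninitialized_global in .bss and greeting in .rodata (by their Ndx values).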

Other sections are mainly needed for dynamic linking, that is, linking code at runtime so that it can be shared among many programs. To enable such a feature, an OS must be present as a runtime environment. Since we run our OS on bare metal, we are effectively creating such an environment. For simplicity, we won't add dynamic linking to our OS.

SYMTAB and DYNSYM

These sections hold symbol tables. A symbol table is an array of entries that describe symbols in a program. A symbol is a name assigned to an entity in a program, such as a function or a variable. The types of these entities are also the types of symbols; the possible types are listed under the Type field below.

Example 0.24. In the sample output, section 5 and 29 are symbol tables:

[ 5] .dynsym DYNSYM 00000000004002b8 000002b8

0000000000000048 0000000000000018 A 6 1 8

...

[29] .symtab SYMTAB 0000000000000000 00001068

0000000000000648 0000000000000018 30 47 8

To show the symbol table:

$ readelf -s hello

Output consists of 2 symbol tables, corresponding to the two sections above, .dynsym and .symtab:

Symbol table '.dynsym' contains 4 entries:

0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND

1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)

2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)

3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__

Symbol table '.symtab' contains 67 entries:

..........................................

59: 0000000000601040 0 NOTYPE GLOBAL DEFAULT 26 _end

60: 0000000000400430 42 FUNC GLOBAL DEFAULT 14 _start

61: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 26 __bss_start

62: 0000000000400526 32 FUNC GLOBAL DEFAULT 14 main

63: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses

64: 0000000000601038 0 OBJECT GLOBAL HIDDEN 25 __TMC_END__

65: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable

66: 00000000004003c8 0 FUNC GLOBAL DEFAULT 11 _init

Num

is the index of an entry in a table.

Value

is the virtual memory address where the symbol is located.

Size

is the size of the entity associated with a symbol.

Type

is the symbol type, one of:

NOTYPE: The type of a symbol is not specified.
OBJECT: The symbol is associated with a data object. In C, any variable definition is of OBJECT type.
FUNC: The symbol is associated with a function or other executable code.
SECTION: The symbol is associated with a section, and exists primarily for relocation.
FILE: The symbol is the name of a source file associated with an executable binary.
COMMON: The symbol labels an uninitialized common block. In C, this happens when a global variable is defined without an initial value and the compiler emits it as a common symbol (gcc's -fcommon behavior, the default before GCC 10). The linker eventually allocates these variables in the .bss section.
TLS: The symbol is associated with a Thread-Local Storage entity.

Bind

is the scope of a symbol.

LOCAL

are symbols that are only visible in the object file that defines them. In C, the static modifier marks a symbol (e.g. a variable/function) as local to only the file that defines it.

Example 0.25. If we define variables and functions with the static modifier:

static int global_static_var = 0;

static void local_func() {
}

int main(int argc, char *argv[])
{
    static int local_static_var = 0;

    return 0;
}

After compiling, the static variables and the static function are listed as LOCAL symbols:

$ gcc -m32 hello.c -o hello

$ readelf -s hello

Symbol table '.dynsym' contains 5 entries:

0: 00000000 0 NOTYPE LOCAL DEFAULT UND

1: 00000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.0 (2)

2: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__

3: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.0 (2)

4: 080484bc 4 OBJECT GLOBAL DEFAULT 16 _IO_stdin_used

Symbol table '.symtab' contains 72 entries:

0: 00000000 0 NOTYPE LOCAL DEFAULT UND

38: 0804a020 4 OBJECT LOCAL DEFAULT 26 global_static_var

39: 0804840b 6 FUNC LOCAL DEFAULT 14 local_func

40: 0804a024 4 OBJECT LOCAL DEFAULT 26 local_static_var.1938

GLOBAL

are symbols that are accessible by other object files when they are linked together. These symbols are primarily non-static functions and non-static global data. The extern modifier marks a symbol as defined elsewhere but accessible in the final executable binary, so an extern variable is also considered GLOBAL.

Example 0.26. Similar to the LOCAL example above, the output lists many GLOBAL symbols such as main:

66: 080483e1 10 FUNC GLOBAL DEFAULT 14 main

WEAK

are symbols whose definitions can be replaced. Normally, a symbol with multiple definitions is reported as an error by the linker. However, this constraint is relaxed when a definition is explicitly marked as weak, which means the default implementation can be replaced by a different definition at link time.

Example 0.27. Suppose we have a default implementation of the function add:

#include <stdio.h>

__attribute__((weak)) int add(int a, int b) {
    printf("warning: function is not implemented.\n");
    return 0;
}

int main(int argc, char *argv[])
{
    printf("add(1,2) is %d\n", add(1,2));
    return 0;
}

__attribute__((weak)) is a function attribute. A function attribute is extra information that tells the compiler to handle a function differently from a normal function. In this example, the weak attribute makes add a weak function, which means the default implementation can be replaced by a different definition at link time. Function attributes are a compiler feature (a GNU C extension), not standard C.

If we do not supply a different definition of add in another file (it must be a different file; otherwise gcc reports a redefinition error), then the default implementation is used. When add is called, it only prints the message "warning: function is not implemented." and returns 0:

$ ./hello

warning: function is not implemented.

add(1,2) is 0

However, if we supply a different definition in another file e.g. math.c:

int add(int a, int b) {
    return a + b;
}

and compile the two files together:

$ gcc math.c hello.c -o hello

Then, when running hello, no warning message is printed and the correct value is returned.

A weak symbol is a mechanism to provide a default implementation that can be replaced at link time when a better implementation (e.g. a more specialized or optimized one) is available.
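A weak reference can also be left undefined altogether. The .dynsym output earlier lists __gmon_start__ as a WEAK, UND symbol: startup code refers to it but calls it only if some object file actually defines it. Below is a minimal sketch of that pattern, with a hypothetical hook name optional_hook. If no definition is linked in, the linker resolves the weak reference to address 0 instead of reporting an undefined symbol.

```c
/* Hypothetical hook: declared weak and deliberately left undefined. If no
   object file defines it, the linker resolves its address to 0 rather than
   failing with an undefined-symbol error. */
extern int optional_hook(void) __attribute__((weak));

/* Calls the hook only if some object file supplied a definition. */
int run_optional_hook(void)
{
    if (optional_hook)
        return optional_hook();
    return -1;   /* fallback value when no definition was linked in */
}
```

Compiling in another file that defines int optional_hook(void) { return 7; } would make run_optional_hook() return 7 instead of -1, with no change to this file.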

Vis

is the visibility of a symbol. The following values are available:

Table 5: Symbol Visibility

Value	Description
DEFAULT	The visibility is specified by the binding type of a symbol. Global and weak symbols are visible outside of their defining component (executable file or shared object). Local symbols are hidden. See HIDDEN below.
HIDDEN	A symbol is hidden when its name is not visible outside the component (executable file or shared object) that defines it.
PROTECTED	A symbol is protected when it is visible outside its defining component (executable file or shared library) but cannot be preempted: references to it from within the defining component always resolve to that component's own definition, even if another component defines a symbol with the same name.
INTERNAL	Visibility is processor-specific and is defined by processor-specific ABI.

Ndx

is the index of a section that the symbol is in. Aside from fixed index numbers that represent section indexes, index has these special values:

Table 6: Symbol Index

Value	Description
ABS	The symbol has an absolute value that will not change because of relocation.
COM	The symbol labels a common block that has not yet been allocated.
UND	The symbol is undefined in the current object file, which means the symbol depends on the actual definition in another file. Undefined symbols appear, for example, when the object file refers to symbols that are provided at runtime by a shared library.
LORESERVE HIRESERVE	LORESERVE is the lower boundary of the reserved indexes; its value is 0xff00. HIRESERVE is the upper boundary of the reserved indexes; its value is 0xffff. The system reserves the indexes between LORESERVE and HIRESERVE, inclusive; they do not map to any actual section header.
XINDEX	An escape value: the actual section index is too large to fit in the Ndx field. The actual value is contained in the SYMTAB_SHNDX section, where each entry maps a symbol whose Ndx field is XINDEX to its actual index value.
Others	Sometimes, values such as ANSI_COM, LARGE_COM, SCOM, SUND appear. This means that the index is processor-specific.

Name

is the symbol name.

Example 0.28. A C application program always starts from symbol main. The entry for main in the symbol table in .symtab section is:

62: 0000000000400526 32 FUNC GLOBAL DEFAULT 14 main

The entry shows that:

main is entry 62 in the table.
main starts at address 0x0000000000400526.
main consumes 32 bytes.
main is a function.
main is in global scope.
main has DEFAULT visibility, so it is visible to other object files that use it.
main is inside section 14, which is .text. This is logical, since .text holds all program code.
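Each row of readelf -s is decoded from one Elf64_Sym structure in <elf.h>. The members st_value, st_size, st_shndx and st_name give Value, Size, Ndx and Name. st_info packs Bind (high 4 bits) and Type (low 4 bits), and st_other holds Vis. Here is a sketch of the decoding with glibc's ELF64_ST_* macros; the helper names are hypothetical.

```c
#include <elf.h>
#include <string.h>

/* Type column: low 4 bits of st_info. */
const char *sym_type_name(unsigned char st_info)
{
    switch (ELF64_ST_TYPE(st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    case STT_COMMON:  return "COMMON";
    case STT_TLS:     return "TLS";
    default:          return "other";   /* OS- or processor-specific types */
    }
}

/* Bind column: high 4 bits of st_info. */
const char *sym_bind_name(unsigned char st_info)
{
    switch (ELF64_ST_BIND(st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "other";
    }
}

/* Vis column: low 2 bits of st_other. */
const char *sym_vis_name(unsigned char st_other)
{
    switch (ELF64_ST_VISIBILITY(st_other)) {
    case STV_DEFAULT:  return "DEFAULT";
    case STV_INTERNAL: return "INTERNAL";
    case STV_HIDDEN:   return "HIDDEN";
    default:           return "PROTECTED";  /* the only remaining 2-bit value */
    }
}

/* Convenience check, e.g. sym_is(sym->st_info, "GLOBAL", "FUNC") for main. */
int sym_is(unsigned char st_info, const char *bind, const char *type)
{
    return strcmp(sym_bind_name(st_info), bind) == 0 &&
           strcmp(sym_type_name(st_info), type) == 0;
}
```

For main's entry above, st_info holds ELF64_ST_INFO(STB_GLOBAL, STT_FUNC), which is the byte 0x12.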

STRTAB

holds a table of null-terminated strings, called a string table. The first and last bytes of this section are always null characters. A string table section exists because a string can be reused by more than one section to represent symbol and section names. A program like readelf or objdump can then display the objects in a program, e.g. variables, functions and section names, as human-readable text instead of raw hex addresses.

Example 0.29. In the sample output, section 28 and 30 are of STRTAB type:

[28] .shstrtab STRTAB 0000000000000000 000018b6

000000000000010c 0000000000000000 0 0 1

[30] .strtab STRTAB 0000000000000000 000016b0

0000000000000206 0000000000000000 0 0 1

.shstrtab: holds all the section names.
.strtab: holds symbol names, e.g. variable names and function names, in a C program, but not the string literals of the program; those strings are kept in the .rodata section.

Example 0.30. Strings in those sections can be inspected with the command:

$ readelf -p 28 hello

The output shows all the section names, with each string's offset into .shstrtab (which is also its string index) on the left:

String dump of section '.shstrtab':

[ 1] .symtab

[ 9] .strtab

...

[ 31] .note.gnu.build-id

...

The actual implementation of a string table is a contiguous array of null-terminated strings. The index of a string is the position of its first character in the array. For example, in the above string table, .symtab is at index 1 in the array (the null character is at index 0). The length of .symtab is 7; together with its null terminator it occupies 8 bytes. So .strtab starts at index 1 + 8 = 9, and so on.

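The layout just described can be reproduced in a few lines of C. This sketch appends each name with its terminating null character and records each name's index. build_strtab is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Builds an ELF-style string table in buf: a null character at index 0, then
   each name followed by its own null terminator. idx[i] receives the index
   (byte offset) of names[i]. Returns the table size, or 0 if buf is too small. */
size_t build_strtab(const char *const names[], size_t count,
                    char *buf, size_t bufsize, size_t idx[])
{
    size_t pos = 0;
    if (bufsize < 1)
        return 0;
    buf[pos++] = '\0';                        /* index 0: the empty string */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1;    /* include the null terminator */
        if (pos + len > bufsize)
            return 0;
        memcpy(buf + pos, names[i], len);
        idx[i] = pos;                         /* index = offset of first char */
        pos += len;
    }
    return pos;
}
```

As a hypothetical illustration, adding the names in the order .symtab, .strtab, .shstrtab, .interp, .note.ABI-tag, .note.gnu.build-id yields the indexes 1, 9, 0x11, 0x1b, 0x23 and 0x31. The first two and the last agree with the entries in the dump above.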

Similarly, the output of .strtab:

String dump of section '.strtab':

[ 1] crtstuff.c

[ c] __JCR_LIST__

[ 19] deregister_tm_clones

[ 2e] __do_global_dtors_aux

[ 44] completed.7585

[ 53] __do_global_dtors_aux_fini_array_entry

[ 7a] frame_dummy

[ 86] __frame_dummy_init_array_entry

[ a5] hello.c

[ ad] __FRAME_END__

[ bb] __JCR_END__

[ c7] __init_array_end

[ d8] _DYNAMIC

[ e1] __init_array_start

[ f4] __GNU_EH_FRAME_HDR

[ 107] _GLOBAL_OFFSET_TABLE_

[ 11d] __libc_csu_fini

[ 12d] _ITM_deregisterTMCloneTable

[ 149] j

[ 14b] _edata

[ 152] __libc_start_main@@GLIBC_2.2.5

[ 171] __data_start

[ 17e] __gmon_start__

[ 18d] __dso_handle

[ 19a] _IO_stdin_used

[ 1a9] __libc_csu_init

[ 1b9] __bss_start

[ 1c5] main

[ 1ca] _Jv_RegisterClasses

[ 1de] __TMC_END__

[ 1ea] _ITM_registerTMCloneTable

HASH

holds a symbol hash table, which supports symbol table access.

DYNAMIC

holds information for dynamic linking.

NOBITS

marks a section that occupies no space in the file but otherwise resembles PROGBITS.

Example 0.31. The .bss section holds uninitialized data, which the system initializes to zero when the program begins to run. Since every byte is known to be zero, there is no need to store those bytes in the binary image on disk. Space is allocated only when the operating system loads the section into main memory, which reduces the size of the binary file. Here are the details of .bss from the example output:

[26] .bss NOBITS 0000000000601038 00001038

0000000000000008 0000000000000000 WA 0 0 1

[27] .comment PROGBITS 0000000000000000 00001038

0000000000000034 0000000000000001 MS 0 0 1

In the above output, the size of .bss is 8 bytes, yet both sections start at the same file offset, which means .bss consumes no bytes of the executable binary on disk.

Notice that the .comment section has no starting address (its Address is 0) and no A flag. This means that this section is not loaded into memory when the executable binary runs.

REL

holds relocation entries without explicit addends. This type will be explained in detail in 7.

RELA

holds relocation entries with explicit addends. This type will be explained in detail in 7.

INIT_ARRAY

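is an array of function pointers for program initialization. When an application program runs, before getting to main(), the initialization code in .init and the functions in this array are executed first. In the sample output, .init_array is 8 bytes, i.e. a single 64-bit function pointer. Its entry is labeled __frame_dummy_init_array_entry in .strtab and points to frame_dummy, an initialization function contributed by gcc's startup files.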

This might seem unnecessary, since we can put initialization code in the main() function. However, shared object files have no main(). For them, this section ensures that the initialization code from an object file runs before any other code, so that the main code has a proper environment to run in. It also makes an object file more modular: the main application code need not be responsible for initializing the environment for a particular object file; the object file does it itself. Such a clear division makes code cleaner.

However, we will not use any .init and INIT_ARRAY sections in our operating system, for simplicity, as initializing an environment is part of the operating-system domain.

Example 0.32. To use the INIT_ARRAY, we simply mark a function with the attribute constructor:

#include <stdio.h>

__attribute__((constructor)) static void init1(){
    printf("%s\n", __FUNCTION__);
}

__attribute__((constructor)) static void init2(){
    printf("%s\n", __FUNCTION__);
}


int main(int argc, char *argv[])
{
    printf("hello world\n");

    return 0;
}

The program automatically calls the constructors without explicitly invoking them:

$ gcc -m32 hello.c -o hello

$ ./hello

init1

init2

hello world

Example 0.33. Optionally, a constructor can be assigned a priority from 101 onward; priorities from 0 to 100 are reserved for the implementation (gcc and its runtime). Constructors with a smaller priority number run first, so if we want init2 to run before init1, we give it the smaller number:

#include <stdio.h>

__attribute__((constructor(102))) static void init1(){
    printf("%s\n", __FUNCTION__);
}

__attribute__((constructor(101))) static void init2(){
    printf("%s\n", __FUNCTION__);
}


int main(int argc, char *argv[])
{
    printf("hello world\n");

    return 0;
}

The call order should be exactly as specified:

$ gcc -m32 hello.c -o hello

$ ./hello

init2

init1

hello world
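The priority rule can be checked directly. In this sketch, each constructor appends a letter to a log; the names init_a, init_b, ctor_log and constructor_order are hypothetical. The constructor with priority 101 runs before the one with priority 102, so the log should read "ba" by the time main() starts.

```c
/* Hypothetical log of the order in which the constructors ran. It lives in
   .bss, so it is already zero-filled when the constructors execute. */
static char ctor_log[4];
static int ctor_count;

/* Smaller priority numbers run first: init_b (101) before init_a (102). */
__attribute__((constructor(102))) static void init_a(void) { ctor_log[ctor_count++] = 'a'; }
__attribute__((constructor(101))) static void init_b(void) { ctor_log[ctor_count++] = 'b'; }

/* Returns the recorded order, expected to be "ba" once main() is running. */
const char *constructor_order(void) { return ctor_log; }
```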

Example 0.34. We can add initialization functions using another method:

#include <stdio.h>

void init1() {
    printf("%s\n", __FUNCTION__);
}

void init2() {
    printf("%s\n", __FUNCTION__);
}

/* Without typedef, init is a definition of a function pointer.
   With typedef, init is a declaration of a type.*/
typedef void (*init)();

__attribute__((section(".init_array"))) init init_arr[2] = {init1, init2};

int main(int argc, char *argv[])
{
    printf("hello world!\n");

    return 0;
}

The attribute section("...") puts a variable, here the array of function pointers init_arr, into a particular section rather than the default one (.data for initialized data). Applied to a function, it likewise moves the function out of the default .text. In this example, the section is .init_array. The section name need not be a standard one (such as .text or .init_array); it can be anything. Non-standard section names are often used for controlling the final binary layout of a compiled program. We will explore this technique in more detail when learning the GNU ld linker and the linking process. Again, the program automatically calls the functions without explicitly invoking them:

$ gcc -m32 hello.c -o hello

$ ./hello

init1

init2

hello world!

FINI_ARRAY

is an array of function pointers for program termination, called after exiting main(). If the application terminates abnormally, such as through an abort() call or a crash, the .fini_array is ignored.

Example 0.35. A destructor, if one is defined, is automatically called after exiting main():

#include <stdio.h>

__attribute__((destructor)) static void destructor(){
    printf("%s\n", __FUNCTION__);
}

int main(int argc, char *argv[])
{
    printf("hello world\n");

    return 0;
}

$ gcc -m32 hello.c -o hello

$ ./hello

hello world

destructor

PREINIT_ARRAY

is an array of function pointers that are invoked before all other initialization functions in INIT_ARRAY.

Example 0.36. The only way to put functions into .preinit_array is to place pointers to them there with the attribute section(). Note that .preinit_array is honored only in executables; it is ignored in shared libraries:

#include <stdio.h>

void preinit1() {
    printf("%s\n", __FUNCTION__);
}

void preinit2() {
    printf("%s\n", __FUNCTION__);
}

void init1() {
    printf("%s\n", __FUNCTION__);
}

void init2() {
    printf("%s\n", __FUNCTION__);
}


typedef void (*preinit)();
typedef void (*init)();

__attribute__((section(".preinit_array"))) preinit preinit_arr[2] = {preinit1, preinit2};
__attribute__((section(".init_array"))) init init_arr[2] = {init1, init2};

int main(int argc, char *argv[])
{
    printf("hello world!\n");

    return 0;
}

$ gcc -m32 hello2.c -o hello2

$ ./hello2

preinit1

preinit2

init1

init2

hello world!
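Here is a compact sketch of the same ordering, with one hand-placed entry in each array; all names are hypothetical. The used attribute keeps gcc from discarding the static arrays, since nothing in the program refers to them by name. By the time main() runs, the log should read "pi".

```c
/* Hypothetical log filled in before main(): 'p' for pre-init, 'i' for init. */
static char init_log_buf[4];
static int init_log_len;

static void my_preinit(void) { init_log_buf[init_log_len++] = 'p'; }
static void my_init(void)    { init_log_buf[init_log_len++] = 'i'; }

typedef void (*init_fn)(void);

/* One pointer placed by hand into each array. */
__attribute__((used, section(".preinit_array"))) static init_fn preinit_entry[] = { my_preinit };
__attribute__((used, section(".init_array")))    static init_fn init_entry[]    = { my_init };

/* Returns the order in which the entries ran. */
const char *init_log(void) { return init_log_buf; }
```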

GROUP

defines a section group. A typical use is a section that appears in several object files. When they are merged into the final executable binary file, only one copy is kept and the copies in the other object files are discarded. This section is mostly relevant to C++ object files, so we will not examine it further.

SYMTAB_SHNDX

is a section containing extended section indexes associated with a symbol table. This section only appears when the Ndx value of an entry in the symbol table exceeds the LORESERVE value. It then maps each such symbol to the actual index of its section header.

With section types understood, we can interpret the numbers in the Link and Info fields:

Type	Link	Info
DYNAMIC	The section index of the string table used by entries in this section.	0
HASH, GNU_HASH	The section index of the symbol table to which the hash table applies.	0
REL, RELA	The section index of the associated symbol table.	The section index of the section to which the relocation applies.
SYMTAB, DYNSYM	The section index of the associated string table.	One greater than the symbol table index of the last local symbol.
GROUP	The section index of the associated symbol table.	The symbol index of an entry in the associated symbol table. The name of the specified symbol table entry provides a signature for the section group.
SYMTAB_SHNDX	The section index of the associated symbol table.	0

Exercise 0.12. Verify that the value of the Link field of a SYMTAB section is the index of a STRTAB section.

Exercise 0.13. Verify that the value of the Info field of a SYMTAB section is the index of the last local symbol + 1. That is, in the symbol table, no local symbol appears from the index given by the Info field onward.

Exercise 0.14. Verify that the value of the Link field of a REL section is the index of the SYMTAB section.

Exercise 0.15. Verify that the value of the Info field of a REL section is the index of the section where relocation is applied. For example, if the section is .rel.text, then the relocated section should be .text.

Program header table

A program header table is an array of program headers that defines the memory layout of a program at runtime.

A program header is a description of a program segment.

A program segment is a collection of related sections; a segment contains zero or more sections. When loading a program, an operating system uses only segments, not sections. To see the program header table of a binary, we use the -l option of readelf:

$ readelf -l <binary file>
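The ELF header records where the program header table is: e_phoff is the file offset of the first program header and e_phnum is the number of entries, which readelf reports as, for example, "There are 9 program headers, starting at offset 64". A minimal sketch in C, assuming a 32-bit ELF file has already been read into a suitably aligned memory buffer; program_header_table is a hypothetical name:

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: locate the program header table inside an ELF32
 * file image held in memory. Stores the number of entries in *count and
 * returns NULL when there is no table, as in a relocatable .o file. */
const Elf32_Phdr *program_header_table(const unsigned char *image,
                                       size_t *count)
{
    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)image;

    *count = eh->e_phnum;           /* number of program headers */
    if (eh->e_phoff == 0 || eh->e_phnum == 0)
        return NULL;

    /* e_phoff is the file offset of the first program header. */
    return (const Elf32_Phdr *)(image + eh->e_phoff);
}
```

A relocatable object file such as math.o has e_phnum set to 0, which is why readelf -l reports that there are no program headers in it.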

PHDR: specifies the location and size of the program header table itself, both in the file and in the memory image of the program
INTERP: specifies the location and size of a null-terminated path name to invoke as an interpreter for linking runtime libraries.
LOAD: specifies a loadable segment. That is, this segment is loaded into main memory.
DYNAMIC: specifies dynamic linking information.
NOTE: specifies the location and size of auxiliary information.
TLS: specifies the Thread-Local Storage template, which is formed from the combination of all sections with the flag TLS.
GNU_STACK: indicates whether the program's stack should be made executable. The Linux kernel uses this type.
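Each of these types is a PT_* constant from <elf.h>, stored in the p_type field of a program header. A sketch of the lookup a tool such as readelf might perform for its Type column; the function is hypothetical and covers only the types discussed in this section, with the GNU-specific constants taken from glibc's header:

```c
#include <elf.h>

/* Hypothetical helper: the name printed in the Type column for a program
 * header's p_type value. Only the types discussed here are listed. */
const char *phdr_type_name(Elf32_Word p_type)
{
    switch (p_type) {
    case PT_NULL:         return "NULL";
    case PT_LOAD:         return "LOAD";
    case PT_DYNAMIC:      return "DYNAMIC";
    case PT_INTERP:       return "INTERP";
    case PT_NOTE:         return "NOTE";
    case PT_PHDR:         return "PHDR";
    case PT_TLS:          return "TLS";
    case PT_GNU_EH_FRAME: return "GNU_EH_FRAME";
    case PT_GNU_STACK:    return "GNU_STACK";
    case PT_GNU_RELRO:    return "GNU_RELRO";
    default:              return "UNKNOWN";   /* placeholder for other types */
    }
}
```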

A segment also has permissions, which are a combination of these three values:

Table 4: Segment Permission

Permission	Description
R	Readable
W	Writable
E	Executable

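In a program header, these permissions are the bits PF_R, PF_W and PF_X of the p_flags field (defined in <elf.h>), and readelf prints the execute bit as E. A hypothetical helper that renders them like readelf's Flg column, with a space for each missing permission:

```c
#include <elf.h>

/* Hypothetical helper: write p_flags into out[] the way readelf's Flg
 * column shows it: 'R', 'W', 'E' in fixed positions, blank if absent. */
void format_phdr_flags(Elf32_Word p_flags, char out[4])
{
    out[0] = (p_flags & PF_R) ? 'R' : ' ';
    out[1] = (p_flags & PF_W) ? 'W' : ' ';
    out[2] = (p_flags & PF_X) ? 'E' : ' ';   /* the ELF name is PF_X */
    out[3] = '\0';
}
```

A text segment with PF_R | PF_X therefore comes out as "R E", and a data segment with PF_R | PF_W as "RW ", matching the two LOAD lines in Example 0.37.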

Example 0.37. The command to get the program header table:

$ readelf -l hello

Output:

There are 9 program headers, starting at offset 64

Entry point 0x400430

Program Headers:

Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align

PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8

INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000070c 0x000000000000070c R E 200000

LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000228 0x0000000000000230 RW 200000

DYNAMIC 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d0 0x00000000000001d0 RW 8

NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4

GNU_EH_FRAME 0x00000000000005e4 0x00000000004005e4 0x00000000004005e4
0x0000000000000034 0x0000000000000034 R 4

GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10

GNU_RELRO 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x00000000000001f0 0x00000000000001f0 R 1

Section to Segment mapping:

Segment Sections...

00

01 .interp

02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame

03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss

04 .dynamic

05 .note.ABI-tag .note.gnu.build-id

06 .eh_frame_hdr

07

08 .init_array .fini_array .jcr .dynamic .got

In the sample output, the LOAD segment type appears twice:

LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000

0x000000000000070c 0x000000000000070c R E 200000

LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10

0x0000000000000228 0x0000000000000230 RW 200000

Why? Notice the permissions:

The upper LOAD segment has Read and Execute permissions. This is a text segment, which contains read-only instructions and read-only data.
The lower LOAD segment has Read and Write permissions. This is a data segment: it can be read and written, but for security reasons it is not allowed to be executed as code.

The two LOAD segments contain the following sections:

02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame

03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss

The first number is the index of a program header in the program header table, and the remaining text lists all the sections within that segment. The program header list itself does not print these indices, so the reader has to count its entries manually: the first program header has index 0, the second index 1, and so on. Counting this way, the LOAD segments are at index 2 and 3. As the two lists of sections show, most sections are loadable and are therefore available at runtime.

Segments vs sections

As mentioned earlier, an operating system loads program segments, not sections. A question arises: why doesn't the operating system use sections instead? After all, a section contains information similar to that of a program segment, such as the type, the virtual memory address to load at, the size, the attributes, the flags and the alignment. As explained before, a segment is the perspective of an operating system, while a section is the perspective of a linker. To understand why, look at the structure of a segment:

A segment is a collection of sections, which means that sections are logically grouped together by their attributes. For example, all sections in a LOAD segment are always loaded by the operating system, and all of them share the same permissions: either RE (Read + Execute) for executable sections, or RW (Read + Write) for data sections.
By grouping sections into a segment, the operating system can load many sections in one batch, by loading the whole range from the start to the end of the segment, instead of loading section by section.
Since a segment is for loading a program and a section is for linking a program, all the sections in a segment lie within the segment's start and end virtual memory addresses.

To see the last point more clearly, consider an example of linking two object files. Suppose we have two source files, hello.c:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World\n");
    return 0;
}

and math.c:

int add(int a, int b) {
    return a + b;
}

Now, compile the two source files as object files:

$ gcc -m32 -c math.c

$ gcc -m32 -c hello.c

Then, we check the sections of math.o:

$ readelf -S math.o

There are 11 section headers, starting at offset 0x1a8:

Section Headers:

[Nr] Name Type Addr Off Size ES Flg Lk Inf Al

[ 1] .text PROGBITS 00000000 000034 00000d 00 AX 0 0 1

[ 2] .data PROGBITS 00000000 000041 000000 00 WA 0 0 1

[ 3] .bss NOBITS 00000000 000041 000000 00 WA 0 0 1

[ 4] .comment PROGBITS 00000000 000041 000035 01 MS 0 0 1

[ 5] .note.GNU-stack PROGBITS 00000000 000076 000000 00 0 0 1

[ 6] .eh_frame PROGBITS 00000000 000078 000038 00 A 0 0 4

[ 7] .rel.eh_frame REL 00000000 00014c 000008 08 I 9 6 4

[ 8] .shstrtab STRTAB 00000000 000154 000053 00 0 0 1

[ 9] .symtab SYMTAB 00000000 0000b0 000090 10 10 8 4

[10] .strtab STRTAB 00000000 000140 00000c 00 0 0 1

W (write), A (alloc), X (execute), M (merge), S (strings)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

As shown in the output, the virtual memory address of every section is set to 0. At this stage, each object file is simply a block of binary that contains code and data. It exists to serve as raw material for the final product, which is the executable binary. As such, the virtual addresses in math.o are all zeroes.

No segment exists at this stage:

$ readelf -l math.o

There are no program headers in this file.

The same happens to the other object file:

$ readelf -S hello.o

There are 13 section headers, starting at offset 0x224:

Section Headers:

[Nr] Name Type Addr Off Size ES Flg Lk Inf Al

[ 1] .text PROGBITS 00000000 000034 00002e 00 AX 0 0 1

[ 2] .rel.text REL 00000000 0001ac 000010 08 I 11 1 4

[ 3] .data PROGBITS 00000000 000062 000000 00 WA 0 0 1

[ 4] .bss NOBITS 00000000 000062 000000 00 WA 0 0 1

[ 5] .rodata PROGBITS 00000000 000062 00000c 00 A 0 0 1

[ 6] .comment PROGBITS 00000000 00006e 000035 01 MS 0 0 1

[ 7] .note.GNU-stack PROGBITS 00000000 0000a3 000000 00 0 0 1

[ 8] .eh_frame PROGBITS 00000000 0000a4 000044 00 A 0 0 4

[ 9] .rel.eh_frame REL 00000000 0001bc 000008 08 I 11 8 4

[10] .shstrtab STRTAB 00000000 0001c4 00005f 00 0 0 1

[11] .symtab SYMTAB 00000000 0000e8 0000b0 10 12 9 4

[12] .strtab STRTAB 00000000 000198 000013 00 0 0 1

W (write), A (alloc), X (execute), M (merge), S (strings)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

$ readelf -l hello.o

There are no program headers in this file.

Only when the object files are combined into a final executable binary are the sections fully realized:

$ gcc -m32 math.o hello.o -o hello

$ readelf -S hello

There are 31 section headers, starting at offset 0x1804:

Section Headers:

[Nr] Name Type Addr Off Size ES Flg Lk Inf Al

[ 1] .interp PROGBITS 08048154 000154 000013 00 A 0 0 1

[ 2] .note.ABI-tag NOTE 08048168 000168 000020 00 A 0 0 4

[ 3] .note.gnu.build-i NOTE 08048188 000188 000024 00 A 0 0 4

[ 4] .gnu.hash GNU_HASH 080481ac 0001ac 000020 04 A 5 0 4

[ 5] .dynsym DYNSYM 080481cc 0001cc 000050 10 A 6 1 4

[ 6] .dynstr STRTAB 0804821c 00021c 00004a 00 A 0 0 1

[ 7] .gnu.version VERSYM 08048266 000266 00000a 02 A 5 0 2

[ 8] .gnu.version_r VERNEED 08048270 000270 000020 00 A 6 1 4

[ 9] .rel.dyn REL 08048290 000290 000008 08 A 5 0 4

[10] .rel.plt REL 08048298 000298 000010 08 AI 5 24 4

[11] .init PROGBITS 080482a8 0002a8 000023 00 AX 0 0 4

[12] .plt PROGBITS 080482d0 0002d0 000030 04 AX 0 0 16

[13] .plt.got PROGBITS 08048300 000300 000008 00 AX 0 0 8

[14] .text PROGBITS 08048310 000310 0001a2 00 AX 0 0 16

[15] .fini PROGBITS 080484b4 0004b4 000014 00 AX 0 0 4

[16] .rodata PROGBITS 080484c8 0004c8 000014 00 A 0 0 4

[17] .eh_frame_hdr PROGBITS 080484dc 0004dc 000034 00 A 0 0 4

[18] .eh_frame PROGBITS 08048510 000510 0000ec 00 A 0 0 4

[19] .init_array INIT_ARRAY 08049f08 000f08 000004 00 WA 0 0 4

[20] .fini_array FINI_ARRAY 08049f0c 000f0c 000004 00 WA 0 0 4

[21] .jcr PROGBITS 08049f10 000f10 000004 00 WA 0 0 4

[22] .dynamic DYNAMIC 08049f14 000f14 0000e8 08 WA 6 0 4

[23] .got PROGBITS 08049ffc 000ffc 000004 04 WA 0 0 4

[24] .got.plt PROGBITS 0804a000 001000 000014 04 WA 0 0 4

[25] .data PROGBITS 0804a014 001014 000008 00 WA 0 0 4

[26] .bss NOBITS 0804a01c 00101c 000004 00 WA 0 0 1

[27] .comment PROGBITS 00000000 00101c 000034 01 MS 0 0 1

[28] .shstrtab STRTAB 00000000 0016f8 00010a 00 0 0 1

[29] .symtab SYMTAB 00000000 001050 000470 10 30 48 4

[30] .strtab STRTAB 00000000 0014c0 000238 00 0 0 1

W (write), A (alloc), X (execute), M (merge), S (strings)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

Every loadable section is now assigned an address, shown in the Addr column. Each section gets its own address because gcc does not combine the object files by itself; it invokes the linker ld, which builds the executable binary using the default linker script it finds on the system. In the default script, the first segment is assigned the starting address 0x8048000, and the sections that belong to it are placed relative to that address. Then:

$\text{1st section address} = \text{starting segment address} + \text{section offset} = \mathtt{0x8048000} + \mathtt{0x154} = \mathtt{0x08048154}$

$\text{2nd section address} = \text{starting segment address} + \text{section offset} = \mathtt{0x8048000} + \mathtt{0x168} = \mathtt{0x08048168}$

and so on until the last loadable section.
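This simple sum works because the first LOAD segment starts at file offset 0. In general, a byte at file offset off inside a segment with file offset p_offset and virtual address p_vaddr is loaded at p_vaddr + (off - p_offset). A sketch of that calculation, with a hypothetical function name:

```c
#include <stdint.h>

/* Virtual address at which a byte at file offset `off` is loaded, given
 * the segment's file offset p_offset and virtual address p_vaddr.
 * Assumes `off` lies inside the segment's file image. */
uint32_t file_offset_to_vaddr(uint32_t p_vaddr, uint32_t p_offset,
                              uint32_t off)
{
    return p_vaddr + (off - p_offset);
}
```

For the first segment it reduces to the sum above: file_offset_to_vaddr(0x08048000, 0, 0x154) is 0x08048154. For .data, at file offset 0x1014 inside the second LOAD segment (file offset 0xf08, address 0x08049f08, as listed by readelf -l), it gives 0x0804a014, the address readelf -S reports for .data.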

Indeed, the end address of a segment is also the end address of its last section. We can see this by listing all the segments:

$ readelf -l hello

Then check, for example, the first LOAD segment, which starts at 0x08048000 and ends at:

$\mathtt{0x08048000} + \mathtt{0x005fc} = \mathtt{0x080485fc}$

There are 9 program headers, starting at offset 52

Entry point 0x8048310

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

PHDR 0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4

INTERP 0x000154 0x08048154 0x08048154 0x00013 0x00013 R 0x1

[Requesting program interpreter: /lib/ld-linux.so.2]

LOAD 0x000000 0x08048000 0x08048000 0x005fc 0x005fc R E 0x1000

LOAD 0x000f08 0x08049f08 0x08049f08 0x00114 0x00118 RW 0x1000

DYNAMIC 0x000f14 0x08049f14 0x08049f14 0x000e8 0x000e8 RW 0x4

NOTE 0x000168 0x08048168 0x08048168 0x00044 0x00044 R 0x4

GNU_EH_FRAME 0x0004dc 0x080484dc 0x080484dc 0x00034 0x00034 R 0x4

GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10

GNU_RELRO 0x000f08 0x08049f08 0x08049f08 0x000f8 0x000f8 R 0x1

Section to Segment mapping:

Segment Sections...

00

01 .interp

02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame

03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss

04 .dynamic

05 .note.ABI-tag .note.gnu.build-id

06 .eh_frame_hdr

07

08 .init_array .fini_array .jcr .dynamic .got

The last section in the first LOAD segment is .eh_frame. The .eh_frame section starts at 0x08048510, because the segment starts at 0x08048000 and the section's offset into the file is 0x510. Since the section size is 0xec, the end address of .eh_frame should be:

$\mathtt{0x08048000} + \mathtt{0x510} + \mathtt{0xec} = \mathtt{0x080485fc}$

This is exactly the end address of the first LOAD segment above, whose size is 0x5fc:

$\mathtt{0x08048000} + \mathtt{0x5fc} = \mathtt{0x080485fc}$
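The containment check done by hand above can be written as a small predicate. The sketch below uses a hypothetical function name and compares against the segment's MemSiz (its size in memory) rather than FileSiz, so that .bss, which takes no space in the file, still counts as part of the data segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the section [sec_addr, sec_addr + sec_size) lies entirely within
 * the segment [seg_vaddr, seg_vaddr + seg_memsz). 64-bit arithmetic avoids
 * overflow near the top of the 32-bit address space. */
bool section_in_segment(uint32_t sec_addr, uint32_t sec_size,
                        uint32_t seg_vaddr, uint32_t seg_memsz)
{
    return sec_addr >= seg_vaddr &&
           (uint64_t)sec_addr + sec_size <= (uint64_t)seg_vaddr + seg_memsz;
}
```

With the values above, .eh_frame (0x08048510, size 0xec) fits exactly in the first LOAD segment (0x08048000, MemSiz 0x5fc). Likewise, .bss (0x0804a01c, size 4) fits in the second LOAD segment (0x08049f08, MemSiz 0x118) only because MemSiz is 4 bytes larger than FileSiz (0x114).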

Chapter 7 will explore this whole process in detail.

Runtime inspection and debug

A debugger is a program that allows inspection of a running program. A debugger can start a program and stop it at a specific line to examine the state of the program at that point. The point where the debugger stops (but does not terminate) the program is called a breakpoint.

We will be using GDB, the GNU Debugger, to debug our kernel; the program name is gdb. gdb can do four main kinds of things:

Start your program, specifying anything that might affect its behavior.
Make your program stop on specified conditions.
Examine what has happened, when your program has stopped
Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another

A sample program

We need an existing program to debug. The good old “Hello World” program suffices for the educational purposes of this chapter:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}

We compile it with debugging information with the option -g:

$ gcc -m32 -g hello.c -o hello

Finally, we start gdb with the program as argument:

$ gdb hello

Static inspection of a program

Before inspecting a program at runtime, gdb first loads it. Once the program is loaded into memory (but not yet running), a lot of useful information can be retrieved for inspection. The commands in this section can be used before the program runs; they are also usable while it runs, and can then display even more information.

Command: info target/info file/info files

This command prints information about the target being debugged. A target is the program being debugged.

Example 0.38. For the hello program, which is a local target, the command prints detailed output:

(gdb) info target

Symbols from "/tmp/hello".

Local exec file:

`/tmp/hello', file type elf32-i386.

Entry point: 0x8048310

0x08048154 - 0x08048167 is .interp

0x08048168 - 0x08048188 is .note.ABI-tag

0x08048188 - 0x080481ac is .note.gnu.build-id

0x080481ac - 0x080481cc is .gnu.hash

0x080481cc - 0x0804821c is .dynsym

0x0804821c - 0x08048266 is .dynstr

0x08048266 - 0x08048270 is .gnu.version

0x08048270 - 0x08048290 is .gnu.version_r

0x08048290 - 0x08048298 is .rel.dyn

0x08048298 - 0x080482a8 is .rel.plt

0x080482a8 - 0x080482cb is .init

0x080482d0 - 0x08048300 is .plt

0x08048300 - 0x08048308 is .plt.got

0x08048310 - 0x080484a2 is .text

0x080484a4 - 0x080484b8 is .fini

0x080484b8 - 0x080484cd is .rodata

0x080484d0 - 0x080484fc is .eh_frame_hdr

0x080484fc - 0x080485c8 is .eh_frame

0x08049f08 - 0x08049f0c is .init_array

0x08049f0c - 0x08049f10 is .fini_array

0x08049f10 - 0x08049f14 is .jcr

0x08049f14 - 0x08049ffc is .dynamic

0x08049ffc - 0x0804a000 is .got

0x0804a000 - 0x0804a014 is .got.plt

0x0804a014 - 0x0804a01c is .data

0x0804a01c - 0x0804a020 is .bss

The output displayed reports:

The path of the symbol file. A symbol file is the file that contains the debugging information. Usually, this is the same file as the binary, but it is common to split an executable binary and its debugging information into two files, especially for remote debugging. In the example, it is this line:

Symbols from "/tmp/hello".
The path of the program being debugged and its file type. In the example, it is this line:

Local exec file:

`/tmp/hello', file type elf32-i386.
The entry point of the program being debugged, that is, the very first code the program runs. In the example, it is this line:

Entry point: 0x8048310
A list of sections with their starting and ending addresses. In the example, it is the remaining output.

Example 0.39. If the program being debugged runs on a different machine, it is a remote target, and gdb prints only brief information:

(gdb) info target

Remote serial target in gdb-specific protocol:

Debugging a target over a serial line.

Command: maint info sections

This command is similar to info target but gives extra information about program sections, specifically the file offset and the flags of each section.

Example 0.40. Here is the output when running against hello program:

(gdb) maint info sections

Exec file:

`/tmp/hello', file type elf64-x86-64.

[0] 0x00400238->0x00400254 at 0x00000238: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS

[1] 0x00400254->0x00400274 at 0x00000254: .note.ABI-tag ALLOC LOAD READONLY DATA HAS_CONTENTS

[2] 0x00400274->0x00400298 at 0x00000274: .note.gnu.build-id ALLOC LOAD READONLY DATA HAS_CONTENTS

[3] 0x00400298->0x004002b4 at 0x00000298: .gnu.hash ALLOC LOAD READONLY DATA HAS_CONTENTS

[4] 0x004002b8->0x00400318 at 0x000002b8: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS

[5] 0x00400318->0x00400355 at 0x00000318: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS

[6] 0x00400356->0x0040035e at 0x00000356: .gnu.version ALLOC LOAD READONLY DATA HAS_CONTENTS

[7] 0x00400360->0x00400380 at 0x00000360: .gnu.version_r ALLOC LOAD READONLY DATA HAS_CONTENTS

....remaining output omitted....

The output is similar to info target, but with more details. Next to the section names are the section flags, which are attributes of a section. Here, we can see that the sections with the LOAD flag belong to a LOAD segment. The command also accepts the following arguments to filter its output:

ALLOBJ

displays sections for all loaded object files, including shared libraries. Shared libraries are only displayed when the program is already running.

section names

displays only named sections.

Example 0.41. The command:

(gdb) maint info sections .text .data .bss

only displays .text, .data and .bss sections:

Exec file:

`/tmp/hello', file type elf64-x86-64.

[13] 0x00400430->0x004005c2 at 0x00000430: .text ALLOC LOAD READONLY CODE HAS_CONTENTS

[24] 0x00601028->0x00601038 at 0x00001028: .data ALLOC LOAD DATA HAS_CONTENTS

[25] 0x00601038->0x00601040 at 0x00001038: .bss ALLOC

section-flags

displays only sections with the specified section flags. Note that these section flags are specific to gdb, though they are based on the section attributes defined previously. Currently, gdb understands the following flags:

ALLOC: Section will have space allocated in the process when loaded. Set for all sections except those containing debug information.
LOAD: Section will be loaded from the file into the child process memory. Set for pre-initialized code and data, clear for .bss sections.
RELOC: Section needs to be relocated before loading.
READONLY: Section cannot be modified by the child process.
CODE: Section contains executable code only.
DATA: Section contains data only (no executable code).
ROM: Section will reside in ROM.
CONSTRUCTOR: Section contains data for constructor/destructor lists.
HAS_CONTENTS: Section is not empty.
NEVER_LOAD: An instruction to the linker to not output the section.
COFF_SHARED_LIBRARY: A notification to the linker that the section contains COFF shared library information. COFF (Common Object File Format) is an object file format that predates ELF on Unix systems; a variant of it is still used on Windows (PE/COFF). Like ELF, it can describe both object files and executables.
IS_COMMON: Section contains common symbols.

Example 0.42. We can restrict the output to only display sections that contain code with the command:

(gdb) maint info sections CODE

The output:

Exec file:

`/tmp/hello', file type elf64-x86-64.

[10] 0x004003c8->0x004003e2 at 0x000003c8: .init ALLOC LOAD READONLY CODE HAS_CONTENTS

[11] 0x004003f0->0x00400420 at 0x000003f0: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS

[12] 0x00400420->0x00400428 at 0x00000420: .plt.got ALLOC LOAD READONLY CODE HAS_CONTENTS

[13] 0x00400430->0x004005c2 at 0x00000430: .text ALLOC LOAD READONLY CODE HAS_CONTENTS

[14] 0x004005c4->0x004005cd at 0x000005c4: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS

Command: info functions

This command lists all function names and their loaded addresses. The names can be filtered with a regular expression.

Example 0.43. Running the command gives the following output:

(gdb) info functions

All defined functions:

File hello.c:

int main(int, char **);

Non-debugging symbols:

0x00000000004003c8 _init

0x0000000000400400 puts@plt

0x0000000000400410 __libc_start_main@plt

0x0000000000400430 _start

0x0000000000400460 deregister_tm_clones

0x00000000004004a0 register_tm_clones

0x00000000004004e0 __do_global_dtors_aux

0x0000000000400500 frame_dummy

0x0000000000400550 __libc_csu_init

0x00000000004005c0 __libc_csu_fini

0x00000000004005c4 _fini

Command: info variables

This command lists all global and static variable names, optionally filtered with a regular expression.

Example 0.44. If we add a global variable int i into the sample source program and recompile then run the command, we get the following output:

(gdb) info variables

All defined variables:

File hello.c:

int i;

Non-debugging symbols:

0x00000000004005d0 _IO_stdin_used

0x00000000004005e4 __GNU_EH_FRAME_HDR

0x0000000000400708 __FRAME_END__

0x0000000000600e10 __frame_dummy_init_array_entry

0x0000000000600e10 __init_array_start

0x0000000000600e18 __do_global_dtors_aux_fini_array_entry

0x0000000000600e18 __init_array_end

0x0000000000600e20 __JCR_END__

0x0000000000600e20 __JCR_LIST__

0x0000000000600e28 _DYNAMIC

0x0000000000601000 _GLOBAL_OFFSET_TABLE_

0x0000000000601028 __data_start

0x0000000000601028 data_start

0x0000000000601030 __dso_handle

0x000000000060103c __bss_start

0x000000000060103c _edata

0x000000000060103c completed

0x0000000000601040 __TMC_END__

0x0000000000601040 _end

Command: disassemble/disas

This command displays the assembly code of the executable file.

Example 0.45. gdb can display the assembly code of a function:

(gdb) disassemble main

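Dump of assembler code for function main:

0x0804840b <+0>: lea ecx,[esp+0x4]

0x0804840f <+4>: and esp,0xfffffff0

0x08048412 <+7>: push DWORD PTR [ecx-0x4]

0x08048415 <+10>: push ebp

0x08048416 <+11>: mov ebp,esp

0x08048418 <+13>: push ecx

0x08048419 <+14>: sub esp,0x4

0x0804841c <+17>: sub esp,0xc

0x0804841f <+20>: push 0x80484c0

0x08048424 <+25>: call 0x80482e0 <puts@plt>

0x08048429 <+30>: add esp,0x10

0x0804842c <+33>: mov eax,0x0

0x08048431 <+38>: mov ecx,DWORD PTR [ebp-0x4]

0x08048434 <+41>: leave

0x08048435 <+42>: lea esp,[ecx-0x4]

0x08048438 <+45>: ret

End of assembler dump.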

Example 0.46. It would be more useful if source is included:

(gdb) disassemble /s main

0x0804842c <+33>: mov eax,0x0

Now the high-level source is included as part of the assembly dump: each source line is followed by the assembly code generated for it.

Example 0.47. If the option /r is added, raw instructions in hex are included, just like how objdump displays assembly code by default:

(gdb) disassemble /rs main

0x0804840b <+0>: 8d 4c 24 04 lea ecx,[esp+0x4]

0x0804840f <+4>: 83 e4 f0 and esp,0xfffffff0

0x08048412 <+7>: ff 71 fc push DWORD PTR [ecx-0x4]

0x08048415 <+10>: 55 push ebp

0x08048416 <+11>: 89 e5 mov ebp,esp

0x08048418 <+13>: 51 push ecx

0x08048419 <+14>: 83 ec 04 sub esp,0x4

0x0804841c <+17>: 83 ec 0c sub esp,0xc

0x0804841f <+20>: 68 c0 84 04 08 push 0x80484c0

0x08048424 <+25>: e8 b7 fe ff ff call 0x80482e0 <puts@plt>

0x08048429 <+30>: 83 c4 10 add esp,0x10

0x0804842c <+33>: b8 00 00 00 00 mov eax,0x0

0x08048431 <+38>: 8b 4d fc mov ecx,DWORD PTR [ebp-0x4]

0x08048434 <+41>: c9 leave

0x08048435 <+42>: 8d 61 fc lea esp,[ecx-0x4]

0x08048438 <+45>: c3 ret

Example 0.48. A function in a specific file can also be specified:

(gdb) disassemble /sr 'hello.c'::main

0x0804840b <+0>: 8d 4c 24 04 lea ecx,[esp+0x4]

0x0804840f <+4>: 83 e4 f0 and esp,0xfffffff0

0x08048412 <+7>: ff 71 fc push DWORD PTR [ecx-0x4]

0x08048415 <+10>: 55 push ebp

0x08048416 <+11>: 89 e5 mov ebp,esp

0x08048418 <+13>: 51 push ecx

0x08048419 <+14>: 83 ec 04 sub esp,0x4

0x0804841c <+17>: 83 ec 0c sub esp,0xc

0x0804841f <+20>: 68 c0 84 04 08 push 0x80484c0

0x08048424 <+25>: e8 b7 fe ff ff call 0x80482e0 <puts@plt>

0x08048429 <+30>: 83 c4 10 add esp,0x10

0x0804842c <+33>: b8 00 00 00 00 mov eax,0x0

0x08048431 <+38>: 8b 4d fc mov ecx,DWORD PTR [ebp-0x4]

0x08048434 <+41>: c9 leave

0x08048435 <+42>: 8d 61 fc lea esp,[ecx-0x4]

0x08048438 <+45>: c3 ret

The filename must be enclosed in single quotes, and the function name must be prefixed with double colons; for example, 'hello.c'::main specifies the function main in the file hello.c.

Command: x

This command examines the content of a given memory range.

Example 0.49. We can examine the raw content in main:

(gdb) x main

0x804840b <main>: 0x04244c8d

Without a format argument, the command prints only one unit of memory at the given address; the default unit is initially a 4-byte word, shown in hex. In this case, that is the first four bytes of main.
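The value 0x04244c8d is the first four bytes of main, 8d 4c 24 04 (compare the raw bytes in Example 0.47), read as a single little-endian word: x86 stores the least significant byte at the lowest address. A sketch of that conversion, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: combine 4 bytes stored in little-endian order (the
 * x86 convention: lowest address = least significant byte) into a word. */
uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```

Passing the bytes {0x8d, 0x4c, 0x24, 0x04} yields 0x04244c8d, the word gdb printed.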

Example 0.50. With format arguments, the command can print a range of memory in a specific format.

(gdb) x/20b main

0x804840b <main>: 0x8d 0x4c 0x24 0x04 0x83 0xe4 0xf0 0xff

0x8048413 <main+8>: 0x71 0xfc 0x55 0x89 0xe5 0x51 0x83 0xec

0x804841b <main+16>: 0x04 0x83 0xec 0x0c

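The argument /20b tells the command to print 20 units of 1 byte each, starting at the address of main.

The general form of the format argument is /<repeat count><format letter><unit size letter>, where each part is optional. The unit size letter is b (1 byte), h (2 bytes), w (4 bytes) or g (8 bytes). In /20b, 20 is the repeat count and b is the unit size; the format letter is omitted, so gdb uses its default format (initially hex).

If the repeat count is not supplied, gdb uses a count of 1. The format letter is one of the following values: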

Letter	Description
o	Print the memory content in octal format.
x	Print the memory content in hex format.
d	Print the memory content in decimal format.
u	Print the memory content in unsigned decimal format.
t	Print the memory content in binary format.
f	Print the memory content in float format.
a	Print the memory content as memory addresses.
i	Print the memory content as a series of assembly instructions, similar to disassemble command.
c	Print the memory content as an array of ASCII characters.
s	Print the memory content as a string

Depending on the circumstances, one format can be more convenient than another. For example, if a memory region contains floating-point numbers, it is better to use the format f than to view the numbers as separate 1-byte hex values.
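To see why the format matters: the four bytes 00 00 80 3f say little when shown as hex, but read as a 4-byte float they are 1.0. A sketch of the reinterpretation the f format performs on a 4-byte unit, assuming a little-endian host with 32-bit IEEE-754 floats such as x86 (the function name is hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret 4 bytes of memory as a float, roughly
 * what x/fw does. Assumes a little-endian host with 32-bit IEEE-754 floats
 * (true on x86); memcpy avoids strict-aliasing problems. */
float bytes_as_float(const uint8_t b[4])
{
    float f;
    memcpy(&f, b, sizeof f);
    return f;
}
```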

Command: print/p

Examining raw memory is useful, but a more human-readable output is usually preferable. This command does exactly that: it pretty-prints an expression. An expression can be a global variable, a local variable in the current stack frame, a function, a register, a number, etc.

Runtime inspection of a program

The main use of a debugger is to examine the state of a program, when it is running. gdb provides a set of useful commands for retrieving useful runtime information.

Command: run

This command starts running the program.

Example 0.51. Run the hello program:

(gdb) run

Hello World!

[Inferior 1 (process 1002) exited normally]

The program ran successfully and printed the message “Hello World!”. However, gdb would not be very useful if all it could do was run a program.

Command: break/b

This command sets a breakpoint at a location in the high-level source code. When execution reaches a location marked by a breakpoint, gdb stops the program so that the programmer can inspect its current state.

Example 0.52. A breakpoint can be set on a line as displayed by an editor. Suppose we want to set a breakpoint at line 3 of the program, which is the start of main function:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}

Now, instead of running from start to finish, the program stops at the breakpoint:

(gdb) b 3

Breakpoint 1 at 0x400535: file hello.c, line 3.

(gdb) r

Breakpoint 1, main (argc=1, argv=0x7fffffffdfb8) at hello.c:5

The breakpoint is set at line 3, but gdb stopped at line 5. The reason is that line 3 holds the function signature rather than a statement, and gdb only stops where there is code to execute, so the breakpoint is moved to the first statement in the body of main: the call to printf at line 5.

Example 0.53. A line number is not always a reliable way to specify a breakpoint, as the source code can change. What if gdb should always stop at the main function? In this case, a better method is to use the function name directly:

b main

Then, regardless of how the source code changes, gdb always stops at the main function.

Example 0.54. Sometimes the program being debugged does not contain debug info, or gdb is debugging assembly code. In that case, a memory address can be specified as a stop point. To get the function address, the print command can be used:

(gdb) print main

$3 = {int (int, char **)} 0x400526 <main>

Knowing the address of main, we can easily set a breakpoint with a memory address:

b *0x400526

Example 0.55. gdb can also set a breakpoint in any source file. Suppose that the hello program is composed not of just one file but of many files, e.g. hello1.c, hello2.c, hello3.c... In that case, simply prefix the line number with the filename:

b hello.c:3

Example 0.56. A breakpoint can also be set on a function in a specific file:

b hello.c:main

Command: next/n

This command executes the current line and stops at the next line. When the current line contains a function call, next steps over it.

Example 0.57. After setting a breakpoint at main, run the program; it stops at the printf call:

Breakpoint 1, main (argc=1, argv=0x7fffffffdfb8) at hello.c:5

Then, to proceed to the next statement, we use the next command:

(gdb) n

Hello World!

6 return 0;

In the output, the first line is the output produced by executing line 5; the next line shows where gdb is currently stopped, which is line 6.

Command: step/s

This command executes the current line and stops at the next line. When the current line contains a function call, step steps into it and stops at the first line of the called function.

Example 0.58. Suppose we have a new function add.

Why add a new function and a call to it instead of using the existing printf call? Stepping into shared library functions is tricky, because for debugging to work, the library's debug info must be installed and loaded. It is not worth the trouble just to demonstrate this simple command.

#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int main(int argc, char *argv[])
{
    add(1, 2);
    printf("Hello World!\n");
    return 0;
}

If the step command is used instead of next on the call to add, gdb steps inside the function:

Breakpoint 1, main (argc=1, argv=0xffffd154) at hello.c:11

11 add(1, 2);

(gdb) s

add (a=1, b=2) at hello.c:6

6 return a + b;

After executing the command s, gdb stepped into the add function where the first statement is a return.

Command: ni

At the core, gdb operates on assembly instructions. Line-by-line source debugging is simply an enhancement to make it friendlier for programmers. Each statement in C translates to one or more assembly instructions, as shown with objdump and the disassemble command. With the debug info available, gdb knows which instructions belong to each line of high-level code; line-by-line debugging simply executes all the assembly instructions of the current line before moving on to the next line.

This command executes a single assembly instruction of the current line. gdb does not move to the next line until all the assembly instructions of the current line have been executed. If the current instruction is a call, ni steps over it to the next instruction.

Example 0.59. When the program stops at the breakpoint on the printf call and ni is used, gdb steps through each assembly instruction:

(gdb) disassemble /s main

=> 0x0804842c <+33>: mov eax,0x0

Breakpoint 1, main (argc=1, argv=0xffffd154) at hello.c:5

5 printf("Hello World!\n");

(gdb) ni

0x0804841f 5 printf("Hello World!\n");

(gdb) ni

0x08048424 5 printf("Hello World!\n");

(gdb) ni

Hello World!

0x08048429 5 printf("Hello World!\n");

(gdb)

Upon entering ni, gdb executes the current instruction and displays the next one. That's why gdb displays only 3 addresses in the output: 0x0804841f, 0x08048424 and 0x08048429. The instruction at 0x0804841c is not displayed. It is the first instruction generated for the printf line, and gdb was already stopped there when the breakpoint was hit. While gdb is stopped at 0x0804841c, the current instruction can be displayed using the x command:

(gdb) x/i $eip

=> 0x804841c <main+17>: sub esp,0xc

Command: si

Example 0.60. Recall that the assembly code generated from printf contains a call instruction:

(gdb) disassemble /s main

=> 0x0804842c <+33>: mov eax,0x0

We try instruction by instruction stepping again, but this time by running si at 0x08048424, where call resides:

(gdb) si

0x0804841f 5 printf("Hello World!\n");

(gdb) si

0x08048424 5 printf("Hello World!\n");

(gdb) x/i $eip

=> 0x8048424 <main+25>: call 0x80482e0 <puts@plt>

(gdb) si

0x080482e0 in puts@plt ()

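The instruction executed right after the call at 0x8048424 is the first instruction at 0x080482e0. That address belongs to puts@plt, the PLT stub through which the program calls puts in the C library. (gcc replaced the printf call with puts because the format string contains no conversion specifiers and ends with a newline.) In other words, si stepped into the call instead of stepping over it.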

Command: until

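This command works like next, except that it never goes back. When execution jumps backward, as at the end of a loop iteration, until keeps running until it reaches an instruction at a higher address than the jump. In practice, this means it runs until a line past the current one is reached, which makes it handy for leaving loops.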

Example 0.61. Suppose we have a function that executes a long loop:

#include <stdio.h>

int add1000() {
    int total = 0;

    for (int i = 0; i < 1000; ++i){
        total += i;
    }

    printf("Done adding!\n");

    return total;
}

int main(int argc, char *argv[])
{
    add1000(1, 2);
    printf("Hello World!\n");
    return 0;
}

With the next command, we would have to step through the loop 1000 times to finish it. A faster way is to use until:

(gdb) b add1000

Breakpoint 1 at 0x8048411: file hello.c, line 4.

Breakpoint 1, add1000 () at hello.c:4

4 int total = 0;

(gdb) until

5 for (int i = 0; i < 1000; ++i){

(gdb) until

6 total += i;

(gdb) until

5 for (int i = 0; i < 1000; ++i){

(gdb) until

8 printf("Done adding!\n");

Executing the first until, gdb stopped at line 5 since line 5 is greater than line 4.

Executing the second until, gdb stopped at line 6 since line 6 is greater than line 5.

Executing the third until, gdb stopped at line 5 because the loop still continues. The loop's increment and condition typically come after the body in the compiled code, so line 5 is reached at a higher address. From line 5, the loop jumps backward to its body. With the fourth until, gdb therefore kept executing until the loop ended and stopped at line 8. This is a great way to skip over the remaining iterations of a loop instead of setting an unneeded breakpoint.

Example 0.62. until can be supplied with an argument to explicitly execute to a specific line:

Breakpoint 1, add1000 () at hello.c:4

4 int total = 0;

(gdb) until 8

add1000 () at hello.c:8

8 printf("Done adding!\n");

Command: finish

This command executes until the current function returns and displays the return value. In effect, finish is a more convenient way to run until the function returns to its caller.

Example 0.63. Using the add1000 function from the previous example, we use finish instead of until:

Breakpoint 1, add1000 () at hello.c:4

4 int total = 0;

(gdb) finish

Run till exit from #0 add1000 () at hello.c:4

Done adding!

0x08048466 in main (argc=1, argv=0xffffd154) at hello.c:15

15 add1000(1, 2);

Value returned is $1 = 499500
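The returned value 499500 is simply 0 + 1 + ... + 999. As a quick sanity check, here is a minimal sketch that runs the same loop as add1000 and compares it with the closed form n(n - 1)/2. It is not part of the original example, and the names sum_below and sum_below_closed are hypothetical.

```c
/* Same loop as add1000 in Example 0.61, minus the printf call. */
int sum_below(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

/* Closed form of 0 + 1 + ... + (n - 1). */
int sum_below_closed(int n)
{
    return n * (n - 1) / 2;
}
```

For n = 1000 both give 1000 × 999 / 2 = 499500, the value finish reported.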

Command: bt

This command prints the backtrace of all stack frames. A backtrace is a list of currently active functions:

Example 0.64. Suppose we have a chain of function calls:

void d(int d) { };
void c(int c) { d(0); }
void b(int b) { c(1); }
void a(int a) { b(2); }

int main(int argc, char *argv[])
{
    a(3);
    return 0;
}

bt can visualize such a chain in action:

(gdb) b a

Breakpoint 1 at 0x8048404: file hello.c, line 9.

Breakpoint 1, a (a=3) at hello.c:9

9 void a(int a) { b(2); }

(gdb) s

b (b=2) at hello.c:7

7 void b(int b) { c(1); }

(gdb) s

c (c=1) at hello.c:5

5 void c(int c) { d(0); }

(gdb) s

d (d=0) at hello.c:3

3 void d(int d) { };

(gdb) bt

#0 d (d=0) at hello.c:3

#1 0x080483eb in c (c=1) at hello.c:5

#2 0x080483fb in b (b=2) at hello.c:7

#3 0x0804840b in a (a=3) at hello.c:9

#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13

The most recent calls are placed at the top and the least recent calls near the bottom. In this case, d is the most recent active function, so it has index 0. Next is c, the second most recent, with index 1. The list continues with b, then a, and finally main at the bottom, the least recent function. That is how we read a backtrace.
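The same chain of active frames can also be captured from inside a program. The sketch below assumes glibc, whose <execinfo.h> provides backtrace() and backtrace_symbols_fd(). It is not portable to every C library. The functions frame_a to frame_d are hypothetical stand-ins for a to d. As in gdb's output, the first captured entry is the most recent frame (#0):

```c
#include <execinfo.h>   /* glibc extension: backtrace, backtrace_symbols_fd */
#include <unistd.h>     /* STDOUT_FILENO */

static int last_depth;  /* number of frames captured by frame_d */

/* noinline keeps each function as its own stack frame, as -O0 would. */
__attribute__((noinline)) void frame_d(int x)
{
    void *frames[64];
    (void)x;
    last_depth = backtrace(frames, 64);   /* most recent frame first */
    backtrace_symbols_fd(frames, last_depth, STDOUT_FILENO);
}

__attribute__((noinline)) void frame_c(int x) { (void)x; frame_d(0); }
__attribute__((noinline)) void frame_b(int x) { (void)x; frame_c(1); }
__attribute__((noinline)) void frame_a(int x) { (void)x; frame_b(2); }
```

Calling frame_a(3) from main prints one line per frame. Function names usually appear in that output only when the program is linked with -rdynamic; otherwise glibc prints raw addresses.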

Command: up

This command moves up one frame, from the current frame to the frame of its caller.

Example 0.65. Instead of staying in d function, we can go up to c function and look at its state:

(gdb) bt

#0 d (d=0) at hello.c:3

#1 0x080483eb in c (c=1) at hello.c:5

#2 0x080483fb in b (b=2) at hello.c:7

#3 0x0804840b in a (a=3) at hello.c:9

#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13

(gdb) up

#1 0x080483eb in c (c=1) at hello.c:5

5 void c(int c) { d(0); }

The output shows that the current frame moved to c. It also shows the line in c where the call to d was made: line 5.

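Command: down

This command moves down one frame, from the current frame to the frame of the function it called. It is the opposite of up.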

Example 0.66. After inspecting c function, we can go back to d:

(gdb) bt

#0 d (d=0) at hello.c:3

#1 0x080483eb in c (c=1) at hello.c:5

#2 0x080483fb in b (b=2) at hello.c:7

#3 0x0804840b in a (a=3) at hello.c:9

#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13

(gdb) up

#1 0x080483eb in c (c=1) at hello.c:5

5 void c(int c) { d(0); }

(gdb) down

#0 d (d=0) at hello.c:3

3 void d(int d) { };

Command: info registers

This command lists the current values in commonly used registers. This command is useful when debugging assembly and operating system code, as we can inspect the current state of the machine.

Example 0.67. Executing the command, we can see the commonly used registers:

(gdb) info registers

eax 0xf7faddbc -134554180

ecx 0xffffd0c0 -12096

edx 0xffffd0e4 -12060

ebx 0x0 0

esp 0xffffd0a0 0xffffd0a0

ebp 0xffffd0a8 0xffffd0a8

esi 0xf7fac000 -134561792

edi 0xf7fac000 -134561792

eip 0x804841c 0x804841c <main+17>

eflags 0x286 [ PF SF IF ]

The above registers suffice for writing our operating system in later parts.
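The eflags line is gdb decoding individual bits of the register. As an illustration, here is a small sketch that decodes a subset of the EFLAGS bits, using the bit positions documented in Intel's manuals; the helper decode_eflags is hypothetical. Bit 1 is reserved and always set. That is why 0x286 = 0x200 + 0x80 + 0x4 + 0x2 decodes to just PF SF IF:

```c
#include <string.h>

/* Bit positions of some EFLAGS flags, in ascending order. */
struct flag_bit { unsigned bit; const char *name; };

static const struct flag_bit eflags_bits[] = {
    {0, "CF"}, {2, "PF"}, {4, "AF"}, {6, "ZF"}, {7, "SF"},
    {8, "TF"}, {9, "IF"}, {10, "DF"}, {11, "OF"},
};

/* Write the names of the set flags into out, separated by spaces,
   in ascending bit order. Reserved bits (such as bit 1) are ignored. */
void decode_eflags(unsigned eflags, char *out, size_t outlen)
{
    out[0] = '\0';
    for (size_t i = 0; i < sizeof eflags_bits / sizeof eflags_bits[0]; ++i) {
        if (eflags & (1u << eflags_bits[i].bit)) {
            if (out[0] != '\0')
                strncat(out, " ", outlen - strlen(out) - 1);
            strncat(out, eflags_bits[i].name, outlen - strlen(out) - 1);
        }
    }
}
```

TF (bit 8), the trap flag, makes the CPU stop after every instruction. Debuggers can use it for single stepping.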

How debuggers work: A brief introduction

How breakpoints work

When a programmer places a breakpoint somewhere in the code, the first byte of the first instruction of a statement is replaced with another instruction, int 3, whose opcode is CCh:

Figure 0.17: Opcode replacement, with int 3

83 ec 0c    →    cc ec 0c
sub esp,0xc      int 3

int 3 takes only a single byte, making it efficient for debugging: it can replace the first byte of any instruction without overwriting the next instruction. When an int 3 instruction is executed, the CPU raises a breakpoint exception and the operating system runs its breakpoint interrupt handler. The handler then checks which process reached a breakpoint, pauses it, and notifies the debugger that a debugged process has been paused. Because the debugged process is only paused, the debugger is free to inspect its internal state, like a surgeon operating on an anesthetized patient. Then, to resume, the debugger replaces the int 3 opcode with the original opcode and executes the original instruction normally.

Figure 0.18: Restore the original opcode, after int 3 was executed

cc ec 0c    →    83 ec 0c
int 3            sub esp,0xc
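The patch-and-restore cycle in the two figures can be sketched in C. This is a simplified model that assumes the code lives in a plain byte buffer. A real debugger on Linux would instead read and write the traced process's memory through ptrace, for example with PTRACE_PEEKTEXT and PTRACE_POKETEXT. The struct and function names are hypothetical.

One detail the figures leave out: after int 3 traps, the saved EIP already points one byte past the CCh byte. The debugger moves it back by one before re-executing the restored instruction:

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

/* A software breakpoint: where it is and which byte it replaced. */
struct breakpoint {
    size_t offset;      /* offset of the instruction within the code buffer */
    uint8_t saved_byte; /* original first byte of that instruction */
    int enabled;
};

/* Replace the first byte of the instruction with int 3, remembering it. */
void bp_enable(uint8_t *code, struct breakpoint *bp)
{
    if (bp->enabled) return;
    bp->saved_byte = code[bp->offset];
    code[bp->offset] = INT3_OPCODE;
    bp->enabled = 1;
}

/* Put the original byte back so the instruction can run normally. */
void bp_disable(uint8_t *code, struct breakpoint *bp)
{
    if (!bp->enabled) return;
    code[bp->offset] = bp->saved_byte;
    bp->enabled = 0;
}

/* After int 3 traps, EIP points one byte past the 0xCC byte;
   rewind it so execution resumes at the restored instruction. */
uint32_t bp_rewind_eip(uint32_t eip_after_trap)
{
    return eip_after_trap - 1;
}
```

To keep the breakpoint for the next time the statement runs, a debugger such as gdb re-inserts int 3 after the original instruction has executed.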

Example 0.68. It is simple to see int 3 in action. First, we add an int 3 instruction where we need gdb to stop:

#include <stdio.h>

int main(int argc, char *argv[])
{
    asm("int 3");
    printf("Hello World\n");
    return 0;
}

int 3 precedes printf, so gdb is expected to stop at printf. Next, we compile with debug info enabled and with Intel syntax, since the inline assembly int 3 above is written in Intel syntax:

$ gcc -masm=intel -m32 -g hello.c -o hello

Finally, start gdb:

$ gdb hello

Running without setting any breakpoint, gdb stops at printf call, as expected:

Program received signal SIGTRAP, Trace/breakpoint trap.

main (argc=1, argv=0xffffd154) at hello.c:6

6 printf("Hello World\n");

The SIGTRAP message indicates that gdb encountered a breakpoint. Indeed, it stopped at the right place: the printf call, right after the int 3.

Single stepping

Once breakpoints are implemented, single stepping is easy to implement: a debugger can simply place another int 3 opcode at the next instruction to be executed. When execution stops at an instruction, the debugger sets a temporary breakpoint at the next one, thus enabling instruction-by-instruction debugging. Similarly, source line-by-line debugging amounts to placing int 3 at the first instruction of the next statement. x86 also offers hardware support for single stepping: when the trap flag (TF) in EFLAGS is set, the CPU raises a debug exception after each instruction.

How a debugger understands high level source code

DWARF is a debugging file format used by many compilers and debuggers to support source-level debugging. DWARF contains information that maps entities in the executable binary to the source files. A program entity can be either data or code. A DIE, or Debugging Information Entry, is a description of a program entity. A DIE consists of a tag, which specifies the kind of entity the DIE describes, and a list of attributes that describe the entity. Of all the attributes, these two enable source-level debugging:

Where the entity appears in the source files: which file and which line the entity appears.
Where the entity appears in the executable binary: the memory address at which the entity is loaded at runtime. With the precise address, gdb can retrieve the correct value of a data entity, or place a correct breakpoint and stop accordingly for a code entity. Without these addresses, gdb would not know where the entities are and could not inspect them.

Figure: the DIE for main records that main, defined at line 3 of hello.c, is located at 0x804840b in the executable hello, where its code begins with the bytes 8d 4c 24 04 83 e4 f0 ff 71 fc ....

In addition to DIEs, another binary-to-source mapping is the line number table. It maps each line in the source code to the memory address in the executable binary where the code for that line starts.

In sum, to enable source-level debugging, a debugger needs to know the precise location of the source files and the load addresses at runtime. Address matching, between the image layout of the ELF binary and the address where it is loaded, is extremely important, since debug information relies on correct load addresses at runtime. That is, debug information assumes that the addresses recorded in the binary image at compile time are the same at runtime.

For example, suppose the load address of the .text section is recorded in the executable binary as 0x800000. When the binary actually runs, .text must really be loaded at 0x800000 for gdb to correctly match running instructions with high-level code statements. Address mismatching makes debug information useless, as actual code at one address is displayed as code at another address. Without this knowledge, we would not be able to build an operating system that can be debugged with gdb.

Example 0.69. When an executable binary contains debug info, readelf can display such information in a readable format. Using the good old hello world program:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World\n");

    return 0;
}

and compile with debug info:

$ gcc -m32 -g hello.c -o hello

With the binary ready, we can look at the line number table with the command:

$ readelf -wL hello

The -w option prints all the debug information; combined with a sub-option, it displays only specific information. For example, with -wL, only the decoded line number table is displayed:

Decoded dump of debug contents of section .debug_line:

CU: hello.c:

File name Line number Starting address

From the above output:

CU: short for Compilation Unit, a separately compiled source file. In the example, we only have one file, hello.c.
File name: the file name of the current compilation unit.
Line number: a line number in the source file that has code associated with it. In the example, line 8 is an empty line, so it does not appear.
Starting address: the memory address in the executable binary where the code for the line starts.
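Looking up a line in such a table is a simple search. Here is a minimal sketch; struct line_row, line_for_pc and addr_for_line are hypothetical names.

The rows are illustrative only. The addresses are where each statement's code starts in the objdump listing of main shown later in this section. The line numbers assume the hello.c listing above; they are not copied from real readelf output.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a decoded line number table: code for `line` starts at `addr`. */
struct line_row { uint32_t addr; int line; };

/* Rows must be sorted by address; `end` is one past the last byte of code
   the table describes. Returns the source line containing pc, or -1. */
int line_for_pc(const struct line_row *rows, size_t n, uint32_t end, uint32_t pc)
{
    if (n == 0 || pc < rows[0].addr || pc >= end) return -1;
    int line = rows[0].line;
    /* The last row starting at or before pc is the line being executed. */
    for (size_t i = 0; i < n && rows[i].addr <= pc; ++i)
        line = rows[i].line;
    return line;
}

/* The reverse lookup, used to set a breakpoint on a line:
   the first address of the line, or -1 if the line has no code. */
int64_t addr_for_line(const struct line_row *rows, size_t n, int line)
{
    for (size_t i = 0; i < n; ++i)
        if (rows[i].line == line) return rows[i].addr;
    return -1;
}
```

line_for_pc answers "where did execution stop?", while addr_for_line answers "where should int 3 go?". With the illustrative rows {0x804840b, 4}, {0x804841c, 5}, {0x804842c, 7}, {0x8048431, 8}, a breakpoint on line 5 would go at 0x804841c.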

With such clear information, gdb can easily set a breakpoint on a line. To locate variables and functions, gdb looks at the DIEs. To get the DIE information from an executable binary, run the command:

$ readelf -wi hello

-wi option lists all the DIE entries. This is one typical DIE entry:

<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)

<c> DW_AT_producer : (indirect string, offset: 0xe): GNU C11 5.4.0 20160609 -masm=intel -m32 -mtune=generic -march=i686 -g -fstack-protector-strong

<10> DW_AT_language : 12 (ANSI C99)

<11> DW_AT_name : (indirect string, offset: 0xbe): hello.c

<15> DW_AT_comp_dir : (indirect string, offset: 0x97): /tmp

<19> DW_AT_low_pc : 0x804840b

<1d> DW_AT_high_pc : 0x2e

<21> DW_AT_stmt_list : 0x0

Nesting level (the left-most number, e.g. <0>)

This left-most number indicates the nesting level of a DIE entry. 0 is the outermost level, and its entity is the compilation unit. Subsequent DIE entries with a higher nesting level are all children of this compilation unit. This makes sense, as all the entities must originate from a source file.

Offsets (the hex numbers in angle brackets, e.g. <b>, <c>, <10>)

These hex numbers are offsets into the .debug_info section. Each piece of information is displayed along with its offset. When an attribute refers to another entry, the offset is used to precisely identify the referenced entry.

Attribute names (DW_AT_...)

These names with the DW_AT_ prefix are the attributes attached to a DIE that describe an entity. Notable attributes:

DW_AT_name and DW_AT_comp_dir: the file name of the compilation unit and the directory where compilation occurred. Without the file name and the path, gdb would not be able to display the high-level source, despite the availability of the debug info. Debug info only contains the mapping between source and binary, not the source code itself.
DW_AT_low_pc and DW_AT_high_pc: the start and end of the current entity, which is the compilation unit, in the executable binary. The value in DW_AT_low_pc is the starting address. DW_AT_high_pc is the size of the compilation unit; adding it to DW_AT_low_pc gives the end address of the entity, one past its last byte. In this example, code compiled from hello.c starts at 0x804840b and ends at 0x804840b + 0x2e = 0x8048439. To make sure, we verify with objdump:

int main(int argc, char *argv[])

{

804840b: 8d 4c 24 04 lea ecx,[esp+0x4]

804840f: 83 e4 f0 and esp,0xfffffff0

8048412: ff 71 fc push DWORD PTR [ecx-0x4]

8048415: 55 push ebp

8048416: 89 e5 mov ebp,esp

8048418: 51 push ecx

8048419: 83 ec 04 sub esp,0x4

printf("Hello World\n");

804841c: 83 ec 0c sub esp,0xc

804841f: 68 c0 84 04 08 push 0x80484c0

8048424: e8 b7 fe ff ff call 80482e0 <puts@plt>

8048429: 83 c4 10 add esp,0x10

return 0;

804842c: b8 00 00 00 00 mov eax,0x0

}

8048431: 8b 4d fc mov ecx,DWORD PTR [ebp-0x4]

8048434: c9 leave

8048435: 8d 61 fc lea esp,[ecx-0x4]

8048438: c3 ret

8048439: 66 90 xchg ax,ax

804843b: 66 90 xchg ax,ax

804843d: 66 90 xchg ax,ax

804843f: 90 nop

It is true: main starts at 804840b and ends at 8048439, right after the ret instruction at 8048438. The instructions from 8048439 onward are just padding inserted for alignment and do not belong to main. Note that the full objdump output shows much more code beyond main. It is not counted, because that code does not come from hello.c; it is linked in by gcc from the C runtime startup files. hello.c contains only one function, main, which is why hello.c starts and ends at the same addresses as main.

Abbreviation number (Abbrev Number: 1)

This number identifies the abbreviation, or form, of a DIE. When debug info is displayed with -wi, the DIEs are displayed with their values. The -wa option shows the abbreviations in the .debug_abbrev section:

Contents of the .debug_abbrev section:

Number TAG (0x0)

1 DW_TAG_compile_unit [has children]

DW_AT_producer DW_FORM_strp

DW_AT_language DW_FORM_data1

DW_AT_name DW_FORM_strp

DW_AT_comp_dir DW_FORM_strp

DW_AT_low_pc DW_FORM_addr

DW_AT_high_pc DW_FORM_data4

DW_AT_stmt_list DW_FORM_sec_offset

DW_AT value: 0 DW_FORM value: 0

.... more abbreviations ....

The output is similar to a DIE output, but with only attribute names and no values. We can also say an abbreviation is the type of a DIE, as it represents the structure of a particular DIE. Many DIEs share the same abbreviation, or structure, and thus are of the same type. An abbreviation number specifies which entry of the abbreviation table above describes a DIE.

Abbreviations improve encoding efficiency (they reduce binary size) because a DIE does not need to carry its structure as attribute-value pairs. Self-describing formats such as YAML or JSON encode attribute names along with their values, which simplifies encoding but adds overhead. Instead, a DIE simply refers to an abbreviation for correct decoding.
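To see why abbreviations save space, the sketch below decodes abbreviation 1 from the readelf -wa output above and computes how many bytes a DIE using it occupies. In .debug_abbrev, the abbreviation code, tag, attribute names and forms are stored as unsigned LEB128 numbers.

The byte array abbrev1_bytes is a hand-written encoding of that abbreviation using DWARF 4 constant values, not a dump of the real section. The form sizes assume 32-bit DWARF on a -m32 target. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Abbreviation 1 from readelf -wa, encoded by hand (DWARF 4 constants). */
static const uint8_t abbrev1_bytes[] = {
    0x01, 0x11, 0x01, /* code 1, DW_TAG_compile_unit, DW_CHILDREN_yes */
    0x25, 0x0e,       /* DW_AT_producer  DW_FORM_strp */
    0x13, 0x0b,       /* DW_AT_language  DW_FORM_data1 */
    0x03, 0x0e,       /* DW_AT_name      DW_FORM_strp */
    0x1b, 0x0e,       /* DW_AT_comp_dir  DW_FORM_strp */
    0x11, 0x01,       /* DW_AT_low_pc    DW_FORM_addr */
    0x12, 0x06,       /* DW_AT_high_pc   DW_FORM_data4 */
    0x10, 0x17,       /* DW_AT_stmt_list DW_FORM_sec_offset */
    0x00, 0x00,       /* end of attribute list */
};

/* Decode one unsigned LEB128 value at *p and advance *p past it. */
uint64_t read_uleb128(const uint8_t **p)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Size of a value of the given form; only the forms used above are handled. */
int form_size(uint64_t form)
{
    switch (form) {
    case 0x01: return 4;  /* DW_FORM_addr: 4-byte addresses with -m32 */
    case 0x06: return 4;  /* DW_FORM_data4 */
    case 0x0b: return 1;  /* DW_FORM_data1 */
    case 0x0e: return 4;  /* DW_FORM_strp: offset into .debug_str */
    case 0x17: return 4;  /* DW_FORM_sec_offset */
    default:   return -1; /* not handled in this sketch */
    }
}

struct abbrev_info { uint64_t code, tag; int has_children, n_attrs, body_size; };

/* Parse one abbreviation and compute how many bytes a DIE using it
   occupies after its own abbreviation code. Returns 0 on success. */
int parse_abbrev(const uint8_t *p, struct abbrev_info *out)
{
    out->code = read_uleb128(&p);
    out->tag = read_uleb128(&p);
    out->has_children = *p++;          /* DW_CHILDREN_yes = 1, no = 0 */
    out->n_attrs = 0;
    out->body_size = 0;
    for (;;) {
        uint64_t name = read_uleb128(&p);
        uint64_t form = read_uleb128(&p);
        if (name == 0 && form == 0) break;
        int size = form_size(form);
        if (size < 0) return -1;
        out->n_attrs++;
        out->body_size += size;
    }
    return 0;
}
```

For abbreviation 1, the values take 4 + 1 + 4 + 4 + 4 + 4 + 4 = 25 bytes. This agrees with the -wi offsets shown earlier: the abbreviation code sits at <b>, DW_AT_producer at <c>, DW_AT_language at <10>, and so on up to DW_AT_stmt_list at <21>, which ends at 0x25. No attribute names are stored in the DIE itself.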

Here are all the DIEs of hello represented as a tree:

In Figure 6, DW_TAG_subprogram represents a function such as main. Its children are the DIEs of argc and argv. With such precise information, matching source to binary is an easy job for gdb.

If more than one compilation unit exists in an executable binary, the DIE entries are ordered according to the compilation order given to gcc. For example, suppose we have another source file, test.c (it can contain anything; it is just a sample file), and compile it together with hello.c:

$ gcc -masm=intel -m32 -g test.c hello.c -o hello

Then all DIE entries of test.c are displayed before the DIE entries of hello.c:

<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)

<c> DW_AT_producer : (indirect string, offset: 0x0): GNU C11 5.4.0 20160609 -masm=intel -m32 -mtune=generic -march=i686 -g -fstack-protector-strong

<10> DW_AT_language : 12 (ANSI C99)

<11> DW_AT_name : (indirect string, offset: 0x64): test.c

<15> DW_AT_comp_dir : (indirect string, offset: 0x5f): /tmp

<19> DW_AT_low_pc : 0x804840b

<1d> DW_AT_high_pc : 0x6

<21> DW_AT_stmt_list : 0x0

<1><25>: Abbrev Number: 2 (DW_TAG_subprogram)

<26> DW_AT_external : 1

<26> DW_AT_name : bar

<2a> DW_AT_decl_file : 1

<2b> DW_AT_decl_line : 1

<2c> DW_AT_low_pc : 0x804840b

<30> DW_AT_high_pc : 0x6

<34> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)

<36> DW_AT_GNU_all_call_sites: 1

....after all DIEs in test.c listed....

<0><42>: Abbrev Number: 1 (DW_TAG_compile_unit)

<43> DW_AT_producer : (indirect string, offset: 0x0): GNU C11 5.4.0 20160609 -masm=intel -m32 -mtune=generic -march=i686 -g -fstack-protector-strong

<47> DW_AT_language : 12 (ANSI C99)

<48> DW_AT_name : (indirect string, offset: 0xc5): hello.c

<4c> DW_AT_comp_dir : (indirect string, offset: 0x5f): /tmp

<50> DW_AT_low_pc : 0x8048411

<54> DW_AT_high_pc : 0x2e

<58> DW_AT_stmt_list : 0x35

....then all DIEs in hello.c are listed....
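Because each compilation unit records its own DW_AT_low_pc and DW_AT_high_pc, a debugger can tell which source file an address belongs to. Here is a minimal sketch with hypothetical names, using the two ranges from the output above: test.c at 0x804840b with size 0x6, and hello.c at 0x8048411 with size 0x2e. Here DW_AT_high_pc is a size, so each range ends just before low_pc + high_pc:

```c
#include <stddef.h>
#include <stdint.h>

/* The address range of one compilation unit, taken from its DIE. */
struct cu_range {
    const char *name;  /* DW_AT_name */
    uint32_t low_pc;   /* DW_AT_low_pc: first address */
    uint32_t size;     /* DW_AT_high_pc (DW_FORM_data4): a size, not an address */
};

/* Return the CU whose [low_pc, low_pc + size) contains pc, or NULL. */
const struct cu_range *cu_for_pc(const struct cu_range *cus, size_t n, uint32_t pc)
{
    for (size_t i = 0; i < n; ++i) {
        if (pc >= cus[i].low_pc && pc - cus[i].low_pc < cus[i].size)
            return &cus[i];
    }
    return NULL;
}
```

With these two ranges, 0x8048410 belongs to test.c and 0x8048411, where the hello.c range begins, belongs to hello.c.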

Part II Groundwork

Bootloader

A bootloader loads an OS, or an application that runs and communicates directly with hardware. (Many embedded devices don't use an OS; in such embedded systems, the bootloader is simply part of the boot firmware and no separate bootloader is needed.) To run an OS, the first thing to write is a bootloader. In this chapter, we are going to write a rudimentary bootloader, as our main focus is writing an operating system, not a bootloader. More interestingly, this chapter presents related tools and techniques that are applicable to writing a bootloader as well as an operating system.

x86 Boot Process

When the computer is powered on or reset, the CPU starts executing BIOS code at address FFFF:0000h (physical address FFFF0h). BIOS, the Basic Input/Output System, is firmware that performs hardware initialization, including the POST (Power-On Self-Test), and provides a set of generic subroutines to control input/output devices. The BIOS then checks the available storage devices (floppy disks and hard disks) for a bootable one. It does so by examining whether the last two bytes of the first sector contain the boot record signature 0x55, 0xAA. If so, the BIOS loads the first sector to address 7C00h, sets the program counter to that address, and lets the CPU execute code from there.

The first sector is called Master Boot Record, or MBR. The program in the first sector is called MBR Bootloader.
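Two pieces of arithmetic in this process are easy to check in C. In real mode, a segment:offset pair maps to the physical address segment × 16 + offset, so FFFF:0000h is FFFF0h and 0000:7C00h is 7C00h. A sector is bootable when its bytes 510 and 511 are 0x55 and 0xAA. The helper names below are hypothetical:

```c
#include <stdint.h>

/* Real mode: physical address = segment * 16 + offset. */
uint32_t real_mode_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* The BIOS treats a sector as bootable when its last two bytes are
   0x55, 0xAA, i.e. the little-endian word 0xAA55 written by dw 0xAA55. */
int is_boot_sector(const uint8_t sector[512])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

Different pairs can name the same byte: 07C0:0000h and 0000:7C00h are both physical address 7C00h.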

Using BIOS services

BIOS provides many basic services for controlling the hardware at the boot stage. A service is a group of routines that controls a particular hardware device or returns information about the current system. Each service is given an interrupt number, and a BIOS routine is called with the int instruction and that number. Each service numbers its own routines. To select a routine, its number must be written to the register the service requires (usually AH), and its parameters to other registers. The list of all BIOS interrupts is available in Ralf Brown's Interrupt List at: http://www.cs.cmu.edu/~ralf/files.html.

Figure 0.19: The boot process.


Example:

Interrupt call 13h (diskette service) requires number of sectors to read, track number, sector number, head number and drive number to read from a storage device. The content of the sector is stored in memory at the address defined by the pair of registers ES:BX. The parameters are stored in registers like this:

; Store sector content in the buffer 10FF:0000
mov     dx, 10FFh
mov     es, dx
xor     bx, bx
mov     al, 2    ; read 2 sectors
mov     ch, 0    ; read track 0
mov     cl, 2    ; start at sector 2 (sectors are numbered from 1)
mov     dh, 0    ; head number
mov     dl, 0    ; drive number. Drive 0 is floppy drive.
mov     ah, 0x02 ; read floppy sector function
int     0x13     ; call BIOS - Read the sector

The BIOS is only available in real mode. Once the CPU switches to protected mode, the BIOS is no longer usable, and the operating system code is responsible for controlling hardware devices. This is when the operating system stands on its own: it must provide its own kernel drivers for talking to hardware.

Boot process

BIOS transfers control to the MBR bootloader by jumping to 0000:7c00h, where the BIOS has already loaded the bootloader.
Set up the machine environment for booting by properly initializing segment registers to enable a flat memory model.
Load the kernel:
1. Read the kernel from disk.
2. Save it somewhere in main memory.
3. Jump to the starting code address of the kernel and execute it.
If an error occurs, print a message to notify users that something went wrong, and halt.

Example Bootloader

Here is a simple bootloader that does nothing except halt the machine gracefully instead of crashing it. If the virtual machine does not halt but its text keeps flashing, the bootloader did not load properly and the machine crashed. When the machine crashes, it resets and starts again from FFFF:0000h, near the end of the 1 MB of memory addressable in real mode. This starts the whole BIOS boot process all over again. It is effectively a reset, but not a full one, since the machine environment from the previous run is preserved. For that reason, it is called a warm reboot. The opposite of a warm reboot is a cold reboot, in which the machine environment is reset to its initial settings, as when the computer starts from a powerless state.

;******************************************
; bootloader.asm		
; A Simple Bootloader
;******************************************
org 0x7c00
bits 16
start: jmp boot

;; constant and variable definitions
msg	db	"Welcome to My Operating System!", 0ah, 0dh, 0h

boot:
  cli	; no interrupts 	
  cld	; all that we need to init
  hlt	; halt the system

; We have to be 512 bytes. Clear the rest of the bytes with 0
times 510 - ($-$$) db 0
dw 0xAA55				  ; Boot signature

Compile and load

We compile the code with nasm and write it to a disk image:

$ nasm -f bin bootloader.asm -o bootloader

Then, we create a blank 1.4 MB floppy disk image:

$ dd if=/dev/zero of=disk.img bs=512 count=2880

2880+0 records in

2880+0 records out

1474560 bytes (1.5 MB, 1.4 MiB) copied, 0.00625622 s, 236 MB/s

Then, we write the bootloader to the 1st sector:

$ dd conv=notrunc if=bootloader of=disk.img bs=512 count=1 seek=0

1+0 records in

1+0 records out

512 bytes copied, 0.000102708 s, 5.0 MB/s

The option conv=notrunc preserves the original size of the floppy disk image. Without this option, dd truncates disk.img, so the 1.4 MB disk image would be replaced by a file of only 512 bytes, and we do not want that.
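The two dd commands above can be mirrored with C stdio, which makes the role of conv=notrunc explicit. Opening an existing image with mode "r+b" never truncates it, and fseek to lba × 512 plays the role of seek=. Here is a minimal sketch with hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Fill img with n zero sectors, like dd if=/dev/zero bs=512 count=n. */
int make_blank_image(FILE *img, long n_sectors)
{
    unsigned char zero[SECTOR_SIZE];
    memset(zero, 0, sizeof zero);
    if (fseek(img, 0, SEEK_SET) != 0) return -1;
    for (long i = 0; i < n_sectors; ++i)
        if (fwrite(zero, 1, SECTOR_SIZE, img) != SECTOR_SIZE) return -1;
    return fflush(img) == 0 ? 0 : -1;
}

/* Overwrite one sector in place, like dd conv=notrunc bs=512 seek=lba count=1.
   The stream must be open for update (e.g. "r+b"), which never truncates. */
int write_sector(FILE *img, long lba, const unsigned char sector[SECTOR_SIZE])
{
    if (fseek(img, lba * SECTOR_SIZE, SEEK_SET) != 0) return -1;
    if (fwrite(sector, 1, SECTOR_SIZE, img) != SECTOR_SIZE) return -1;
    return fflush(img) == 0 ? 0 : -1;
}
```

Opening disk.img with fopen("disk.img", "r+b") and calling write_sector with LBA 0 behaves like the second dd command. Opening it with "wb" instead would truncate the file, which is exactly what conv=notrunc avoids.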

In the past, developing an operating system was complicated because a programmer needed to understand the specific hardware he was using. Even though x86 was ubiquitous, minute differences between models meant that code written for one machine might not run on another. Further, if you test your operating system on the same physical computer you write it on, each run takes a long time and debugging is difficult. Fortunately, today we can uniformly produce a virtual machine with a particular specification and avoid the incompatibility issue altogether. This makes an OS easier to write and test, since everyone can reproduce the same machine environment.

We will be using QEMU, a generic and open source machine emulator and virtualizer. QEMU can emulate various types of machines, not only x86_64. Debugging is easy, since you can connect GDB to a virtual machine through QEMU's built-in GDB server and debug the code that runs on it. QEMU can use disk.img as a boot device, e.g. a floppy disk:

$ qemu-system-i386 -machine q35 -fda disk.img -gdb tcp::26000 -S

With option -machine q35, QEMU emulates Intel's Q35 machine model. The command qemu-system-i386 -machine help lists all machine models that QEMU supports.
With option -fda disk.img, QEMU uses disk.img as a floppy disk image.
With option -gdb tcp::26000, QEMU allows gdb to connect to the virtual machine for remote debugging through a tcp socket with port 26000.
With option -S, QEMU waits for gdb to connect before it starts running.

After the command is executed, a new console window displays the screen output of the virtual machine. Open another terminal, run gdb, and set the current architecture to i8086, since the CPU starts in 16-bit real mode:

(gdb) set architecture i8086

warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration

of GDB. Attempting to continue with the default i8086 settings.

The target architecture is assumed to be i8086

Then, connect gdb to the waiting virtual machine with this command:

(gdb) target remote localhost:26000

Remote debugging using localhost:26000

0x0000fff0 in ?? ()

Then, place a breakpoint at 0x7c00:

(gdb) b *0x7c00

Breakpoint 1 at 0x7c00

Note the asterisk (*) before the memory address. Without the asterisk, gdb treats the argument as a source location, such as a line number or a symbol, rather than as an address. Then, for convenience, we use a split layout for viewing the assembly code and registers together:

(gdb) layout asm

(gdb) layout reg

Finally, let the virtual machine continue running:

(gdb) c

If the virtual machine successfully runs the bootloader, this is what the QEMU screen should look like:

Figure 0.20: Boot succeeded.

Debugging

If, for some reason, the sample bootloader does not reach such a screen and gdb does not stop at 0x7c00, then the following scenarios are likely:

The bootloader is invalid: the message “Boot failed: not a bootable disk” appears for floppy disk booting. Make sure the boot signature is at the last 2 bytes of the 512-byte first sector.
The machine cannot find a boot disk: the message “Boot failed: not a bootable disk” appears for floppy disk booting. Make sure the bootloader is correctly written to the first sector. This can be verified by checking the disk image with hd:

$ hd disk.img | less

If the first 512 bytes are all zeroes, the bootloader was likely written to another sector.
The machine crashes: when this happens, the machine resets back to the beginning at FFFF:0000h. If the QEMU machine starts without waiting for gdb, the console output window keeps flashing as the machine is repeatedly reset. Most likely, some instruction in the bootloader code is causing the fault.

Exercise 0.16. Print a welcome message

We loaded the bootloader successfully. But it needs to do something more useful than halting our machine. The easiest thing to do is printing something on screen, just as introductions to programming languages start with “Hello World”. Our bootloader prints “Welcome to my operating system” (or whatever message you want). In this part, we will build a simple I/O library that allows us to set a cursor anywhere on the screen and print text there.

First, create a file io.asm for I/O related routines. Then, write the following routines:

MovCursor

Purpose: Move a cursor to a specific location on screen and remember this location.

Parameters:
- bh = Y coordinate
- bl = X coordinate.
Return: None
PutChar

Purpose: Print a character on screen, at the cursor position previously set by MovCursor .

Parameters:
- al = Character to print
- bl = text color
- cx = number of times the character is repeated
Return: None
Print

Purpose: Print a string.

Parameters:
- ds:si = Zero terminated string
Return: None

Test the routines by putting each in the bootloader source, compile and run. To debug, run GDB and set a breakpoint at a specific routine. The end result is that Print should display a welcome message on screen.

Loading a program from bootloader

Now that we have a feel for how to use the BIOS services, it is time for something more complicated. We will place our kernel on the 2nd sector onward, and our bootloader will read 30 sectors starting from the 2nd sector. Why 30 sectors? Our kernel will grow gradually, so we reserve 30 sectors. This saves us from modifying the bootloader each time the kernel grows by another sector.

The primary responsibility of a bootloader is to read an operating system from a storage device, e.g. a hard disk, load it into main memory, and transfer control to the loaded operating system. This is similar to how the BIOS reads and loads a bootloader. At the moment, our bootloader is nothing more than an assembly program loaded by the BIOS. To make it a real bootloader, it must perform the above two tasks well: read and load an operating system.

Floppy Disk Anatomy

To read from a storage device, we must understand how the device works and the interface provided for controlling it. First of all, a floppy disk is a storage device, similar to RAM, but it retains information even when the computer is turned off. It is thus called a persistent storage device.

A floppy disk provides a storage space of up to 1.4 MB, or 1,474,560 bytes. When reading from a floppy disk, the smallest unit that can be read is a sector, a group of 512 contiguous bytes. A group of 18 sectors is a track. Each side of a floppy disk consists of 80 tracks, so 2 sides × 80 tracks × 18 sectors × 512 bytes gives the 1,474,560 bytes. A floppy drive is required to read a floppy disk. The drive contains an arm with 2 heads, and each head reads and writes one side of the floppy disk: head 0 the upper side and head 1 the lower side.

MarginFigure 8: Sector and Track.

When a floppy drive writes data to a brand new floppy disk, track 0 on the upper side is written first, by head 0. When the upper track 0 is full, the lower track 0 is used, by head 1. When both sides of track 0 are full, the drive goes back to head 0 and writes to the upper side of track 1, and so on, until no space is left on the device. The same order is also used for reading data from a floppy disk.
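The order in which sectors are filled translates into a simple formula. It converts a linear sector number (LBA, counting from 0) into the track, head and sector values that int 13h expects. Here is a minimal sketch for the 1.4 MB geometry described above; the names are hypothetical:

```c
/* Geometry of a 1.4 MB floppy: 80 tracks per side, 2 heads, 18 sectors per track. */
#define SECTORS_PER_TRACK 18
#define HEADS 2
#define TRACKS 80

struct chs { unsigned track, head, sector; };

/* Convert a 0-based linear sector number (LBA) to track/head/sector.
   Tracks and heads count from 0, sectors from 1. The order matches the text:
   track 0 head 0, then track 0 head 1, then track 1 head 0, and so on. */
struct chs lba_to_chs(unsigned lba)
{
    struct chs r;
    r.track  = lba / (SECTORS_PER_TRACK * HEADS);
    r.head   = (lba / SECTORS_PER_TRACK) % HEADS;
    r.sector = lba % SECTORS_PER_TRACK + 1;
    return r;
}
```

For example, the 2nd sector (LBA 1) is track 0, head 0, sector 2, the value loaded into CL in the bootloader below.

The 30 sectors planned for the kernel (LBA 1 to 30) end at track 0, head 1, sector 13, so they span both sides of track 0. Some BIOSes cannot read across such a boundary in a single int 13h call, in which case the read has to be split.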

MarginFigure 9: Floppy disk platter with 2 sides.

Read and load sectors from a floppy disk

First, we need a sample program to write into the 2nd sector, so we can experiment with floppy disk reading:

;******************************************
; sample.asm		
; A Sample Program
;******************************************
bits 16
mov eax, 1
add eax, 1
jmp $          ; stay here instead of executing whatever bytes follow

Such a program is good enough. To keep things simple for the purpose of demonstration, we will use the same floppy disk that holds the bootloader to hold our operating system. The operating system image starts from the 2nd sector, as the 1st sector is already in use by the bootloader. We compile the program and write it to the 2nd sector with dd:

$ nasm -f bin sample.asm -o sample

$ dd if=sample of=disk.img bs=512 count=1 seek=1 conv=notrunc

Figure 0.21: The bootloader and the sample program on floppy disk.

1st sector	2nd sector	...	30th sector
bootloader	sample	...	(empty)

Next, we need to fix the bootloader so that it reads a number of arbitrary sectors from the floppy disk and loads them into memory. Before doing so, a basic understanding of the floppy disk is required. To read data from a disk, we use interrupt 13h (0x13) with AH = 02, which is the BIOS routine for reading sectors from a disk into memory:

AH = 02
AL = number of sectors to read (1-128 dec.)
CH = track/cylinder number (0-1023 dec., see below)
CL = sector number (1-17 dec.)
DH = head number (0-15 dec.)
DL = drive number (0=A:, 1=2nd floppy, 80h=drive 0, 81h=drive 1)
ES:BX = pointer to buffer

Return:
AH = status (see INT 13,STATUS)
AL = number of sectors read
CF = 0 if successful
   = 1 if error
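On a 1.44 MB floppy the cylinder number never exceeds 79. The BIOS register layout, however, allows up to 1023 cylinders. The low 8 bits of the cylinder number go into CH, the upper 2 bits go into bits 6-7 of CL, and bits 0-5 of CL hold the sector number. The C sketch below models that packing. The struct and function names are hypothetical. The code only computes register values and does not call the BIOS.

```c
#include <assert.h>
#include <stdint.h>

/* Register values for INT 13h, AH = 02h (read sectors into ES:BX). */
struct int13_read_regs {
    uint8_t ah, al, ch, cl, dh, dl;
};

/* Compute the registers for reading `count` sectors starting at
 * (cylinder, head, sector) from `drive`. */
static struct int13_read_regs int13_read(unsigned cylinder, unsigned head,
                                         unsigned sector, unsigned count,
                                         unsigned drive)
{
    assert(cylinder < 1024 && sector >= 1 && sector <= 63);
    struct int13_read_regs r;
    r.ah = 0x02;                                  /* function: read */
    r.al = (uint8_t)count;                        /* number of sectors */
    r.ch = (uint8_t)(cylinder & 0xFF);            /* cylinder bits 0-7 */
    r.cl = (uint8_t)(((cylinder >> 2) & 0xC0)     /* cylinder bits 8-9 */
                     | (sector & 0x3F));          /* sector in bits 0-5 */
    r.dh = (uint8_t)head;
    r.dl = (uint8_t)drive;
    return r;
}
```

With cylinder 0, head 0, sector 2, two sectors, and drive 0, the sketch yields the same values the bootloader below loads into its registers.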

Using the above routine, the bootloader can read the 2nd sector:

;******************************************
; Bootloader.asm		
; A Simple Bootloader
;******************************************
org 0x7c00
bits 16
start: jmp boot

;; constant and variable definitions
  msg	db	"Welcome to My Operating System!", 0ah, 0dh, 0h

boot:
  cli	; no interrupts 	
  cld	; all that we need to init

  mov	ax, 0x50

  ;; set the buffer
  mov	es, ax
  xor	bx, bx

  mov	al, 2					; read 2 sectors
  mov	ch, 0					; track 0
  mov	cl, 2					; sector to read (The second sector)
  mov	dh, 0					; head number
  mov	dl, 0					; drive number

  mov	ah, 0x02			     ; read sectors from disk
  int	0x13					 ; call the BIOS routine
  jmp	0x50:0x0				; jump and execute the sector!

  hlt	; halt the system

  ; We have to be 512 bytes. Clear the rest of the bytes with 0
times 510 - ($-$$) db 0
dw 0xAA55				  ; Boot Signature

The above code jumps to the address 0x50:0x0, which is the physical address 0x500. To test the code, load it on a QEMU virtual machine, connect to it through gdb, and place a breakpoint at 0x500. If gdb stops at that address and the assembly listing there matches the code in sample.asm, the bootloader has successfully loaded the program. This is an important milestone, because it shows that our operating system is loaded and runs properly.
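In real mode, the CPU turns a segment:offset pair into a physical address by multiplying the segment by 16 and adding the offset. This is how 0x50:0x0 becomes 0x500. The gdb hook later in this chapter uses the same calculation when it prints `$cs*16+$eip`. Below is a minimal C sketch. The function name is hypothetical, and the sketch ignores A20 wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address of segment:offset in real mode: segment * 16 + offset.
 * The largest possible result is 0xFFFF:0xFFFF = 0x10FFEF; whether that
 * wraps around at 1 MB depends on the A20 line, which is ignored here. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    uint32_t phys = ((uint32_t)segment << 4) + offset;
    assert(phys <= 0x10FFEFu);
    return phys;
}
```

Different pairs can name the same byte. For example, 0x07C0:0x0000 and 0x0000:0x7C00 both refer to 0x7C00, where the BIOS loads the bootloader.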

Improve productivity with scripts

Automate build with GNU Make

Up to this point, the whole development process felt repetitive: whenever a change is made, the same commands are entered again. The commands are also complex. Ctrl+r helps, but it still feels tedious.

GNU Make is a program that controls and automates the process of building complex software. For a small program, such as a single C source file, invoking gcc is quick and easy. However, your software will soon grow to many files spanning multiple directories, and building and linking them manually becomes a chore. Tools that automate this work are called build systems, and GNU Make is one of them. There are various build systems out there, but GNU Make is the most popular in the Linux world, since it is used to build the Linux kernel.

For a comprehensive introduction to make, please refer to the official Introduction to Make: https://www.gnu.org/software/make/manual/html_node/Introduction.html#Introduction. It covers all we need for our project. You can also download the manual in other formats, e.g. PDF, from the official manual page: https://www.gnu.org/software/make/manual/ .

With a Makefile, we can replace those commands with a single, simpler one and save time:

all: bootdisk

bootloader.o: bootloader.asm
	nasm -f bin bootloader.asm -o bootloader.o

sample.o: sample.asm
	nasm -f bin sample.asm -o sample.o

bootdisk: bootloader.o sample.o
	dd if=/dev/zero of=disk.img bs=512 count=2880
	dd conv=notrunc if=bootloader.o of=disk.img bs=512 count=1 seek=0
	dd conv=notrunc if=sample.o of=disk.img bs=512 count=1 seek=1

Now, with a single command, we can build a disk image from start to finish, with the bootloader in the 1st sector and the sample program in the 2nd sector:

$ make bootdisk

nasm -f bin bootloader.asm -o bootloader.o

nasm -f bin sample.asm -o sample.o

dd if=/dev/zero of=disk.img bs=512 count=2880

2880+0 records in

2880+0 records out

1474560 bytes (1.5 MB, 1.4 MiB) copied, 0.00482188 s, 306 MB/s

dd conv=notrunc if=bootloader.o of=disk.img bs=512 count=1 seek=0

0+1 records in

0+1 records out

10 bytes copied, 7.0316e-05 s, 142 kB/s

dd conv=notrunc if=sample.o of=disk.img bs=512 count=1 seek=1

0+1 records in

0+1 records out

10 bytes copied, 0.000208375 s, 48.0 kB/s

Looking at the Makefile, we can see a few problems:

First, the name disk.img is all over the place. If we want to change the disk image name, e.g. to floppy_disk.img, we must change every occurrence of disk.img manually. To solve this problem, we use a variable and replace every occurrence of disk.img with a reference to it. This way, only one place needs changing, the variable definition, and all other places are updated automatically. The following variables are added:

BOOTLOADER=bootloader.o
OS=sample.o
DISK_IMG=disk.img

The second problem is that the names bootloader and sample appear in the filenames of the source files, e.g. bootloader.asm and sample.asm, and also in the filenames of the binary files, e.g. bootloader.o and sample.o. As with disk.img, when a name changes, we must manually update every reference to it, for both the source files and the binary files. For example, if we rename bootloader.asm to loader.asm, the object file bootloader.o must also be renamed to loader.o. To solve this problem, instead of changing filenames manually, we create a rule that automatically derives filenames with one extension from filenames with another. In this case, we want every source file that ends with .asm to have an equivalent object file that ends with .o, e.g. bootloader.asm → bootloader.o. Such transformations are common, so GNU Make provides the built-in functions wildcard and patsubst for solving such problems:

BOOTLOADER_SRCS := $(wildcard *.asm)
BOOTLOADER_OBJS := $(patsubst %.asm, %.o, $(BOOTLOADER_SRCS))

wildcard matches every .asm file in the current directory, and the list of matched files is assigned to the variable BOOTLOADER_SRCS. In this case, BOOTLOADER_SRCS is assigned the value:

bootloader.asm sample.asm

patsubst replaces the .asm suffix of every filename in the list with .o, e.g. bootloader.asm → bootloader.o. After patsubst runs, we get a list of object files in BOOTLOADER_OBJS:

bootloader.o sample.o
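To make the substitution concrete, here is a small C model of what patsubst does to a single word for a pattern of the form %.asm → %.o. It makes simplifying assumptions: it handles only suffix patterns and one word at a time. It is not how GNU Make is implemented, and `patsubst_word` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Model of $(patsubst %FROM,%TO,word) for one word: if `word` ends with
 * `from`, the stem matched by % is kept and the suffix becomes `to`;
 * otherwise the word is copied unchanged, as make does.
 * Returns 0 if `out` is too small to hold the result. */
static int patsubst_word(const char *word, const char *from, const char *to,
                         char *out, size_t out_size)
{
    size_t n = strlen(word), f = strlen(from);
    int written;
    if (n >= f && strcmp(word + n - f, from) == 0)
        written = snprintf(out, out_size, "%.*s%s", (int)(n - f), word, to);
    else
        written = snprintf(out, out_size, "%s", word);
    assert(written >= 0);
    return (size_t)written < out_size;
}
```

Applied to each word of `bootloader.asm sample.asm`, the function produces `bootloader.o` and `sample.o`.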

Finally, we need a rule for building .o files from .asm files:

%.o: %.asm
	nasm -f bin $< -o $@

$< is an automatic variable that refers to the first prerequisite of the rule, i.e. the input file matching %.asm.
$@ is an automatic variable that refers to the target of the rule, i.e. the output file matching %.o.

When the recipe is executed, the variables are replaced with their actual values. For example, for the transformation bootloader.asm → bootloader.o, the command actually executed after the placeholders in the recipe are replaced is:

nasm -f bin bootloader.asm -o bootloader.o

With this rule, nasm automatically builds every .asm file into a .o file, and we no longer need a separate recipe for each object file. Putting it all together with the new variables, we get a better Makefile:

BOOTLOADER=bootloader.o
OS=sample.o
DISK_IMG=disk.img

BOOTLOADER_SRCS := $(wildcard *.asm)
BOOTLOADER_OBJS := $(patsubst %.asm, %.o, $(BOOTLOADER_SRCS))

all: bootdisk

%.o: %.asm
	nasm -f bin $< -o $@

bootdisk:  $(BOOTLOADER_OBJS)
	dd if=/dev/zero of=$(DISK_IMG) bs=512 count=2880
	dd conv=notrunc if=$(BOOTLOADER) of=$(DISK_IMG) bs=512 count=1 seek=0
	dd conv=notrunc if=$(OS) of=$(DISK_IMG) bs=512 count=1 seek=1

From here on, any .asm file is compiled automatically, without an explicit recipe for each file.

The object files are in the same directory as the source files, which makes the source tree harder to work with. Ideally, object files and source files should live in different directories. We want a better organized directory layout, like the one shown in MarginFigure 10.

MarginFigure 10: A better project layout

The layout can be displayed with the tree command:

$ tree

The bootloader/ directory holds the bootloader source files. The os/ directory holds the operating system source files that we are going to write later. The build/ directory holds the object files for both the bootloader and the os, as well as the final disk image disk.img. Notice that the bootloader/ directory also has its own Makefile. This Makefile is responsible for building everything in the bootloader/ directory, so the top-level Makefile no longer builds the bootloader and only builds the disk image. The Makefile in the bootloader/ directory should contain:

BUILD_DIR=../build/bootloader

BOOTLOADER_SRCS := $(wildcard *.asm)
BOOTLOADER_OBJS := $(patsubst %.asm, $(BUILD_DIR)/%.o, $(BOOTLOADER_SRCS))

all: $(BOOTLOADER_OBJS)

$(BUILD_DIR)/%.o: %.asm
	mkdir -p $(BUILD_DIR)
	nasm -f bin $< -o $@

MarginFigure 11: Makefile in bootloader/

Basically, everything related to the bootloader in the top-level Makefile is extracted into this Makefile. When make runs this Makefile, it should build bootloader.o and put it into the ../build/bootloader/ directory. As a good practice, all references to this directory go through the BUILD_DIR variable. The rule for transforming .asm → .o is also updated with the proper paths; otherwise it will not work.

%.asm refers to the assembly source files in the current directory.
$(BUILD_DIR)/%.o refers to the output object files in the build directory, ../build/bootloader/.

The entire rule implements the transformation <source_file.asm> → ../build/bootloader/<object_file.o>. Note that all paths must be correct. If we try to build object files in a different directory, e.g. the current directory, it will not work, because no rule exists for building objects at such a path.

We also create a similar Makefile for the os/ directory:

MarginFigure 12: Makefile in os/

BUILD_DIR=../build/os

OS_SRCS := $(wildcard *.asm)
OS_OBJS := $(patsubst %.asm, $(BUILD_DIR)/%.o, $(OS_SRCS))

all: $(OS_OBJS)

$(BUILD_DIR)/%.o: %.asm
	mkdir -p $(BUILD_DIR)
	nasm -f bin $< -o $@

For now, it looks almost identical to the Makefile for bootloader. In the next chapter, we will update it for C code. Then, we update the top-level Makefile:

MarginFigure 13: Top-level Makefile

BUILD_DIR=build
BOOTLOADER=$(BUILD_DIR)/bootloader/bootloader.o
OS=$(BUILD_DIR)/os/sample.o
DISK_IMG=disk.img

all: bootdisk

.PHONY: bootdisk bootloader os

bootloader:
	$(MAKE) -C bootloader

os:
	$(MAKE) -C os

bootdisk: bootloader os
	dd if=/dev/zero of=$(DISK_IMG) bs=512 count=2880
	dd conv=notrunc if=$(BOOTLOADER) of=$(DISK_IMG) bs=512 count=1 seek=0
	dd conv=notrunc if=$(OS) of=$(DISK_IMG) bs=512 count=1 seek=1

The build process is now truly modularized:

The bootloader and os builds are now delegated to the child Makefile of each component. The -C option tells make to change into the given directory and run the Makefile there. In this case, the directories are bootloader/ and os/.
The target all of the top-level Makefile is only responsible for the bootdisk target, which is the primary target of this Makefile.

In many cases, a target is not a filename but just a name for a recipe that should always run when requested. However, if a file with the same name as a target exists and is up to date, make does not execute that target's recipe. To solve this problem, .PHONY declares that some targets are not files. All phony targets then run when requested, regardless of any files with the same names.
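The decision make takes for a single target can be modeled in a few lines of C. This is a simplified sketch that assumes the modification times are already known. Among other things, it ignores prerequisites that are themselves rebuilt during the same run. `needs_rebuild` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Simplified model of how make decides whether to run a target's recipe.
 * Modification times are passed in instead of being read from disk. */
static bool needs_rebuild(bool phony, bool target_exists, time_t target_mtime,
                          const time_t *prereq_mtimes, size_t n_prereqs)
{
    assert(n_prereqs == 0 || prereq_mtimes != NULL);
    if (phony)
        return true;   /* .PHONY: not a file, always run when requested */
    if (!target_exists)
        return true;   /* no file with that name yet: must build it */
    for (size_t i = 0; i < n_prereqs; i++)
        if (prereq_mtimes[i] > target_mtime)
            return true;   /* a prerequisite changed after the target */
    return false;      /* file exists and is up to date: skip the recipe */
}
```

Suppose `phony` is false and a file named `bootloader` exists with no prerequisites. Then the target is never rebuilt, which is exactly the problem .PHONY solves.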

To save time entering the command for starting up a QEMU virtual machine, we also add a target to the top-level Makefile:

qemu:
	qemu-system-i386 -machine q35 -fda $(DISK_IMG) -gdb tcp::26000 -S

One last problem is cleaning the project. At the moment, we must remove object files manually, which is repetitive. Instead, we let the Makefile of each component take care of cleaning its own object files. The top-level Makefile then cleans the project by calling the component Makefiles to do the job. A clean target is added at the end of each Makefile:

Bootloader Makefile:
```
clean:
	rm -f $(BUILD_DIR)/*
```
OS Makefile:
```
clean:
	rm -f $(BUILD_DIR)/*
```

Top-level Makefile:

clean:
	$(MAKE) -C bootloader clean
	$(MAKE) -C os clean

By simply invoking make clean at the project root, we remove all object files.

GNU Make Syntax summary

GNU Make, at its core, is a domain-specific language for build automation. Like any programming language, it needs a way to define data and code. In a Makefile, variables carry data. A variable value is either hard-coded or evaluated by invoking a shell such as Bash. All variable values in Make have the same type: a string of text. The number 3 is not a number, but the textual representation of the symbol 3. Here are common ways to define data in a Makefile:

Syntax	Description
`A = 1` `B = 2` `C = $$(expr $(A) + $(B))` ⇒ `A` is `1`, `B` is `2`, `C` expands to `3` when used in a recipe.	Declare a variable and assign a textual value to it. The double dollar sign `$$` is an escaped `$`, so `C` holds the text `$(expr 1 + 2)`. When `C` is used in a recipe, the shell (`/bin/sh` by default) evaluates this command substitution and produces `3`.
`PATH = /bin` `PATH := $(PATH):/usr/bin` ⇒ `PATH` is `/bin:/usr/bin`	Declare a simply expanded variable: Make expands the right-hand side once, at the point of definition. The recursive `=` syntax rejects a variable that refers to itself on the right-hand side, but this syntax lets a variable use its own current value.
`PATH = /bin` `PATH += /usr/bin` ⇒ `PATH` is `/bin /usr/bin`	Append a value to the end of a variable, separated by a space. It is roughly equivalent to `PATH := $(PATH) /usr/bin`, except that `+=` keeps the flavor (recursive or simple) of the original variable.
`CFLAGS ?= -o` ⇒ `CFLAGS` is assigned the value `-o` if it was not defined.	This syntax is called conditional variable assignment. It sets a variable only if the variable is not yet defined. This is useful when a user wants to supply a different value from the command line or the environment, e.g. to add a debugging option to `CFLAGS`. Otherwise, Make uses the default given by `?=`.
`SRCS = lib1.c lib2.c main.c` `OBJS := $(SRCS:.c=.o)` ⇒ `OBJS` has the value `lib1.o lib2.o main.o`	This syntax is called substitution reference. Make replaces a part of the referenced variable with something else. In this case, it replaces every `.c` suffix with `.o`, creating the list of object files for the `OBJS` variable from the list of source files in the `SRCS` variable.

Code in GNU Make is a collection of recipes that it can run. Each recipe is analogous to a function in a programming language, and can be called like a regular function. Each recipe carries a series of shell commands to be executed by a shell e.g. Bash. A recipe has the following format:

target: prerequisites
	command

Each target is analogous to a function name, and each prerequisite is a call to another target. Each command is either one of Make's built-in commands or a command that a shell can execute. All prerequisites must be satisfied before the body of the target is entered; that is, no prerequisite may return an error. If any error is returned, Make terminates the whole build process and prints an error on the command line. For targets that are files, Make also compares modification times: if the target file exists and is newer than all of its prerequisites, Make skips its commands.

When make runs without a target, it starts from the default goal. The default goal is the first target in the Makefile, which by convention is named all. Make goes through every prerequisite and finally runs the body of all, so all is analogous to main in other programming languages. However, if make is given a target, it starts from that target instead. This feature is useful for automating multiple aspects of a project. For example, one target builds the project, another generates documents such as test reports, another runs the whole test suite, and all runs every main target.
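The order in which make visits targets is essentially a depth-first traversal of the prerequisite graph. The C sketch below models that traversal with hypothetical types and names. Prerequisites are built before the target itself, each target runs at most once per invocation, and the first failing recipe stops the build. The sketch deliberately leaves out timestamps and cycle detection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4

/* Hypothetical, simplified model of a make target. */
struct target {
    const char *name;
    struct target *deps[MAX_DEPS];            /* unused slots are NULL */
    int (*recipe)(const struct target *t);    /* returns 0 on success */
    int done;                                 /* already built this run? */
};

static char build_log[256];   /* records the order in which recipes ran */

/* Example recipe: append the target name to build_log. */
static int log_recipe(const struct target *t)
{
    strncat(build_log, t->name, sizeof build_log - strlen(build_log) - 1);
    strncat(build_log, " ", sizeof build_log - strlen(build_log) - 1);
    return 0;
}

/* Example recipe that always fails, like a command exiting non-zero. */
static int failing_recipe(const struct target *t)
{
    (void)t;
    return 1;
}

/* Depth-first: build every prerequisite, then the target itself.
 * Each target runs at most once; the first error aborts the build. */
static int build(struct target *t)
{
    assert(t != NULL);
    if (t->done)
        return 0;
    for (size_t i = 0; i < MAX_DEPS && t->deps[i] != NULL; i++)
        if (build(t->deps[i]) != 0)
            return -1;                        /* propagate the failure */
    int rc = t->recipe ? t->recipe(t) : 0;
    if (rc == 0)
        t->done = 1;
    return rc;
}
```

The top-level Makefile wires all → bootdisk → {bootloader, os}. With that graph, `build()` runs the recipes in the order bootloader, os, bootdisk, all.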

Automate debugging steps with GDB script

For convenience, we save the GDB configuration in a .gdbinit file at the project root directory. This configuration is just a collection of GDB commands and a few extra commands. When gdb runs, it first loads the .gdbinit file in the home directory, then the .gdbinit file in the current directory. Why not put these commands in ~/.gdbinit? Because they are specific to this project; e.g. not all programs require a remote connection. Note that recent versions of gdb refuse to auto-load a .gdbinit file from the current directory unless that directory is marked as safe, e.g. with add-auto-load-safe-path in ~/.gdbinit.

Our first configuration:

define hook-stop
    # Translate the segment:offset into a physical address
    printf "[%4x:%4x] ", $cs, $eip
    x/i $cs*16+$eip
end

The above script displays the memory address in [segment:offset] format, which is necessary for debugging our bootloader and operating system code.

It is better to use Intel syntax:

set disassembly-flavor intel

The following commands set a more convenient layout for debugging assembly code:

layout asm

layout reg

We are currently debugging bootloader code, so it is a good idea to first set it to 16-bit:

set architecture i8086

Every time the QEMU virtual machine starts, gdb must always connect to port 26000. To avoid the trouble of manually connecting to the virtual machine, add the command:

target remote localhost:26000

Debugging the bootloader needs a breakpoint at 0x7c00, where our bootloader code starts:

b *0x7c00

Now, whenever gdb starts, it automatically sets the 16-bit architecture, connects to the virtual machine, displays output in a convenient layout, and sets the necessary breakpoint. The QEMU virtual machine must already be running before gdb starts. All that is left is to resume execution, e.g. with the continue command.

Linking and loading on bare metal

Relocation

Relocation is the process of replacing symbol references in an object file with the actual memory addresses of their symbol definitions. A symbol reference is a place in the code or data where the memory address of a symbol is used.

If the definition is hard to understand, consider a similar analogy: house relocation. Suppose that a programmer bought a new house, and the new house is empty. He must buy furniture and appliances to meet his daily needs, so he makes a list of items to buy and where to place them. To visualize the placement of the new items, he draws a blueprint of the house showing where each item goes. He then travels to the shops to buy goods. Whenever he visits a shop and sees matching items, he tells the shop owner to note them down. When he is done selecting, he asks the shop owner to pick brand new items instead of the ones on display, and gives the address of his new house for delivery. Finally, when the goods arrive, he places the items where he planned at the beginning.

Now that house relocation is clear, object relocation is similar:

The list of items represents the relocation table, where the memory location for each symbol (item) is predetermined.
Each item represents a pair of symbol definition and its symbol address.
Each shop represents a compiled object file.
Each item on display represents a symbol definition and references in the object file.
The new address, where all the goods are delivered, represents the final executable binary or the final object file. Since the items on display are not for sale, the shop owner delivers brand new goods instead. Similarly, the object files are not merged in place; their contents are copied into a new file, the final object/executable file.
Finally, the goods are placed according to the shopping list made at the beginning. Similarly, the symbol definitions are placed in their respective sections, and the symbol references in the final object/executable file are replaced with the actual memory addresses of the symbol definitions.

Understand relocations with readelf

Earlier, when we explored object sections, we saw sections whose names begin with .rel. These sections are relocation tables that map between a symbol and its location in the final object file or the final executable binary.

A .rel section is equivalent to a list of items in the house analogy.

Suppose that main.c declares a function foo before main and defines it later in the file:

int i;
void foo();
int main(int argc, char *argv[])
{
    i = 5;
    foo();
    return 0;
}

void foo() {}

When we compile main.c as object file with this command:

$ gcc -m32 -masm=intel -c main.c

Then, we can inspect the relocation tables with this command:

$ readelf -r main.o

The output:

Relocation section '.rel.text' at offset 0x1cc contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000013 00000801 R_386_32 00000004 i
0000001c 00000a02 R_386_PC32 0000002e foo

Relocation section '.rel.eh_frame' at offset 0x1dc contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000020 00000202 R_386_PC32 00000000 .text
0000004c 00000202 R_386_PC32 00000000 .text

Offset

An offset is the location in a section of a binary file where the linker fills in the actual memory address of a symbol definition. The section name after the .rel prefix determines which section the offset points into. For example, .rel.text is the relocation table for symbols whose addresses need correcting in the .text section, at specific offsets into .text. In the example output, the offset 0000001c of the foo entry indicates that there is a reference to the symbol foo 0x1c bytes into the .text section. To see this more clearly, we recompile main.c with the -g option into the file main_debug.o, then disassemble it with objdump, interleaving the source code:

Disassembly of section .text:

00000000 <main>:
int i;
void foo();
int main(int argc, char *argv[])
{
0: 8d 4c 24 04 lea ecx,[esp+0x4]
4: 83 e4 f0 and esp,0xfffffff0
7: ff 71 fc push DWORD PTR [ecx-0x4]
a: 55 push ebp
b: 89 e5 mov ebp,esp
d: 51 push ecx
e: 83 ec 04 sub esp,0x4
i = 5;
11: c7 05 00 00 00 00 05 mov DWORD PTR ds:0x0,0x5
18: 00 00 00
foo();
1b: e8 fc ff ff ff call 1c <main+0x1c>
return 0;
20: b8 00 00 00 00 mov eax,0x0
25: 83 c4 04 add esp,0x4
28: 59 pop ecx
29: 5d pop ebp
2a: 8d 61 fc lea esp,[ecx-0x4]
2d: c3 ret

....irrelevant content omitted....

The byte at 1b is the opcode e8, the call instruction, and the byte at 1c holds the value fc, the first byte of its operand. Why is the operand of e8 0xfffffffc, which is equivalent to -4, while the disassembled instruction reads call 1c? This will be explained after a few more sections, but you should pause and think a bit about the reason.

Info

Info specifies index of a symbol in the symbol table and the type of relocation to perform.

In the Info value 00000a02 of foo, the upper bytes 00000a are the index of the symbol foo in the symbol table, and the lowest byte 02 is the relocation type. The numbers are written in hex format. In the example, 0a means 10 in decimal, and the symbol foo is indeed at index 10:

10: 0000002e 6 FUNC GLOBAL DEFAULT 1 foo
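The Info field can be decoded mechanically. In 32-bit ELF, the symbol index occupies the upper 24 bits and the relocation type the low 8 bits. glibc's `<elf.h>` exposes this split as the `ELF32_R_SYM` and `ELF32_R_TYPE` macros. The sketch below reimplements the split with our own helper names, so it does not depend on that header.

```c
#include <assert.h>
#include <stdint.h>

/* Two i386 relocation types that appear in this section. */
enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Upper 24 bits of r_info: index into the symbol table. */
static uint32_t rel_sym(uint32_t info)  { return info >> 8; }

/* Low 8 bits of r_info: relocation type. */
static uint8_t  rel_type(uint32_t info) { return (uint8_t)(info & 0xff); }

/* Inverse: pack a symbol index and a type into an r_info value. */
static uint32_t rel_info(uint32_t sym, uint8_t type)
{
    assert(sym < (1u << 24));   /* the index must fit in 24 bits */
    return (sym << 8) | type;
}
```

Applied to the two .rel.text entries, 00000801 decodes to symbol 8 with type 1 (R_386_32), and 00000a02 decodes to symbol 10 with type 2 (R_386_PC32).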

Type

Type represents the type value in textual form. In the entry for foo, the lowest byte 02 of the Info value is the type in numeric form, and R_386_PC32 is the name assigned to that value. Each value represents a method for calculating the relocated value. For example, with the type R_386_PC32, the following formula is applied for relocation (Intel386 psABI):

$$\mathit{RelocatedOffset} = S + A - P$$

To understand the formula, it is necessary to understand symbol values.

Sym.Value

This field shows the symbol value. A symbol value is a value assigned to a symbol, whose meaning depends on the Ndx field:

A symbol whose section index (Ndx) is COMMON: its symbol value holds alignment constraints.

Example 0.70. In the symbol table, the variable i is identified as COM (uninitialized variable). The command for listing the symbol table is (assume the object file is hello.o):

readelf -s hello.o

Symbol table '.symtab' contains 16 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS hello2.c
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 SECTION LOCAL DEFAULT 5
6: 00000000 0 SECTION LOCAL DEFAULT 7
7: 00000000 0 SECTION LOCAL DEFAULT 8
8: 00000000 0 SECTION LOCAL DEFAULT 10
9: 00000000 0 SECTION LOCAL DEFAULT 12
10: 00000000 0 SECTION LOCAL DEFAULT 14
11: 00000000 0 SECTION LOCAL DEFAULT 15
12: 00000000 0 SECTION LOCAL DEFAULT 13
13: 00000004 4 OBJECT GLOBAL DEFAULT COM i
14: 00000000 46 FUNC GLOBAL DEFAULT 1 main
15: 0000002e 6 FUNC GLOBAL DEFAULT 1 foo

The symbol value of i is therefore an alignment constraint that its final memory address must conform to. In the case of i, the value is 4, so the starting memory address of i in the final binary file will be a multiple of 4.
A symbol whose Ndx identifies a specific section: its symbol value holds a section offset.

Example 0.71. In the symbol table, main and foo belong to section 1:

14: 00000000 46 FUNC GLOBAL DEFAULT 1 main
15: 0000002e 6 FUNC GLOBAL DEFAULT 1 foo

Section 1 is the .text section, which holds program code and read-only data. The command for listing sections is (assume the object file is hello.o):

readelf -S hello.o

There are 20 section headers, starting at offset 0x558:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000034 00 AX 0 0 1
[ 2] .rel.text REL 00000000 000414 000010 08 I 18 1 4
[ 3] .data PROGBITS 00000000 000068 000000 00 WA 0 0 1
[ 4] .bss NOBITS 00000000 000068 000000 00 WA 0 0 1
[ 5] .debug_info PROGBITS 00000000 000068 000096 00 0 0 1
..... remaining output omitted for clarity....
In the final executable and shared object files: instead of the above values, a symbol value holds a memory address.

Example 0.72. After we compile hello.o into the final executable hello, the symbol table contains the memory address of each symbol. The command to compile the object file hello.o into the executable hello is:

gcc -g -m32 -masm=intel hello.o -o hello

The symbol table of hello:

Symbol table '.symtab' contains 75 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048154 0 SECTION LOCAL DEFAULT 1
2: 08048168 0 SECTION LOCAL DEFAULT 2
3: 08048188 0 SECTION LOCAL DEFAULT 3
....output omitted...
64: 08048409 6 FUNC GLOBAL DEFAULT 14 foo
65: 0804a020 0 NOTYPE GLOBAL DEFAULT 26 _end
66: 080482e0 0 FUNC GLOBAL DEFAULT 14 _start
67: 08048488 4 OBJECT GLOBAL DEFAULT 16 _fp_hw
68: 0804a01c 4 OBJECT GLOBAL DEFAULT 26 i
69: 0804a018 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
70: 080483db 46 FUNC GLOBAL DEFAULT 14 main
...output omitted...

Unlike in the hello.o object file, the symbols foo, i, and main now hold their complete memory addresses.
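Going back to the COMMON case, an alignment constraint of 4 means the linker must round a candidate address up to the next multiple of 4. Below is a minimal sketch of that rounding. It assumes power-of-two alignments, and `align_up` is a hypothetical helper. The actual address the linker chooses also depends on what else it places in .bss.

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` up to the next multiple of `align` (a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For instance, 0x0804a01c, the final address of i, is already a multiple of 4, so `align_up` leaves it unchanged.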

Now it remains to understand relocation types. Previously, we mentioned the type R_386_PC32, for which the following formula is applied for relocation (Intel386 psABI):

$$\mathit{RelocatedOffset} = S + A - P$$

where

S: represents the value of the symbol. In the final executable binary, it is the address of the symbol.
A: represents the addend, an extra value added to the value of the symbol. Relocations in a .rel section do not store the addend in the table; instead, the addend is the value already stored at the location being fixed.
P: represents the memory address to be fixed.
RelocatedOffset: is the distance between the relocating location (where the referenced memory address is to be fixed) and the actual memory location of a symbol definition, or a memory address.

But why spend time calculating a distance instead of writing a direct memory address? Because the call instruction used here (opcode e8) does not take an absolute address: its operand is a displacement relative to the end of the instruction. Not every x86 instruction works this way. The reference to i above is relocated with R_386_32, which writes an absolute address (compare the addressing modes listed in table 3). In assembly language, writing call foo with an absolute address is merely syntactic sugar, which the assembler transforms into the relative displacement that the hardware requires.

Example 0.73. For the foo symbol:

0000001c 00000a02 R_386_PC32 0000002e foo

The distance between the usage of foo in main.o and its definition is S - P = 0x2e - 0x1c = 0x12. That is, the place where memory fixing starts is 0x12, or 18, bytes away from the definition of foo. However, the call instruction measures its displacement from the end of the instruction, which lies 4 bytes after P because the 4 bytes of the overwritten address sit in between. The addend accounts for this: the implicit addend stored at the fix-up location is fc ff ff ff, or -4. Applying the formula S + A - P therefore gives:

$$\mathtt{0x2e} + (-4) - \mathtt{0x1c} = \mathtt{0xe}$$

Indeed, looking at the objdump output of the object file hello.o:

Disassembly of section .text:

00000000 <main>:
0: 8d 4c 24 04 lea ecx,[esp+0x4]
4: 83 e4 f0 and esp,0xfffffff0
7: ff 71 fc push DWORD PTR [ecx-0x4]
a: 55 push ebp
b: 89 e5 mov ebp,esp
d: 51 push ecx
e: 83 ec 04 sub esp,0x4
11: c7 05 00 00 00 00 05 mov DWORD PTR ds:0x0,0x5
18: 00 00 00
1b: e8 fc ff ff ff call 1c <main+0x1c>
20: b8 00 00 00 00 mov eax,0x0
25: 83 c4 04 add esp,0x4
28: 59 pop ecx
29: 5d pop ebp
2a: 8d 61 fc lea esp,[ecx-0x4]
2d: c3 ret

0000002e <foo>:
2e: 55 push ebp
2f: 89 e5 mov ebp,esp
31: 90 nop
32: 5d pop ebp
33: c3 ret

The place where memory fixing starts is right after the opcode e8, and it holds the mock value fc ff ff ff, which is -4 in decimal. However, in the disassembly, the operand is displayed as 1c, the memory address right after e8. The reason is that the instruction starts at 1b and ends at 20. The end of an instruction is the memory address right after its last operand byte, and the whole instruction spans the addresses 1b to 1f. A displacement of -4 means 4 bytes backward from the end of the instruction, that is:

$$\mathtt{0x20} - 4 = \mathtt{0x1c}$$

After linking, the disassembly of the final executable file shows the actual fixed-up memory addresses:

080483db <main>:
80483db: 8d 4c 24 04 lea ecx,[esp+0x4]
80483df: 83 e4 f0 and esp,0xfffffff0
80483e2: ff 71 fc push DWORD PTR [ecx-0x4]
80483e5: 55 push ebp
80483e6: 89 e5 mov ebp,esp
80483e8: 51 push ecx
80483e9: 83 ec 04 sub esp,0x4
80483ec: c7 05 1c a0 04 08 05 mov DWORD PTR ds:0x804a01c,0x5
80483f3: 00 00 00
80483f6: e8 0e 00 00 00 call 8048409 <foo>
80483fb: b8 00 00 00 00 mov eax,0x0
8048400: 83 c4 04 add esp,0x4
8048403: 59 pop ecx
8048404: 5d pop ebp
8048405: 8d 61 fc lea esp,[ecx-0x4]
8048408: c3 ret

08048409 <foo>:
8048409: 55 push ebp
804840a: 89 e5 mov ebp,esp
804840c: 90 nop
804840d: 5d pop ebp
804840e: c3 ret
804840f: 90 nop

In the final output, the opcode e8 that was previously at 1b now starts at the address 80483f6. The linker replaces the mock value fc ff ff ff with the actual value 0e 00 00 00, using the same calculation as for the object file. The opcode e8 is at 80483f6, so the location being fixed is P = 80483f7, and the definition of foo is at S = 8048409. With the addend A = -4:

$$\mathtt{0x8048409} + (-4) - \mathtt{0x80483f7} = \mathtt{0xe}$$

However, for readability, objdump displays the instruction as call 8048409 <foo>, with the absolute address of the target. Likewise, the GNU as assembler, like any assembler in use today, lets you specify the actual memory address of a symbol definition as the operand of call. The assembler later translates that address into the relative addressing mode, sparing the programmer from calculating the offset manually.
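A disassembler recovers the absolute target of `call rel32` by reversing the same arithmetic: target = address of the next instruction + displacement. The C sketch below uses hypothetical helper names. It decodes the 5-byte instruction before and after the linker patches its operand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Target shown by a disassembler for `call rel32` (opcode e8) located at
 * `insn_addr`: the address of the next instruction plus the displacement. */
static uint32_t call_target(uint32_t insn_addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xe8);
    uint32_t rel = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                   (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    return insn_addr + 5 + rel;   /* 5 = 1 opcode byte + 4 operand bytes */
}
```

Before patching, the target comes out as 0x1c, which matches `call 1c <main+0x1c>` in the object file. After writing 0e 00 00 00 into the operand, the instruction at 0x80483f6 decodes to 0x8048409, which matches `call 8048409 <foo>`.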

Sym. Name

This field displays the name of a symbol to be relocated. The named symbol is the same as written in a high level language such as C.

Crafting ELF binary with linker scripts

A linker is a program that combines separate object files into a final binary file. When gcc is invoked, it runs ld underneath to turn object files into the final executable file.

A linker script is a text file that tells a linker how to combine object files. When gcc runs ld, ld uses its default linker script to build the memory layout of the compiled binary file. A standardized memory layout is called an object file format; ELF, for example, includes program headers, section headers, and their attributes. The default linker script is made for running in the current operating system environment. To view the default script, use the --verbose option:

ld --verbose

The default script cannot be used on bare metal, because it is not designed for such an environment. For that reason, a programmer needs to supply his own linker script for such environments.

Every linker script consists of a series of commands with the following format:

COMMAND
{
    sub-command 1
    sub-command 2
    .... more sub-command....
}

Each sub-command is specific to its top-level command. The simplest linker script needs only one command, SECTIONS, which consumes input sections from object files and produces the output sections of the final binary file. Recall that sections are chunks of code, data, or both.

Example linker script

Here is a minimal example of a linker script:

SECTIONS                      /* Command */
{
   . = 0x10000;               /* sub-command 1 */
   .text : { *(.text) }       /* sub-command 2 */
   . = 0x8000000;             /* sub-command 3 */
   .data : { *(.data) }       /* sub-command 4 */
   .bss : { *(.bss) }         /* sub-command 5 */
}

Code Dissection:

Code	Description
SECTIONS	Top-level command that declares a list of custom output sections. ld provides a set of such commands.
. = 0x10000;	Set the location counter to the address 0x10000. The location counter specifies the base address for subsequent commands. In this example, subsequent commands will use 0x10000 onward.
.text : { *(.text) }	Since the location counter is set to 0x10000, the output .text section in the final binary file starts at the address 0x10000. This command uses the *(.text) syntax to combine the .text sections from all object files into a final .text section. The * is a wildcard that matches any file name.
. = 0x8000000;	Again, the location counter is set, this time to 0x8000000. Subsequent commands will use this address for working with sections.
.data : { *(.data) }	All .data sections are combined into one .data section in the final binary file.
.bss : { *(.bss) }	All .bss sections are combined into one .bss section in the final binary file.

The addresses 0x10000 and 0x8000000 are called virtual memory addresses. A virtual memory address is the address where a section is loaded in memory when a program runs. To use the linker script, we save it in a file, e.g. main.lds (.lds is the usual extension for linker scripts). Then, we need a sample program in a file, e.g. main.c:

void test() {}
int main(int argc, char *argv[])
{
  
    return 0;
}

Then, we compile the file and explicitly invoke ld with the linker script:

$ gcc -m32 -g -c main.c

$ ld -m elf_i386 -o main -T main.lds main.o

In the ld command, the options are similar to gcc:

Option	Description
-m	Specify object file format that ld produces. In the example, elf_i386 means a 32-bit ELF is to be produced.
-o	Specify the name of the final executable binary.
-T	Specify the linker script to use. In the example, it is main.lds.

The remaining input is a list of object files for linking. After the command ld is executed, the final executable binary - main - is produced. If we try running it:

$ ./main

Segmentation fault

The reason is that when linking manually, the entry address must be set explicitly. Otherwise, ld sets it to the start of the .text section by default. We can verify this from the readelf output:

$ readelf -h main

Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00

Class: ELF64

Data: 2's complement, little endian

Version: 1 (current)

OS/ABI: UNIX - System V

ABI Version: 0

Type: EXEC (Executable file)

Machine: Advanced Micro Devices X86-64

Version: 0x1

Entry point address: 0x10000

Start of program headers: 64 (bytes into file)

Start of section headers: 2098144 (bytes into file)

Flags: 0x0

Size of this header: 64 (bytes)

Size of program headers: 56 (bytes)

Number of program headers: 3

Size of section headers: 64 (bytes)

Number of section headers: 14

Section header string table index: 11

The entry point address is set to 0x10000, which is the beginning of the .text section. Using objdump to examine the address:

$ objdump -z -M intel -S -D main | less

we see that the code at address 0x10000 is not the main function:

Disassembly of section .text:

10001: 89 e5 mov ebp,esp

int main(int argc, char *argv[])

{

10006: 55 push ebp

10007: 89 e5 mov ebp,esp

return 0;

10009: b8 00 00 00 00 mov eax,0x0

1000e: 5d pop ebp

1000f: c3 ret

The start of the .text section at 0x10000 is the function test, not main! To make the program start at main, we need to set the entry point in the linker script by adding the following line at the beginning of the file:

ENTRY(main)

Link the executable binary main again. This time, the output from readelf is different:

Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00

Class: ELF32

Data: 2's complement, little endian

Version: 1 (current)

OS/ABI: UNIX - System V

ABI Version: 0

Type: EXEC (Executable file)

Machine: Intel 80386

Version: 0x1

Entry point address: 0x10006

Start of program headers: 52 (bytes into file)

Start of section headers: 9168 (bytes into file)

Flags: 0x0

Size of this header: 52 (bytes)

Size of program headers: 32 (bytes)

Number of program headers: 3

Size of section headers: 40 (bytes)

Number of section headers: 14

Section header string table index: 11

The program now executes code at the address 0x10006 when it starts, and 0x10006 is where main starts! To make sure we really start at main, we run the program in gdb and set two breakpoints, at the main and test functions:

$ gdb ./main

.... output omitted ....

Reading symbols from ./main...done.

(gdb) b test

Breakpoint 1 at 0x10003: file main.c, line 1.

(gdb) b main

Breakpoint 2 at 0x10009: file main.c, line 5.

(gdb) r

Starting program: /tmp/main

Breakpoint 2, main (argc=-11493, argv=0x0) at main.c:5

5 return 0;

As displayed in the output, gdb stopped at the second breakpoint first. Now, we run the program normally, without gdb:

$ ./main

Segmentation fault

We still get a segmentation fault. This is to be expected, as we ran a custom binary without C runtime support from the operating system. The last statement in the main function, return 0, simply returns to a random place. Normally the return address sits just above the saved ebp on the stack. However, main is entered directly as the program's entry point rather than through a call instruction, so no return address is pushed. When ret executes, it simply takes whatever value sits above the saved ebp and uses it as the return address. The C runtime ensures that the program exits properly: in Linux, its startup code calls exit() when main returns. To fix this problem, we simply change the program to exit properly:

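void test() {}
int main(int argc, char *argv[])
{
    /* Linux i386 exit system call: eax = 1 (sys_exit), ebx = 0 (exit status).
       gcc inline assembly uses AT&T syntax by default, so this Intel-syntax
       version must be compiled with -masm=intel, e.g.:
       gcc -m32 -masm=intel -g -c main.c */
    asm("mov eax, 0x1\n"
        "mov ebx, 0x0\n"
        "int 0x80");
}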

Inline assembly is required because, on 32-bit x86 Linux, system calls are made through interrupt 0x80. Since the program uses no library, there is no other way to call system functions aside from using assembly. However, when writing our operating system, we will not need such code, as there is no environment to exit to yet.

Now that we can precisely control where the program runs initially, it is easy to bootstrap the kernel from the bootloader. Before we move on to the next section, note how readelf and objdump can be applied to debug a program even before it runs.

Understand the custom ELF structure

In the example, we managed to create a runnable ELF executable binary from a custom linker script, as opposed to the default one provided by gcc. To look into its structure conveniently, we run:

$ readelf -e main

The -e option is the combination of the 3 options -h, -l and -S:

....... ELF header output omitted .......

[ 1] .text PROGBITS 00010000 001000 000010 00 AX 0 0 1

[ 2] .eh_frame PROGBITS 00010010 001010 000058 00 A 0 0 4

[ 3] .debug_info PROGBITS 00000000 001068 000087 00 0 0 1

[ 4] .debug_abbrev PROGBITS 00000000 0010ef 000074 00 0 0 1

[ 5] .debug_aranges PROGBITS 00000000 001163 000020 00 0 0 1

[ 6] .debug_line PROGBITS 00000000 001183 000038 00 0 0 1

[ 7] .debug_str PROGBITS 00000000 0011bb 000078 01 MS 0 0 1

[ 8] .comment PROGBITS 00000000 001233 000034 01 MS 0 0 1

[ 9] .shstrtab STRTAB 00000000 00133a 000074 00 0 0 1

[10] .symtab SYMTAB 00000000 001268 0000c0 10 11 10 4

[11] .strtab STRTAB 00000000 001328 000012 00 0 0 1

W (write), A (alloc), X (execute), M (merge), S (strings)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

LOAD 0x001000 0x00010000 0x00010000 0x00068 0x00068 R E 0x1000

GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10

00 .text .eh_frame

The structure is incredibly simple: both the segment and section listings fit within one screen. This is not the case with a default ELF executable binary. From the output, there are only 11 sections, and only two are loaded at runtime: .text and .eh_frame. Both sections are assigned actual memory addresses, 0x10000 and 0x10010 respectively. The remaining sections are assigned address 0 in the final executable binary, which means they are not loaded at runtime. (In object files, by contrast, memory addresses are always 0 and are only assigned actual values in the linking process.) This makes sense, as those sections are related to versioning, debugging and linking:

the .comment section holds version information and can be viewed with the command readelf -p .comment main;
the sections starting with the .debug prefix hold debugging information;
the symbol table and string tables are used for linking.

The program segment header table is even simpler. It only contains 2 segments: LOAD and GNU_STACK. If the linker script does not supply instructions for building program segments, ld provides reasonable default segments. In this case, .text is placed in the LOAD segment. The GNU_STACK segment is a GNU extension used by the Linux kernel to control the state of the program stack, such as whether it is executable. Since we are writing our own operating system from scratch, we will not need this segment, nor .eh_frame, which is used for exception handling. To achieve these goals, we need to create our own program headers instead of letting ld handle the task, and instruct ld to remove .eh_frame.

Manipulate the program segments

First, we need to craft our own program header table by using the following syntax:

PHDRS
{
  <name> <type> [ FILEHDR ] [ PHDRS ] [ AT ( address ) ]
        [ FLAGS ( flags ) ] ;
}

The PHDRS command is similar to the SECTIONS command, but declares a list of custom program segments with the syntax above.

name: the header name, later referenced by sections declared in the SECTIONS command.
type: the type of the program segment, such as PT_NULL, PT_LOAD or PT_PHDR.

Example 0.74. With only name and type, we can create any number of program segments. For example, we can add the NULL program segment and remove the GNU_STACK segment:

PHDRS
{
    null PT_NULL;
    code PT_LOAD;
}

SECTIONS
{
    . = 0x10000;
    .text : { *(.text) } :code
    . = 0x8000000;
    .data : { *(.data) }
    .bss : { *(.bss) }
}

The PHDRS command specifies that the final executable binary contains 2 program segments: NULL and LOAD. The NULL segment is given the name null. The LOAD segment is given the name code, to signify that it contains program code. To put a section into a segment, we use the syntax :<phdr>, where phdr is the name given to a segment earlier. In this example, the .text section is put into the code segment. We link and see the result (assuming the main.o compiled earlier is still present):

$ ld -m elf_i386 -o main -T main.lds main.o

$ readelf -l main

Elf file type is EXEC (Executable file)

Entry point 0x10000

There are 2 program headers, starting at offset 52

Program Headers:

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4

LOAD 0x001000 0x00010000 0x00010000 0x00010 0x00010 R E 0x1000

Section to Segment mapping:

Segment Sections...

01 .text .eh_frame

Those 2 segments are now NULL and LOAD instead of LOAD and GNU_STACK.

Example 0.75. We can add as many segments of the same type as we like, as long as they are given different names:

PHDRS
{
    null1 PT_NULL;
    null2 PT_NULL;
    code1 PT_LOAD;
    code2 PT_LOAD;
}

SECTIONS
{
    . = 0x10000;
    .text : { *(.text) } :code1
    .eh_frame : { *(.eh_frame) } :code2
    . = 0x8000000;
    .data : { *(.data) }
    .bss : { *(.bss) }
}

We amend the PHDRS content with this new segment listing, putting .text into the code1 segment and .eh_frame into the code2 segment. Then we link again and see the new segments:

$ ld -m elf_i386 -o main -T main.lds main.o

$ readelf -l main

Elf file type is EXEC (Executable file)

Entry point 0x10000

There are 4 program headers, starting at offset 52

Program Headers:

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4

NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4

LOAD 0x001000 0x00010000 0x00010000 0x00010 0x00010 R E 0x1000

LOAD 0x001010 0x00010010 0x00010010 0x00058 0x00058 R 0x1000

Section to Segment mapping:

Segment Sections...

02 .text

03 .eh_frame

Now .text and .eh_frame are in different segments.

FILEHDR: an optional keyword which, when added, specifies that a program segment includes the ELF file header of the executable binary. However, this keyword should only be added to the first program segment. The ELF header is always at the beginning of a binary file, so adding it to a later segment drastically alters that segment's size and starting address. Recall that a segment starts at the address of its first content. In most cases that is its first section, but here it is the file header.

Example 0.76. Adding the FILEHDR keyword changes the size of NULL segment:

PHDRS
{
    null PT_NULL FILEHDR;
    code PT_LOAD;
}
..... content is the same .....

We link it again and see the result:

$ ld -m elf_i386 -o main -T main.lds main.o

$ readelf -l main

Elf file type is EXEC (Executable file)

Entry point 0x10000

There are 2 program headers, starting at offset 52

Program Headers:

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

NULL 0x000000 0x00000000 0x00000000 0x00034 0x00034 R 0x4

LOAD 0x001000 0x00010000 0x00010000 0x00068 0x00068 R E 0x1000

Section to Segment mapping:

Segment Sections...

01 .text .eh_frame

In previous examples, the file size and memory size of the NULL segment were always 0. Now they are both 0x34 bytes (52 bytes), which is the size of a 32-bit ELF header.

Example 0.77. If we assign FILEHDR to a segment other than the first, its size and starting address change significantly:

PHDRS
{
    null PT_NULL;
    code PT_LOAD FILEHDR;
}
..... content is the same .....

$ ld -m elf_i386 -o main -T main.lds main.o

$ readelf -l main

Elf file type is EXEC (Executable file)

Entry point 0x10000

There are 2 program headers, starting at offset 52

Program Headers:

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

NULL 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4

LOAD 0x000000 0x0000f000 0x0000f000 0x01068 0x01068 R E 0x1000

Section to Segment mapping:

Segment Sections...

01 .text .eh_frame

In the previous example, the LOAD segment is only 0x68 bytes, the total size of the .text (0x10) and .eh_frame (0x58) sections in it. Now it is 0x1068, or 0x1000 bytes larger. What is the reason for these extra bytes? The short answer: segment alignment. From the output, the alignment of this segment is 0x1000. This means that whatever address the segment starts at, it must be divisible by 0x1000. For that reason, LOAD starts at 0xf000, which is divisible by 0x1000.

Another question arises: why is the starting address 0xf000 instead of 0x10000? .text is the first section and starts at 0x10000, so the segment should start at 0x10000. The reason is that, because we include FILEHDR in the segment, the segment must expand to include the ELF file header. That header is at the very start of an ELF executable binary. 0xf000 is the closest address that satisfies both this constraint and the alignment constraint. Note that the virtual and physical memory addresses are the addresses at runtime, not the locations of the segment in the file on disk. As the FileSiz field shows, the segment only consumes 0x1068 bytes on disk. Figure 0.22 illustrates the difference between the memory layouts with and without the FILEHDR keyword.

Figure 0.22: LOAD segment on disk and in memory.

Sub-Figure a: Without FILEHDR.

Sub-Figure b: With FILEHDR.

PHDRS: an optional keyword which, when added, specifies that a program segment includes the program segment header table.

Example 0.78. The first segment of the default executable binary generated by gcc is a PHDR since the program segment header table appears right after the ELF header. It is also a convenient segment to put the ELF header into using the FILEHDR keyword. We replace the unused NULL segment earlier with a PHDR segment:

PHDRS
{
    headers PT_PHDR FILEHDR PHDRS;
    code PT_LOAD FILEHDR;
}
..... content is the same .....

Entry point 0x10000

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

PHDR 0x000000 0x00000000 0x00000000 0x00074 0x00074 R 0x4

LOAD 0x001000 0x00010000 0x00010000 0x00068 0x00068 R E 0x1000

01 .text .eh_frame

As shown in the output, the first segment is of type PHDR. Its size is 0x74, which includes:

0x34 bytes for the ELF header.
0x40 bytes for the program segment header table, with 2 entries of 0x20 bytes (32 bytes) each.

These numbers are consistent with the ELF header output:

Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00

Class: ELF32

....... output omitted ......

Size of this header: 52 (bytes) --> 0x34 bytes

Size of program headers: 32 (bytes) --> 0x20 bytes each program header

Number of program headers: 2 --> 0x40 bytes in total

Size of section headers: 40 (bytes)

Number of section headers: 12

Section header string table index: 9

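AT ( address ): an optional keyword that specifies the load memory address (the physical address) of a segment.

Example 0.79. We can specify a load memory address for a segment with the AT syntax. Here it is applied to the PHDR segment: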

PHDRS
{
    headers PT_PHDR FILEHDR PHDRS AT(0x500);
    code PT_LOAD;
}
..... content is the same .....

Entry point 0x4000

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

PHDR 0x000000 0x00000000 0x00000500 0x00074 0x00074 R 0x4

LOAD 0x001000 0x00004000 0x00002000 0x00068 0x00068 R E 0x1000

01 .text .eh_frame

Whether the load address is used depends on the operating system. For our operating system, the virtual memory address and the load memory address are the same, so an explicit load address is not a concern.

FLAGS ( flags ): assigns permissions to a segment. Each flag is an integer that represents a permission, and flags can be combined with OR operations. Possible values:

Permission	Value	Description
R	4	Readable
W	2	Writable
E	1	Executable

Example 0.80. We can create a LOAD segment with Read, Write and Execute permissions enabled:

PHDRS
{
    headers PT_PHDR FILEHDR PHDRS AT(0x500);
    code PT_LOAD FILEHDR FLAGS(0x1 | 0x2 | 0x4);
}
..... content is the same .....

Entry point 0x0

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

PHDR 0x000000 0x00000000 0x00000500 0x00074 0x00074 R 0x4

LOAD 0x001000 0x00000000 0x00000000 0x00010 0x00010 RWE 0x1000

01 .text .eh_frame

The LOAD segment now has all the RWE permissions, as shown above.

Finally, to remove .eh_frame or any other unwanted section, we add a special output section called /DISCARD/:

... program segment header table remains the same ...

SECTIONS
{
    /* . = 0x10000; */
    .text : { *(.text) } :code
    . = 0x8000000;
    .data : { *(.data) }
    .bss : { *(.bss) }
    /DISCARD/ : { *(.eh_frame) }
}

Any section put in /DISCARD/ is removed from the final executable binary:

Entry point 0x0

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

PHDR 0x000000 0x00000000 0x00000500 0x00074 0x00074 R 0x4

LOAD 0x001000 0x00000000 0x00000000 0x00010 0x00010 R E 0x1000

01 .text

As can be seen, .eh_frame is nowhere to be found.

C Runtime: Hosted vs Freestanding

The purpose of the .init, .init_array, .fini_array and .preinit_array sections is to set up (and tear down) a C runtime environment that supports the C standard libraries. Why does C need a runtime environment, when it is supposed to be a compiled language? The reason is that many of the standard functions depend on the underlying operating system, which is itself a big runtime environment. Examples are I/O related functions such as reading from the keyboard with gets(), opening a file with fopen() and printing on screen with printf(), as well as managing system memory with malloc() and free().

A C implementation cannot provide such routines without a running operating system, that is, a hosted environment. A hosted environment is a runtime environment that:

provides a default implementation of the C libraries, including system-dependent data and routines.
performs resource allocations to prepare an environment for a program to run.

This process is similar to the hardware initialization process:

When first powered up, a desktop computer loads its basic system routines from a read-only memory stored on the motherboard.
Then, it starts initializing an environment, such as setting default values for various registers in the CPU and devices, before executing any other code.

In contrast, a freestanding environment is an environment that does not provide system-dependent data and routines. As a consequence, almost no C library exists, and the environment can only run code written in pure C syntax. For a freestanding environment to become a hosted environment, it must implement the standard C system routines. A conforming freestanding environment, on the other hand, only needs these header files available: <float.h>, <limits.h>, <stdarg.h> and <stddef.h>. That list is from the GCC manual for C90; later standards add a few more, such as <stdbool.h> and <stdint.h> in C99.
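To give a feel for freestanding code, the two functions below use only <stddef.h> and <stdint.h> and call nothing from the C library. Both headers are required of a freestanding implementation since C99, so the functions can be compiled with gcc -ffreestanding -c. The names kmemset and kstrlen are hypothetical. One caveat: the GCC manual notes that even in freestanding mode GCC may emit calls to memcpy, memmove, memset and memcmp, so a kernel usually has to provide those as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill n bytes at dst with the byte value c, without the C library. */
void *kmemset(void *dst, int c, size_t n)
{
    uint8_t *p = dst;
    while (n--)
        *p++ = (uint8_t)c;
    return dst;
}

/* Count the characters of a NUL-terminated string. */
size_t kstrlen(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
```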

For a typical desktop x86 program, the C runtime environment is set up by startup code that the compiler links into the program, so the program runs normally. However, this is not the case for an embedded platform where a program runs directly on the hardware. The typical C runtime environment used in desktop operating systems cannot be used on embedded platforms, because of architectural differences and resource constraints. As such, the software writer must implement a custom C runtime environment suitable for the target platform. For the embedded platform,

In writing our operating system, the first step is to create a freestanding environment before creating a hosted one.

Debuggable bootloader on bare metal

Currently, the bootloader is compiled as a flat binary file. Although gdb can display the assembly code, it is not always the same as the source code. The assembly source code contains variable names and labels, and these symbols are lost when compiled as a flat binary file, making debugging more difficult. Another issue is the mismatch between the written assembly source code and the displayed assembly code. The written code might contain higher-level, assembler-specific syntax that is translated into the lower-level assembly code displayed by gdb. Finally, with debug information available, the source-level commands next/n and step/s can be used instead of ni and si.

To enable debug information, we modify the bootloader Makefile:

The bootloader must be compiled as an ELF binary. Open the Makefile in the bootloader/ directory and change this line under the $(BUILD_DIR)/%.o: %.asm recipe:
```
nasm -f bin $< -o $@
```
to this line:
```
nasm -f elf $< -F dwarf -g -o $@
```
In the updated recipe, the bin format is replaced with the elf format so that debugging information can be properly produced. The -F option specifies the debug information format, which is dwarf in this case. The -g option causes nasm to actually generate debug information in the selected format.
Then, ld consumes the ELF bootloader binary and produces another ELF bootloader binary whose .text section starts at the proper memory address. This address matches the actual address of the bootloader at runtime, when the QEMU virtual machine loads it at 0x7c00. We need ld because, when compiled by nasm, the starting address is assumed to be 0, not 0x7c00.
Finally, we use objcopy to extract only the flat binary content, as in the original bootloader, by adding this line to the $(BUILD_DIR)/%.o: %.asm recipe:
```
objcopy -O binary $(BUILD_DIR)/bootloader.o.elf $@
```
objcopy, as its name implies, is a program that copies and translates object files. Here, we copy the original ELF bootloader and translate it into a flat binary file.

The updated recipe should look like:

$(BUILD_DIR)/%.o: %.asm
	nasm -f elf $< -F dwarf -g -o $@
	ld -m elf_i386 -T bootloader.lds $@ -o $@.elf
	objcopy -O binary  $(BUILD_DIR)/bootloader.o.elf $@

Now we test the bootloader with debug information available:

Start the QEMU machine:

$ make qemu

Start gdb with the debug information stored in bootloader.o.elf:

$ gdb build/bootloader/bootloader.o.elf

After getting into gdb, press the Enter key. If the sample .gdbinit from section 7 is used, the output should look like:

---Type <return> to continue, or q <return> to quit---

[f000:fff0] 0x0000fff0 in ?? ()

Breakpoint 1 at 0x7c00: file bootloader.asm, line 6.

(gdb)

gdb now understands where the instruction at address 0x7c00 is in the assembly source file, thanks to the debug information.

Debuggable program on bare metal

The process of building a debug-ready executable binary is similar to that of the bootloader, but more involved. Recall that for a debugger to work properly, the debugging information must contain correct mappings between memory addresses and the source code. gcc stores this information in DWARF debug sections. DIEs describe entities such as functions and their address ranges. The line number table in .debug_line tells gdb which code address corresponds to which line in a source file, so that breakpoints work properly.

But first, we need a sample C source file, a very simple one:

void main() {}

Because this is a freestanding environment, standard library functions that rely on system functions, such as printf(), would not work, because no C runtime exists. At this stage, the goal is to correctly jump to main with the source code displayed properly in gdb, so no fancy C code is needed yet.

The next step is updating os/Makefile:

BUILD_DIR=../build
OS=$(BUILD_DIR)/os

CFLAGS+=-ffreestanding -nostdlib -gdwarf-4 -m32 -ggdb3

OS_SRCS := $(wildcard *.c)
OS_OBJS := $(patsubst %.c, $(BUILD_DIR)/%.o, $(OS_SRCS))

all: $(OS)

$(BUILD_DIR)/%.o: %.c
	gcc $(CFLAGS) -c  $< -o $@

$(OS): $(OS_OBJS)
	ld -m elf_i386 -Tos.lds $(OS_OBJS) -o $@

clean:
	rm $(OS_OBJS)

We updated the Makefile with the following changes:

Add a CFLAGS variable for passing options to gcc.
Replace the earlier rule for building assembly source code with a rule whose recipe builds C source files. The CFLAGS variable keeps the gcc command in the recipe clean regardless of how many options are added.
Add a linking command for building the final executable binary of the operating system with a custom linker script, os.lds.

Everything looks good, except for the linker script part. Why is it needed? The linker script is required to control the physical memory address at which the operating system binary appears in memory, so that the bootloader can jump to the operating system code and execute it. The default linker script used by gcc cannot meet this requirement. It assumes the compiled executable runs inside an existing operating system, while we are writing an operating system itself.

The next question is, what will be the content in the linker script? To answer this question, we must understand what goals to achieve with the linker script:

For the bootloader to correctly jump to and execute the operating system code.
For gdb to debug correctly with the operating system source code.

To achieve the goals, we must devise a suitable memory layout for the operating system. Recall that the bootloader developed in chapter II can already load a simple binary compiled from the sample assembly program sample.asm. To load the operating system, we can simply replace the binary compiled from sample.asm with the binary compiled from the C source file above.

If only it were that simple. The idea is correct, but not sufficient. The goals imply the following constraints:

The operating system code is written in C and compiled as an ELF executable binary. This means the bootloader needs to retrieve the correct entry address from the ELF header.
To debug properly with gdb, the debug info must contain correct mappings between instruction addresses and source code.

Thanks to the understanding of ELF and DWARF acquired in the earlier chapters, we can certainly modify the bootloader and create an executable binary that satisfies the above constraints. We will solve these problems one by one.

Loading an ELF binary from a bootloader

Earlier we saw that an ELF header contains the entry address of a program. According to man elf, that information is located 0x18 bytes from the beginning of the ELF header:

typedef struct {
               unsigned char e_ident[EI_NIDENT];
               uint16_t      e_type;
               uint16_t      e_machine;
               uint32_t      e_version;
               ElfN_Addr     e_entry;
               ElfN_Off      e_phoff;
               ElfN_Off      e_shoff;
               uint32_t      e_flags;
               uint16_t      e_ehsize;
               uint16_t      e_phentsize;
               uint16_t      e_phnum;
               uint16_t      e_shentsize;
               uint16_t      e_shnum;
               uint16_t      e_shstrndx;
           } ElfN_Ehdr;

The offset from the start of the struct to the start of e_entry is:

16 bytes of e_ident[EI_NIDENT], since:

#define EI_NIDENT 16

2 bytes of e_type
2 bytes of e_machine
4 bytes of e_version

$$\mathit{Offset} = 16 + 2 + 2 + 4 = 24 = \mathtt{0x18}$$

e_entry is of type ElfN_Addr, in which N is either 32 or 64. We are writing a 32-bit operating system, so in this case N = 32, and ElfN_Addr is Elf32_Addr, which is 4 bytes long.

Example 0.81. With any program, such as this simple one:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("hello world!\n");
    return 0;
}

We can retrieve the entry address with a human-readable presentation using readelf:

$ gcc hello.c -o hello

$ readelf -h hello

ELF Header:

Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00

.... output omitted ....

Entry point address: 0x400430

.... output omitted ....

Or in raw binary with hd:

$ hd hello | less

00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|

00000010 02 00 3e 00 01 00 00 00 30 04 40 00 00 00 00 00 |..>.....0.@.....|

.........

The offset 0x18 is the start of the least-significant byte of e_entry, which is 0x30, followed by 04 40 00. Read in reverse (little-endian) order, these bytes form the address 0x00400430.

Now that we know where the entry address is located in the ELF header, it is easy to modify the bootloader made in section II to retrieve the address and jump to it:

;******************************************
; Bootloader.asm
; A Simple Bootloader
;******************************************
bits 16
start: jmp boot

;; constant and variable definitions
msg	db	"Welcome to My Operating System!", 0ah, 0dh, 0h

boot:
  cli	; no interrupts
  cld	; all that we need to init

  mov		ax, 50h

  ;; set the buffer
  mov	es, ax
  xor	bx, bx

  mov	al, 2					      ; read 2 sectors
  mov	ch, 0                          ; we are reading the second sector past us,
                                        ; so it's still on track 0
  mov	cl, 2					      ; sector to read (The second sector)
  mov	dh, 0					      ; head number
  mov	dl, 0					      ; drive number. Remember Drive 0 is floppy drive.

  mov	ah, 0x02			           ; read floppy sector function
  int	0x13					       ; call BIOS - Read the sector
  jmp	[500h + 18h]				  ; jump and execute the sector!

  hlt	; halt the system

  ; We have to be 512 bytes. Clear the rest of the bytes with 0
  times 510 - ($-$$) db 0
  dw 0xAA55				  ; Boot Signature

It is as simple as that! First, we load the operating system binary at 0x500. Then we retrieve the entry address at offset 0x18 from 0x500 by first calculating the expression

$$\mathtt{0x500} + \mathtt{0x18} = \mathtt{0x518}$$

to get the actual in-memory address, then retrieving the content by dereferencing it.

The first part is done. For the next part, we need to build an ELF operating system image for the bootloader to load. The first step is to create a linker script:

ENTRY(main);

PHDRS
{
  headers PT_PHDR FILEHDR PHDRS;
  code PT_LOAD;
}

SECTIONS
{
  .text 0x500: { *(.text)  } :code
  .data :  { *(.data)  }
  .bss :  { *(.bss) }
  /DISCARD/ : { *(.eh_frame) }
}

The script is straightforward and remains almost the same as before. The only differences are:

main is explicitly specified as the entry point with ENTRY(main).
.text is explicitly given 0x500 as its virtual memory address, since we load the operating system image at 0x500.

After adding the script, we compile with make, and it should build smoothly:

$ make clean; make

$ readelf -l build/os/os

Elf file type is EXEC (Executable file)
Entry point 0x500
There are 2 program headers, starting at offset 52

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x000500 0x00000500 0x00000500 0x00040 0x00040 R E 0x1000

Section to Segment mapping:
  Segment Sections...
   01     .text

All looks good, until we run it. We begin by starting the QEMU virtual machine:

$ make qemu

Then, start gdb, load the debug info (which is stored in the same binary file), and set a breakpoint at main:

(gdb) symbol-file build/os/os

Reading symbols from build/os/os...done.

(gdb) b main

Breakpoint 2 at 0x500


Keep the program running until it stops at main:

(gdb) c

Continuing.

[ 0:7c00]

Breakpoint 1, 0x00007c00 in ?? ()

(gdb) c

Continuing.

[ 0: 500]

Breakpoint 2, main () at main.c:1

At this point, we switch the layout to show the C source code instead of the registers:

(gdb) layout split

layout split creates a layout that consists of 3 smaller windows:

Source window at the top.
Assembly window in the middle.
Command window at the bottom.

After the command, the layout should look like this:

┌──main.c───────────────────────────────────────────────────────┐
B+>│1 void main(){}                                              │
└───────────────────────────────────────────────────────────────┘
B+>│0x500 <main>    jg     0x547                                  │
   │0x502 <main+2>  dec    sp                                     │
   │0x503 <main+3>  inc    si                                     │
   │0x504 <main+4>  add    WORD PTR [bx+di],ax                    │
   │0x506           add    WORD PTR [bx+si],ax                    │
   │0x508           add    BYTE PTR [bx+si],al                    │
   │0x50a           add    BYTE PTR [bx+si],al                    │
   │0x50c           add    BYTE PTR [bx+si],al                    │
   │0x50e           add    BYTE PTR [bx+si],al                    │
   │0x510           add    al,BYTE PTR [bx+si]                    │
   │0x512           add    ax,WORD PTR [bx+si]                    │
   │0x514           add    WORD PTR [bx+si],ax                    │
   │0x516           add    BYTE PTR [bx+si],al                    │
   │0x518           add    BYTE PTR [di],al                       │
   │0x51a           add    BYTE PTR [bx+si],al                    │
   │0x51c           xor    al,0x0                                 │
   │0x51e           add    BYTE PTR [bx+si],al                    │
└───────────────────────────────────────────────────────────────┘
remote Thread 1 In: main                     L1    PC: 0x500
[f000:fff0] 0x0000fff0 in ?? ()
Breakpoint 1 at 0x7c00
(gdb) symbol-file build/os/os
Reading symbols from build/os/os...done.
(gdb) b main
Breakpoint 2 at 0x500: file main.c, line 1.
(gdb) c
Continuing.
[   0:7c00]
Breakpoint 1, 0x00007c00 in ?? ()
(gdb) c
Continuing.
[   0: 500]
Breakpoint 2, main () at main.c:1
(gdb) layout split
(gdb)

Something is wrong here. This is not the assembly code generated for a function call as described in section I. objdump confirms that it is wrong:

$ objdump -D build/os/os | less

/home/tuhdo/workspace/os/build/os/os: file format elf32-i386

Disassembly of section .text:

00000500 <main>:

500: 55 push %ebp

501: 89 e5 mov %esp,%ebp

503: 90 nop

504: 5d pop %ebp

505: c3 ret

.... remaining output omitted ....

The assembly code of main is completely different. This is why understanding assembly code and its relation to high-level languages is important. Without that knowledge, we would have used gdb as a simple source-level debugger, without bothering to look at the assembly code in the split layout. As a consequence, the true cause of the non-working code would never have been discovered.

Debugging the memory layout

What is the reason for the incorrect assembly code in main displayed by gdb? There can only be one cause: the bootloader jumped to the wrong address. But why was the address wrong? We placed the .text section at address 0x500, with the code of main starting at its first byte, and instructed the bootloader to retrieve the entry address at offset 0x18 and then jump to it.

MarginFigure 14: Memory state after loading the 2nd sector.

Then, perhaps the bootloader loaded the operating system at the wrong address. But we explicitly set the load address to 50h:00, which is 0x500, so the correct address was used. After the bootloader loads the 2nd sector, the in-memory state should look like figure 48:

Here is the problem: 0x500 is the start of the ELF header, not of main. The bootloader loads the 2nd sector, which stores the executable as a whole, to 0x500, so the ELF header comes first in memory. The .text section, where main resides, is far from 0x500: readelf reports it at file offset 0x500. Since the executable binary starts at 0x500 in memory, .text is actually at

$$\mathtt{0x500} + \mathtt{0x500} = \mathtt{0xa00}$$

However, the entry address recorded in the ELF header remains 0x500, and as a result the bootloader jumped there instead of 0xa00. This is one of the issues that must be fixed.

The other issue is the mapping between the debug info and memory addresses. The debug info is compiled assuming that .text starts at 0x500, but the actual loading pushes it another 0x500 bytes further, to 0xa00. This mismatch renders the debug info useless.

Figure 0.23: Wrong symbol-memory mappings in debug info.


In summary, we have 2 problems to overcome:

Fix the entry address to account for the extra offset when loading into memory.
Fix the debug info to account for the extra offset when loading into memory.

First, we need to know the actual layout of the compiled executable binary:

$ readelf -l build/os/os

Elf file type is EXEC (Executable file)
Entry point 0x500
There are 2 program headers, starting at offset 52

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x000500 0x00000500 0x00000500 0x00040 0x00040 R E 0x1000

Section to Segment mapping:
  Segment Sections...
   01     .text

Notice the Offset and the VirtAddr fields: both have the same value. This is problematic. The entry address and the memory addresses in the debug info depend on the VirtAddr field, but an Offset with the same value destroys the validity of VirtAddr, because it means that the real in-memory address (the load address plus the Offset) will always be greater than VirtAddr.

The offset is the distance in bytes between the beginning of the file, the address 0, and the beginning of a segment or a section.

If we try to adjust the virtual memory address of the .text section in the linker script os.lds, the Offset takes whatever value we set, until we set it to a value equal to or greater than 0x1074:

Elf file type is EXEC (Executable file)
Entry point 0x1074
There are 2 program headers, starting at offset 52

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x000074 0x00001074 0x00001074 0x00006 0x00006 R E 0x1000

Section to Segment mapping:
  Segment Sections...
   01     .text

If we adjust the virtual address to 0x1073, both the Offset and VirtAddr still share the same value:

Elf file type is EXEC (Executable file)
Entry point 0x1073
There are 2 program headers, starting at offset 52

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x001073 0x00001073 0x00001073 0x00006 0x00006 R E 0x1000

Section to Segment mapping:
  Segment Sections...
   01     .text

The key to explaining this phenomenon is the Align field. An Align value of 0x1000 requires the segment's Offset and VirtAddr to leave the same remainder when divided by 0x1000. Put differently, any distance between segments that is a multiple of 0x1000 carries no information, and the linker removes such a distance to save binary size. We can run a few experiments to verify this claim.

All the outputs are produced by the command:

$ readelf -l build/os/os

By setting the virtual address of .text to a value from 0x0 to 0x73 (in os.lds), the offset ranges from 0x1000 to 0x1073 accordingly. For example, setting it to 0x0 gives:

Elf file type is EXEC (Executable file)
Entry point 0x0
There are 2 program headers, starting at offset 52

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x001000 0x00000000 0x00000000 0x00006 0x00006 R E 0x1000

Section to Segment mapping:
  Segment Sections...
   00
   01     .text

By default, if we do not specify any virtual address, the offset stays at 0x1000, because 0x1000 is the first offset after the headers that satisfies the alignment constraint. Adding anything from 0x1 to 0x73 to the virtual address moves the offset by the same amount, so Offset and VirtAddr stay exactly 0x1000 bytes apart.
By setting the virtual address of .text to 0x74 (in os.lds):

Elf file type is EXEC (Executable file)
Entry point 0x74
There are 2 program headers, starting at offset 52

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x000074 0x00000074 0x00000074 0x00006 0x00006 R E 0x1000

Section to Segment mapping:
  Segment Sections...
   00
   01     .text

PHDR is 0x74 bytes in size, so if LOAD started at offset 0x1074, the distance between the end of the PHDR segment and the LOAD segment would be $\mathtt{0x1074} - \mathtt{0x74} = \mathtt{0x1000}$ bytes. To save space, the linker removes those extra 0x1000 bytes.
By setting the virtual address of .text to any value between 0x75 and 0x1073 (in os.lds), the offset takes exactly the value specified, as can be seen in the case of 0x1073 above.
By setting the virtual address of .text to any value equal to or greater than 0x1074, the pattern starts over: 0x1074 gives offset 0x74 again, because the distance between the two is exactly 0x1000 bytes.

Now we have a hint about how to control the values of Offset and VirtAddr to produce the desired binary layout: change the Align field to a smaller value for finer-grained control. A binary layout like this might work:

Elf file type is EXEC (Executable file)
Entry point 0x600
There are 2 program headers, starting at offset 52

Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x000100 0x00000600 0x00000600 0x00006 0x00006 R E 0x100

Section to Segment mapping:
  Segment Sections...
   01     .text

The binary will look like figure 50 in memory:

Figure 0.24: A good binary layout.

If we set the Offset field to 0x100 from the beginning of the file and the VirtAddr to 0x600, then when the file is loaded in memory, the actual memory address of .text is

$$\mathtt{0x500} + \mathtt{0x100} = \mathtt{0x600}$$

where 0x500 is the physical memory location at which the bootloader loads the binary, and 0x100 is the offset of .text from the beginning of the file (the start of the ELF header). The entry address and the debug info then take the value 0x600 from the VirtAddr field above, which matches the actual physical layout exactly. We can do it by changing os.lds as follows:

ENTRY(main);

PHDRS
{
  headers PT_PHDR FILEHDR PHDRS;
  code PT_LOAD;
}

SECTIONS
{
  .text 0x600: ALIGN(0x100) { *(.text)  } :code
  .data :  { *(.data)  }
  .bss :  { *(.bss) }
  /DISCARD/ : { *(.eh_frame) }
}

The ALIGN keyword, as its name implies, tells the linker to align a section, and thus the segment containing it. However, for the ALIGN keyword to have any effect, automatic page alignment must be disabled. According to man ld:

-n
--nmagic
    Turn off page alignment of sections, and disable linking against shared
    libraries. If the output format supports Unix style magic numbers, mark the
    output as "NMAGIC".

That is, by default each section is aligned to an operating system page, which is 4096 (0x1000) bytes in size. The -n or -nmagic option disables this behavior, which is what we need. We amend the ld command used in os/Makefile:

..... above content omitted ....
$(OS): $(OS_OBJS)
	ld -m elf_i386 -nmagic -Tos.lds $(OS_OBJS) -o $@

Finally, we also need to update the top-level Makefile to write more than one sector into the disk image for the operating system binary, as its size exceeds one sector:

$ ls -l build/os/os

-rwxrwxr-x 1 tuhdo tuhdo 9060 Feb 13 21:37 build/os/os

We update the rule so that the number of sectors is calculated automatically:

..... above content omitted ....
bootdisk:  bootloader os
	dd if=/dev/zero of=$(DISK_IMG) bs=512 count=2880
	dd conv=notrunc if=$(BOOTLOADER) of=$(DISK_IMG) bs=512 count=1 seek=0
	dd conv=notrunc if=$(OS) of=$(DISK_IMG) bs=512 count=$$((($(shell stat --printf="%s" $(OS))+511)/512)) seek=1

After updating everything and recompiling the executable binary, we get the desired offset and virtual address, 0x100 and 0x600 respectively:

Entry point 0x600

  Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR  0x000000 0x00000000 0x00000000 0x00074 0x00074 R   0x4
  LOAD  0x000100 0x00000600 0x00000600 0x00006 0x00006 R E 0x100