04 Jun 2024

Hardware Mastery with ‘Assembly’: Connect code and hardware

Aaaand the exciting journey starts.

We are getting into assembly with the assumption that you are comfortable with the basics of some higher-level programming language like C, C++, Python etc. If not, there is a chance you will get confused by some topics. Let me know in the comments so that I can explain that part for you.

To master Assembly there are three things to understand:
1. Layout of program in memory – Chill, just the basics
2. Resources accessible by assembly
3. The Assembly Syntax

That’s all! Each of these topics can lead you down a rabbit hole, so I will try to condense the content to the bare minimum you need to understand assembly code.

Layout of program in memory

When you execute a program, the operating system gives that program a portion of the memory (its address space) to run in. This address space is divided into segments, each of which is used for a different purpose.

For easier understanding, imagine a dorm/hostel (Memory) where you are allocated a room by the landlord/warden (Operating system). Each room contains a bedroom, storage, a washroom, maybe a kitchenette. Each is there for a specific use, and the dweller (Programmer) can use them accordingly. This is the chunk of memory assigned to the program:

[Figure: memory layout of a program from low address to high address: text, data, bss, heap (arrow pointing up), stack (arrow pointing down)]

Each of these sections serves a different purpose. Everything used in the program is allocated between the low address and the high address.

Memory Segments

  1. [text] : This is where the executable part of the program sits. The instructions that tell the processor what to do, and when to do it, are stored here.
  2. [data] : This is where the program stores initialized data. Global variables that are assigned a value are stored here. You might ask: just global? What about the variables initialized locally? You’ll see soon…
    For example, a global definition like this in C -> [int a = 5;]
  3. [bss] : Uninitialized data is stored here. Again, these are variables that are defined globally but not assigned any value. They are set to zero when the program starts.
    For example, a global definition like this in C -> [int a;]
  4. [heap] : This section is used to reserve memory dynamically, that is, while the program is running. What if you wanted to store information about the people you talk with, but you are not sure how many you are going to meet? You create space for each person as you meet them, and the heap is there for exactly that. One thing to note: the heap grows upwards in the address space (denoted by the arrow in the figure).
  5. [stack] : Very, very important when it comes to assembly programming. This is where your local variables are stored. It is also where each function in your program gets its own data space, known as a stack frame. You will be manipulating this a lot when you write functions in assembly. Note that the stack grows downwards (denoted by the arrow in the figure).
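
To make the five segments a bit more concrete, here is a minimal C sketch that prints one address from each of them. The names (`initialized_global`, `uninitialized_global`, `print_layout`) are made up for this illustration, and the mapping in the comments assumes a typical Linux toolchain:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 5;   /* has a value: stored in the data segment */
int uninitialized_global;     /* no value given: stored in bss, zeroed at startup */

/* Prints one address per segment. Returns 0 on success, -1 if malloc fails. */
int print_layout(void) {
    int local = 42;                           /* local variable: lives on the stack */
    int *dynamic = malloc(sizeof *dynamic);   /* reserved at runtime: lives on the heap */
    if (dynamic == NULL) {
        return -1;
    }
    *dynamic = 7;

    /* Converting a function pointer to an integer is implementation-defined,
       but it is good enough for a quick look at where the code sits. */
    printf("text  : %p\n", (void *)(uintptr_t)&print_layout);
    printf("data  : %p\n", (void *)&initialized_global);
    printf("bss   : %p\n", (void *)&uninitialized_global);
    printf("heap  : %p\n", (void *)dynamic);
    printf("stack : %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```

Call `print_layout()` from your `main` and compare the five addresses. On a typical Linux build the stack address should be by far the highest, but the exact values depend on your system and usually change from run to run.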

Fun fact: If the stack keeps growing until it runs past the space available to it (in the classic layout, down into the heap), the program can crash or show unwanted behavior. This is commonly known as a ‘Stack Overflow’! Woohoo!
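
If you want to see the direction of stack growth for yourself, here is a small C sketch. It records the address of one local variable per nested call and compares the outermost with the deepest. The function names are hypothetical, and the C language itself promises nothing about the direction, so treat the result as an observation about your platform:

```c
#include <stdint.h>

/* Records the address of one local variable per call level.
   frames[0] belongs to the outermost call, frames[levels - 1] to the deepest.
   Returns the sum of all level numbers visited. */
int record_frames(int level, int levels, uintptr_t *frames) {
    volatile int marker = level;           /* lives in this call's stack frame */
    frames[level] = (uintptr_t)&marker;
    int deeper = 0;
    if (level + 1 < levels) {
        deeper = record_frames(level + 1, levels, frames);
    }
    return deeper + marker;                /* marker is read after the call, so it stays alive */
}

/* Returns 1 if the deepest call got a lower address than the outermost one
   (stack grows downward), 0 otherwise. */
int stack_grows_down(void) {
    uintptr_t frames[4];
    record_frames(0, 4, frames);
    return frames[3] < frames[0];
}
```

On x86-64 Linux I would expect `stack_grows_down()` to return 1, though an optimizing compiler is free to inline the calls and rearrange the locals.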

Resources accessible by Assembly

When you write your first assembly code, you will be controlling basically three things :

  • ALU (Arithmetic Logic Unit) – The part of the processor that does the math, in case you’re not familiar with it…
  • Registers
  • Stack

ALU (Arithmetic Logic Unit)

The ALU is the hardware block inside the processor that performs arithmetic and logical operations, such as addition, subtraction, AND, OR etc. An assembly program can use all of these operations directly.
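
As a quick refresher, here are the four operations named above written as C functions. The `alu_*` names are made up for this sketch; each function body is the kind of work a single ALU operation does:

```c
#include <stdint.h>

uint32_t alu_add(uint32_t a, uint32_t b) { return a + b; }   /* addition */
uint32_t alu_sub(uint32_t a, uint32_t b) { return a - b; }   /* subtraction */
uint32_t alu_and(uint32_t a, uint32_t b) { return a & b; }   /* bitwise AND */
uint32_t alu_or (uint32_t a, uint32_t b) { return a | b; }   /* bitwise OR */
```

For example, `alu_and(0xC, 0xA)` gives 0x8 (1100 AND 1010 = 1000) and `alu_or(0xC, 0xA)` gives 0xE (1100 OR 1010 = 1110).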

Registers

Along with its computing units, every CPU contains a small set of very specific memory locations that the processor can access directly, known as registers. Registers are extremely fast and sit closer to the computing units than any other memory, such as RAM or an SSD. The x86-64 architecture has 16 general purpose registers which we can use directly in assembly.

Why are these important? The ALU that does our calculations is directly connected only to these registers. So keep in mind: if you have two integers in memory, you need to bring them into registers before asking the ALU to do calculations with them. (x86-64 does let most arithmetic instructions read one of their operands straight from memory, but not both.)
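
Here is the same idea as a C sketch. The function name `add_from_memory` is hypothetical, and the comments describe the kind of steps the addition breaks down into; real compiler output will look different:

```c
/* Adds two integers that live in memory. */
int add_from_memory(const int *a, const int *b) {
    int first = *a;            /* load: copy the value at address a into a register */
    int second = *b;           /* load: copy the value at address b into another register */
    int sum = first + second;  /* ALU: add the two register values */
    return sum;                /* the result is handed back in a register */
}
```

With `int x = 2, y = 3;`, calling `add_from_memory(&x, &y)` returns 5.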

x86-64 registers [Only the 64 bit registers are shown here]

Register   Description
rax        Register A
rbx        Register B
rcx        Register C
rdx        Register D
rbp        Register base pointer (start of the current stack frame)
rsp        Register stack pointer (current location in stack)
rsi        Register source index (source for data copies)
rdi        Register destination index (destination for data copies)
r8         Register 8
r9         Register 9
r10        Register 10
r11        Register 11
r12        Register 12
r13        Register 13
r14        Register 14
r15        Register 15
General Purpose Registers x86-64
[Check out this link for the full version of the registers]

Among these, a few are important for the next lesson:
1. rax – The accumulator register -> Holds the result of many calculations performed by the ALU. It is also where a function places its return value, so this is where you look after a function call.
2. rbp – The base pointer -> When a function is called in assembly, a new memory area, known as a stack frame, is set up on the stack for that function, and rbp points to the base of that frame. Because the stack grows downwards, the additional variables of the function are placed at addresses below rbp. Important to note that this register contains a pointer (an address) and not the value stored at that address. FYI: each function call is assigned its own stack frame.
3. rsp – The stack pointer -> This one points directly to the current location of the stack, i.e. the most recently pushed item. Important to note that this register also contains a pointer and not the value stored at that address.

rbp - rsp = size of the stack frame for the current function (the stack grows downwards, so rsp holds the lower address)
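
A quick worked example of that subtraction, as a C sketch. The function name `frame_size` and the register values below are invented for illustration. Since the stack grows downwards, rsp is the smaller of the two addresses, so the size comes out as rbp minus rsp:

```c
#include <stdint.h>

/* Size in bytes of the region between the base pointer and the stack pointer.
   Assumes the stack grows downwards, so rbp >= rsp. */
uint64_t frame_size(uint64_t rbp, uint64_t rsp) {
    return rbp - rsp;
}
```

With the made-up values rbp = 0x7ffe0000f000 and rsp = 0x7ffe0000efe0, `frame_size` returns 0x20, i.e. a frame of 32 bytes.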

r8 to r15 are general purpose registers which can, in theory, be used as scratch registers. (We will cover the caveats of these soon.)

The Assembly Syntax

Check out this example:

section	.data
msg db 'Hello world!', 0xa ;string stored in msg, ending with a newline (0xa)

section	.text
   global _start
	
_start:
   mov	rdx,13      ;message length (12 characters + newline)
   mov	rcx,msg     ;message to write
   mov	rbx,1       ;file descriptor for stdout
   mov	rax,4       ;system call value for sys_write
   int	0x80        ;call kernel (32-bit Linux system call interface)
	
   mov	rbx,0       ;exit status 0 (success)
   mov	rax,1       ;system call value for sys_exit
   int	0x80        ;call kernel

Notice four things

  • sections:
    We can specify which segment each piece of the program goes into. Use the memory layout described above as your reference when writing the sections. They are defined as ‘section .[name of the section]’. Quite straightforward.
  • labels:
    Notice the short name followed by a colon (_start:)? These are called labels. They mark small ‘checkpoints’ in the program, which you can jump to from any point in the program. Useful for creating conditions or loops.
    Make sure to include the line ‘global [start label]’ in the text section: it makes the label visible to the linker, which uses it as the entry point of the program (_start is the default).
  • instructions:
    These have a command name followed by its operands. There can be zero or more operands depending on the command.
  • comments:
    Anything you write after a semicolon becomes a comment and is ignored by the assembler.
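
If it helps to map this back to a higher level language, here is a rough C counterpart of the write part of the example. It is only a sketch: `say_hello` is a made-up name, and it goes through the POSIX `write` function instead of triggering the system call by hand:

```c
#include <unistd.h>

/* Same bytes as msg in the assembly example: 12 characters plus a newline. */
static const char msg[] = "Hello world!\n";

/* Writes msg to stdout. Returns the number of bytes written, or -1 on error. */
long say_hello(void) {
    /* sizeof msg also counts the terminating '\0', which we do not want to print */
    return (long)write(STDOUT_FILENO, msg, sizeof msg - 1);
}
```

The three arguments line up with what the assembly loads into registers before calling the kernel: the file descriptor for stdout, the address of the message, and its length.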

Quite simple so far, I see… Now, let’s take the same code and see how it works. 💪

Part 3. Hardware Mastery with ‘Assembly’: Hello world analysis (In the works…)
