1. Procedure Calls

- Lectures: Lecture 3.1 (slides)
- Objectives: Understand the mechanisms of procedure calls and their runtime and memory cost
- Concepts: call stack, call frame, static and dynamic memory allocation
So far we have used procedures here and there informally, to break down larger algorithms and programs into more manageable pieces. Now it is time to look more closely at what happens at the machine level and how many resources procedure calls take.
1.1. Why Procedure Calls?
Procedure calls are an early construct that helped us progress from assembly code towards higher-level languages. They bring numerous benefits, including:

- Better code reuse and less duplication. If the same fragment of code appears in multiple places, we can extract it into a separate procedure and get rid of the duplicated code. That helps with bug fixing in particular, since it avoids rolling out the same fix in multiple places.
- Better code readability. Provided the procedures' names are meaningful, the code reads better, and the intention (the algorithm) becomes more visible.
- Better separation of concerns. Procedures are handy to separate different concerns: parameter validation, error handling, etc. That makes each piece clearer.
Remember for instance the selection sort, which I reproduce below in Listing 1.1.
 1  def selection_sort(sequence)
 2     for index in 0 .. sequence.length-2
 3        minimum = find_minimum(sequence, index)
 4        swap(sequence, index, minimum)
 5     end
 6  end
 7
 8  def find_minimum(sequence, start)
 9     minimum = start
10     for index in start+1 .. sequence.length-1
11        if sequence[index] < sequence[minimum]
12           minimum = index
13        end
14     end
15     return minimum
16  end
17
18  def swap(sequence, left, right)
19     tmp = sequence[right]
20     sequence[right] = sequence[left]
21     sequence[left] = tmp
22  end

Listing 1.1 The selection sort algorithm, broken down into three procedures
We extracted two procedures, namely find_minimum and swap, which makes the selection_sort algorithm much easier to understand, at least for me. But what about efficiency? Do procedure calls incur an extra cost in time and/or in space?
Note

I use the terms caller and callee to refer to the two sides of a call. The caller is the code that calls and provides arguments, whereas the callee is the procedure that is called and returns a result. In Listing 1.1 for instance, there is a procedure call at Line 3, where selection_sort is the caller and find_minimum is the callee.
1.2. How Does “Calling” Work?
Random access machines (RAM) have no concept of procedures. Their memory contains only symbols, which the machine interprets either as data or as instructions. A procedure is a programmer's concept: it helps us reduce complexity by breaking algorithms and programs into smaller pieces that are easier to understand.

Procedures are thus handled mainly by the compiler, which translates our higher-level programming constructs into assembly and/or machine code. What does the compiler do with procedure declarations and calls? There are two approaches: inlining and branching.
1.2.1. Inlining

The idea of inlining is to replace every call to a given procedure by the body of that very procedure.
Consider again our selection_sort algorithm shown in Listing 1.1. Inlining the calls to find_minimum and swap yields the following algorithm:
 1  def selection_sort(sequence)
 2     for index in 0 .. sequence.length-2
 3        # minimum = find_minimum(sequence, index)
 4        minimum = index
 5        for i in index+1 .. sequence.length-1
 6           if sequence[i] < sequence[minimum]
 7              minimum = i
 8           end
 9        end
10        # swap(sequence, index, minimum)
11        tmp = sequence[minimum]
12        sequence[minimum] = sequence[index]
13        sequence[index] = tmp
14     end
15  end

Listing 1.2 The selection sort algorithm, with both procedure calls inlined
In Listing 1.2 we had to rename some variables to avoid name clashes. In this very example, inlining shortens the overall program (15 lines against 22 in Listing 1.1), but in practice inlining tends to bloat the code, because each procedure is generally called from many places.

From the efficiency standpoint, however, this is perfect: there is no extra work involved and no extra memory consumed (except for the code segment). The compiler does the heavy lifting, and in the generated machine code there are no “calls” anymore.
Important

Inlining is very efficient, and compilers may decide to inline some of your functions when they see fit. C and C++, for instance, offer the inline keyword, which merely suggests inlining: the compiler remains free to ignore the hint.
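As an illustration, here is a minimal C sketch (the function square is my own example, not from the lecture): marking it inline invites the compiler to substitute its body at the call site.

#include <stdio.h>

/* 'inline' only suggests inlining: the compiler may still emit a
   regular call, e.g., when optimizations are disabled. */
static inline int square(int x) {
    return x * x;
}

int main(void) {
    /* Compiled with optimizations (e.g., gcc -O2), this call is
       typically replaced by the constant 25. */
    printf("%d\n", square(5));
    return 0;
}

Whether the substitution actually happened can be checked by inspecting the generated assembly (e.g., with gcc -S).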
1.2.2. Branching

The idea of branching is to replace procedure calls by JUMP instructions, so that the machine “branches” to the body of the procedure and then jumps back and resumes execution right after the call, as shown in Fig. 1.8.

Fig. 1.8 Branching implies “jumping” back and forth between the caller and the callee instructions.
Unfortunately, jumping is not enough, because the caller has to pass arguments and get a result back, if any.
To do that, we use the call stack: a dedicated area in memory, which we operate as a stack. For each procedure call, the caller and the callee use it to exchange information. The caller pushes its arguments and the return address, and then jumps to the callee. The callee pops this information from the stack, pushes its result, and finally jumps back to the given return address.
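To make this convention concrete, here is a small C sketch of my own (push, pop, and add are illustrative names, not part of the lecture): the caller and the callee exchange arguments and result through an explicit stack, while the return address is handled implicitly by the C call itself.

#include <stdio.h>

#define STACK_SIZE 64

static int stack[STACK_SIZE];
static int sp = STACK_SIZE;      /* the stack grows downwards */

static void push(int value) { stack[--sp] = value; }
static int  pop(void)       { return stack[sp++]; }

/* The callee: pops its two arguments, pushes its result. */
static void add(void) {
    int left  = pop();
    int right = pop();
    push(left + right);
}

int main(void) {
    /* The caller: pushes the arguments, transfers control,
       and pops the result afterwards. */
    push(3);
    push(4);
    add();                           /* stands for the JUMP to the callee */
    printf("result = %d\n", pop());  /* prints: result = 7 */
    return 0;
}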
1.3. The Call Stack

1.3.1. In Practice
As developers, we get to see this call stack every time one of our programs crashes. Consider the terminal output below, where I run a Java implementation of the selection sort that crashes. The program exits with an ArrayIndexOutOfBoundsException.
$ java Sort.java
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 9
at Sort.findMinimumFrom(Sort.java:16)
at Sort.selectionSort(Sort.java:6)
at Sort.main(Sort.java:32)
What comes right after this exception is a summary of the call stack at the moment the program crashed. Starting from the very bottom: the main procedure was active, selectionSort was active too, and we were in the middle of findMinimumFrom.
This information comes straight from the call stack. Each active procedure has a dedicated “record”, its so-called activation frame, which contains all the information exchanged between the caller and the callee. When we call a procedure, a new frame is pushed onto the stack, and it gets popped off when the procedure returns.
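A quick way to observe these frames is sketched below in C (my own example): printing the address of a local variable at each nesting level shows that every active call owns a distinct frame, typically at decreasing addresses since the stack grows downwards on most platforms.

#include <stdio.h>

/* Each recursive call gets its own activation frame, and thus its
   own copy of 'local' at a distinct address. */
static void callee(int depth) {
    int local = depth;
    printf("depth %d: &local = %p\n", depth, (void *)&local);
    if (depth < 3)
        callee(depth + 1);   /* pushes a new frame */
}

int main(void) {
    callee(1);
    return 0;
}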
1.3.2. The Call Stack Structure
Again, the call stack does not exist at the machine level. It is more like a “design pattern” that the compiler implements. The call stack generally resides at the end of the available memory, as shown in Fig. 1.9, and grows towards the data segment. If the stack grows so large that it collides with the rest of the memory (i.e., the data segment), the system raises the well-known stack overflow error.
Fig. 1.9 Memory layout, including the call stack.
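The C sketch below (my own, deliberately broken) triggers exactly this error: with no base case, every call pushes yet another activation frame until the stack collides with the rest of memory, which the operating system typically reports as a segmentation fault.

/* Deliberately broken: do not run outside a sandbox. */
static long overflow(long depth) {
    char frame[1024];            /* local storage allocated in this frame */
    frame[0] = (char) depth;     /* used below to defeat tail-call optimization */
    return overflow(depth + 1) + frame[0];
}

int main(void) {
    return (int) overflow(0);
}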
Each active procedure gets an activation record on the stack. Activation records include the following, also shown in Fig. 1.10:

- The arguments, that is, the value of each parameter required by the callee.
- The return address, where the callee must “jump” upon completion.
- The local variables of the callee.
Fig. 1.10 Example organization of an activation frame
Important

The call stack automates memory allocation for local variables. The compiler automatically allocates all local variables on the call stack, and when a procedure returns, its activation record is automatically released.
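This automatic release has a classic pitfall, shown in the C sketch below (my own example): returning the address of a local variable yields a pointer into a frame that no longer exists.

#include <stdio.h>

static int *broken(void) {
    int local = 42;
    return &local;   /* BUG: 'local' dies when this frame is popped */
}

int main(void) {
    int *dangling = broken();
    /* Dereferencing 'dangling' is undefined behaviour: the frame,
       and 'local' with it, was released on return. */
    printf("%p\n", (void *)dangling);
    return 0;
}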
1.3.3. Calls in Assembly RAM Code

To close the loop, let us see how we could extend our RAM model to support procedure calls. In assembly, there are two addresses that we need to track:
- The top of the stack (aka the stack pointer, or SP). It contains the address of the last item on the stack.
- The frame pointer (aka FP), which acts as a reference point within a single activation frame. Often, FP points to the first local variable, and we access the other fields indirectly, using for instance FP+1 to reach the return address.
Our RAM model only has a single register, ACC, which shall contain, by convention, the result of the procedure. So we place SP and FP in the last two memory cells.

To simplify the assembly code, we introduce some new features and higher-level instructions:
- Indirect addressing, where we denote by [SP+1] the memory cell whose address is stored at SP+1. Here SP holds an address, where another address lies. We will use these square brackets and offsets to indicate such indirect accesses (aka pointers).

- PUSH <address>, which decrements SP and writes the content of the given address on the top of the stack. That would be the same as the following snippet:

  LOAD 0          ;; decrement SP by one
  ADD SP
  SUBTRACT one
  STORE SP
  LOAD 0          ;; store the content of "address" at SP
  ADD address
  STORE [SP]

- POP, which increments SP. There is no need to actually erase the memory.

  LOAD 1
  ADD SP
  STORE SP
Equipped with these, we can now make a call to the swap procedure, for instance. The only extra thing we have to do is for the callee to save the previous value of the frame pointer FP before updating it, and to restore it before returning.
        ;; Here is the code of the caller
        PUSH right      ;; Push the 'right' argument
        PUSH left       ;; Push the 'left' argument
        PUSH sequence   ;; Push the 'sequence' argument
        PUSH return     ;; Push the return address
        LOAD 0          ;; Branch to the 'swap' code
        JUMP swap
return: POP             ;; Continue here once 'swap' has returned:
        POP             ;; pop the return address and the three
        POP             ;; arguments off the stack
        POP
        HALT            ;; 'swap' produces no result, so we simply stop

swap:   PUSH FP         ;; Back up FP onto the stack
        LOAD 0          ;; Update FP: it now points to the saved FP
        ADD SP
        STORE FP
        ;; Perform the swap. Arguments are accessible using indirect addressing
        PUSH [FP+4]     ;; tmp = sequence[right]
        LOAD 0          ;; sequence[right] = sequence[left]
        ADD [FP+3]
        STORE [FP+4]
        LOAD 0          ;; sequence[left] = tmp, which sits on top of the stack
        ADD [FP-1]
        STORE [FP+3]
        POP             ;; Discard the local variable 'tmp'
        LOAD 0          ;; Restore the old FP, saved where FP points
        ADD [FP]
        STORE FP
        POP             ;; Remove the saved FP
        LOAD 0          ;; Branch back to the return address,
        JUMP [SP]       ;; which now sits on top of the stack