Version: 0.3.38

1. Procedure Calls#

Lectures:

Lecture 3.1 (slides)

Objectives:

Understand the mechanisms of procedure calls and their runtime and memory cost

Concepts:

call-stack, call-frame, static and dynamic memory allocation

So far we have used procedures informally, here and there, to break larger algorithms and programs into more manageable pieces. Now it is time to look more closely at what happens at the machine level and how many resources procedure calls take.

1.1. Why Procedure Calls?#

Procedure calls are an early construct that helps us progress from assembly code towards higher-level languages. They bring numerous benefits, including:

  • Better code reuse and less duplication. If the same fragment of code appears in multiple places, we can extract it into a separate procedure and get rid of the duplicated code. That helps with bug fixing in particular, as it avoids having to roll out the same fix in multiple places.

  • Better code readability. Provided the procedures’ names are meaningful, the code reads better, and the intention (the algorithm) becomes more visible.

  • Better separation of concerns. Procedures are handy to separate different concerns, such as parameter validation, error handling, etc. That makes things clearer.

Recall for instance the selection sort, which I reproduce below in Listing 1.1.

Listing 1.1 A Ruby implementation of the selection sort, where find_minimum and swap are separate procedures#
 1def selection_sort(sequence)
 2  for index in 0 .. sequence.length-2
 3     minimum = find_minimum(sequence, index)
 4     swap(sequence, index, minimum)
 5  end
 6end
 7
 8def find_minimum(sequence, start)
 9  minimum = start
10  for index in start+1 .. sequence.length-1
11    if sequence[index] < sequence[minimum]
12      minimum = index
13    end
14  end
15  return minimum
16end
17
18def swap(sequence, left, right)
19   tmp = sequence[right]
20   sequence[right] = sequence[left]
21   sequence[left] = tmp
22end

We extracted two procedures, namely find_minimum and swap, which makes the selection_sort algorithm much easier to understand—at least for me. But what about efficiency? Do procedure calls incur an extra cost in time and/or in space?

Note

I use the terms caller and callee to refer to the two sides of a call. The caller is the code that calls and provides arguments, whereas the callee is the procedure that is called and returns a result.

In Listing 1.1 for instance, there is a procedure call on Line 3, where selection_sort is the caller and find_minimum is the callee.

1.2. How Does “Calling” Work?#

Random access machines (RAM) have no concept of procedures. Their memory only contains symbols, which the machine interprets either as data or as instructions. Procedures are a programmer’s concept: they help us reduce complexity by breaking algorithms and programs into smaller pieces that are easier to understand.

So procedures are handled mainly by the compiler, which translates our higher-level programming constructs into assembly and/or machine code. What does the compiler do with procedure declarations and calls? There are two approaches: inlining and branching.

1.2.1. Inlining#

The idea of inlining is to replace every call to a given procedure with the body of that very procedure.

Consider again our selection_sort algorithm shown in Listing 1.1. Inlining the calls to find_minimum and swap yields the following algorithm:

Listing 1.2 Inlining the swap and find_minimum procedures in selection_sort#
 1def selection_sort(sequence)
 2  for index in 0 .. sequence.length-2
 3     # minimum = find_minimum(sequence, index)
 4     minimum = index
 5     for i in index+1 .. sequence.length-1
 6       if sequence[i] < sequence[minimum]
 7         minimum = i
 8       end
 9     end
10     # swap(sequence, index, minimum)
11     tmp = sequence[minimum]
12     sequence[minimum] = sequence[index]
13     sequence[index] = tmp
14  end
15end

In Listing 1.2 we had to rename some variables to avoid name clashes. In this particular example, inlining shortens the overall program (15 lines against 22 in Listing 1.1), but in practice inlining tends to yield larger programs, because each procedure is typically called from many places and its body gets duplicated at every call site.

From the efficiency standpoint, however, this is perfect: there is no extra work involved and no extra memory consumed (except for the larger code segment). The compiler does the heavy lifting, and in the generated machine code there are no “calls” anymore.

Important

Inlining is very efficient, and compilers may decide to inline some of your functions when they see fit. C and C++, for instance, offer the inline keyword, which suggests inlining to the compiler (modern compilers treat it as a hint and remain free to ignore it).

1.2.2. Branching#

The idea of branching is to replace procedure calls with JUMP instructions, so that the machine “branches” to the body of the procedure and then jumps back and resumes execution right after the call, as shown on Fig. 1.8.

../_images/branching.svg

Fig. 1.8 Branching implies “jumping” back and forth between the caller and the callee instructions.#

Unfortunately, jumping is not enough, because the caller has to pass arguments and get a result back, if any.

To do that, we use the call stack: a dedicated area in memory, which we operate as a stack. For each procedure call, the caller and the callee use it to exchange information. The caller pushes its arguments and the return address, and then jumps to the callee. The callee pops this information from the stack, pushes its result, and finally jumps back to the given return address.

1.3. The Call Stack#

1.3.1. In Practice#

As developers, we get to see this call stack every time our programs crash. Consider the terminal output below, where I run a Java implementation of the selection sort that crashes. The program exits with an ArrayIndexOutOfBoundsException.

$ java Sort.java
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 9
     at Sort.findMinimumFrom(Sort.java:16)
     at Sort.selectionSort(Sort.java:6)
     at Sort.main(Sort.java:32)

What comes right after this exception is a summary of the call stack at the very moment the program crashed. Starting from the bottom, the main procedure was active, selectionSort was active too, and we were in the middle of findMinimumFrom.

This information comes straight from the call stack. Each active procedure has a dedicated “record”, called its activation frame, which contains all the information exchanged between the caller and the callee. When we call a procedure, a new frame is pushed onto the stack; it gets popped off when the procedure returns.

1.3.2. The Call Stack Structure#

Again, the call stack does not exist at the machine level. It is more like a “design pattern”, which the compiler generates. The call stack generally resides at the end of the available memory, as shown on Fig. 1.9, and grows towards the data segment. If the stack grows so large that it collides with the rest of the memory (i.e., the data segment), the system raises the well-known stack overflow error.

../_images/call_stack.svg

Fig. 1.9 Memory layout, including the call stack.#

Each active procedure gets an activation record on the stack. Activation records include the following, also shown on Fig. 1.10:

  • The arguments, that is, the value of each parameter required by the callee.

  • The return address, where the callee must “jump” upon completion.

  • The local variables of the callee.

../_images/activation_frame.svg

Fig. 1.10 Example organization of an activation frame#

Important

The call stack automates the allocation of memory for local variables. The compiler automatically allocates all local variables on the call stack. When a procedure returns, its activation record is automatically released.

1.3.3. Calls in Assembly RAM Code#

To close the loop, let us see how we could extend our RAM model to support procedure calls. In assembly, there are two addresses that we need to track:

  • the top of the stack (a.k.a. the stack pointer or SP), which contains the address of the topmost item on the stack.

  • the frame pointer (a.k.a. FP), which acts as a reference point within a single activation frame. Often, FP points to the first local variable. We access the other fields indirectly using offsets from FP, for instance FP+1 to access the return address.

Our RAM model only has a single register, ACC, which shall contain, by convention, the result of the procedure. So we place SP and FP in the last two memory cells.

To simplify the assembly code, we introduce some new features and higher-level instructions.

  • Indirect addressing, where we denote by [SP+1] the memory cell whose address is the content of SP, plus one. Here SP contains an address, a pointer to another cell. We will use these square brackets and offsets to indicate such indirect accesses (a.k.a. pointers).

  • PUSH <address>, which decrements SP and writes the content of the given address onto the top of the stack. It is equivalent to the following snippet:

    LOAD       0        ;; Decrement SP by one
    ADD        SP
    SUBTRACT   one
    STORE      SP
    LOAD       0        ;; Copy the content of "address"
    ADD        address
    STORE      [SP]     ;; onto the top of the stack
    
  • POP, which increments SP. There is no need to actually erase the memory.

    LOAD 1
    ADD SP
    STORE SP
    

Equipped with these, we can now make a call to the swap procedure, for instance. The only extra thing we have to do is for the callee to save the previous value of the frame pointer FP before updating it, and to restore it before returning.

          ;; Here is the code from the caller
          PUSH    right           ;; Push the 'right' argument
          PUSH    left            ;; Push the 'left' argument
          PUSH    sequence        ;; Push the 'sequence' argument
          PUSH    return          ;; Push the return address
          LOAD    0               ;; Branch to the "swap" code
          JUMP    swap

return:   POP                     ;; Continue here once swap returned:
          POP                     ;; discard the return address and
          POP                     ;; the three arguments
          POP
          HALT

swap:     PUSH    FP              ;; Back-up FP onto the stack
          LOAD    0               ;; Update FP
          ADD     SP
          STORE   FP
          ;; Perform the swap. Arguments are accessible using indirect addressing
          PUSH    [FP+4]          ;; tmp = sequence[right]
          LOAD    0               ;; sequence[right] = sequence[left]
          ADD     [FP+3]
          STORE   [FP+4]
          LOAD    0               ;; sequence[left] = tmp
          ADD     [FP-1]
          STORE   [FP+3]

          ;; Return sequence
          LOAD    0               ;; Restore the old "FP"
          ADD     [FP]
          STORE   FP
          POP                     ;; Discard 'tmp'
          POP                     ;; Discard the saved FP
          LOAD    0               ;; Branch back to the "return" address
          JUMP    [SP]

1.4. How Much Does “Calling” Cost?#