Tuesday, April 19, 2011

Call Me Crazy: Calling Conventions

It's been said that the most significant contribution to computer science was the invention of the subroutine. Every major programming language has its own variation on how subroutines work. For instance, there are procedures, functions, methods, coroutines, etc. When implementing a compiled language or a virtual machine, you must decide how to implement subroutines at the assembly language level. For instance, you must decide how to pass arguments, what goes on the stack, what goes on the heap, what goes in registers, what goes in statically allocated memory, etc. These conventions are called calling conventions. You might not think of calling conventions very often, but they're an interesting topic.

First of all, what does a subroutine provide over a simple goto? A subroutine provides a way to pass arguments, execute a chunk of code, and return to where you were before you called the subroutine. You can implement each of these yourself in assembly, but standardizing calling conventions in a language makes the process more stable, repeatable, and less prone to error.

The way you implement calling conventions in a language like C is at least conceptually something like the following. You push all your arguments onto the stack; you push your current location in the code (i.e. the IP register) onto the stack; and then you jump to the subroutine. The subroutine executes its code, puts its return value in a register, and then jumps to the return-to address you put on the stack. The caller is responsible for popping the arguments off the stack (this is necessary to support varargs). Often, the compiler may optimize the code by putting values into registers rather than on the stack.

What's the difference between subroutines, procedures, functions, and methods? I think of subroutines as a low-level, catch-all term. In Pascal as well as some other languages, a procedure is expected to have side effects but not return a value, whereas a function is not supposed to have side effects but is supposed to return a value. Of course, not having side effects is a convention, not something enforced by the compiler. A method is a function call on an object. Typically, the object is implicitly passed as the first argument, although that may not be explicit in the syntax for the language.

Coroutines are a twist on subroutines. When you call a subroutine, it'll eventually return to the caller. If A calls B, then eventually B will return to A. Coroutines don't have to behave in this manner--i.e. they don't have to return to the caller. A may call B which may call C which may call A in a circular manner, without maintaining a stack of return-to addresses.

Recursion is a technique in which a function can call itself. Some languages like Lisp provide recursion as a low-level operation, and all looping constructs are built on top of recursion. There are some algorithms that are more elegantly implemented using recursion than looping such as the Ackermann function.

Recursion implies that you have some sort of stack (whether or not you use a C-level stack). That's because if a function takes one argument and recurses five times, you need a places for five arguments. Languages like COBOL put arguments in statically allocated memory instead of on a stack. Hence, COBOL lacks recursion.

Tail call optimization is an interesting technique that reduces the use of the stack. If function A calls function B, and then B's last step is to call function C, rather than having a return-to address for A and a return-to address for B, you can just have C return directly to A since there's nothing else in function B that needs to be done.

This works really well when a function calls itself as its last step. If F calls itself as its last step, rather than having a large stack of return-to addresses for F, you can optimize it away. In this way, what would normally be a recursive algorithm can be translated into simple interation. This is a critical technique in languages like Lisp that rely on recursion for looping; however, it's not present in languages like Python. (Guido van Rossum has argued that it's harder to debug bugs if parts of the stack have been optimized away. It's possible to end up in a function with no way of knowing how you got there.)

Stacklessness is a technique whereby the language's stack does not rely on an assembly or C-level stack, but rather exists on the heap. Because of the way C stacks require contiguous space, it's difficult to allocate space for 1000s of them at the same time. In a stackless interpreter, because stack frames are allocated dynamically from the heap, you can allocate a greater number of them stacks. This technique is used by stackless interpreters such as Lua as well as the greenlets library in Python in order to implement something like lightweight multithreading.

Some languages provide more flexibility than others in the way arguments are passed. C has the ability to pass a fixed number of arguments, but there's also a facility to pass an arbitrary number of arguments call varargs. Some languages like Perl and Ruby allow you to pass additional keyword arguments in the form of a hash to the function (e.g. "f(1, 2, :arg => 1)"). Some languages like Common Lisp and Python have advanced support for keyword arguments allowing you to call a method that looks like "def f(a, b, c=1, d=2, e=3)" like "f('a', 'b', e=3, d=3)". Notice that you can leave out or rearrange the order of keyword arguments. Python allows you to pass a variable number of positional arguments as well as a variable number of keyword arguments and it will even raise an exception if you pass a keyword argument that isn't expected. Common Lisp was one of the first languages to have extremely sophisticated parameter passing, including sophisticated support for keyword arguments.

Currying is a technique whereby if a function accepts two parameters, A and B, then what you really have is a function that takes an argument A and then returns another function that takes an argument B. In this way, all functions can be reduced to a list of functions that accept 0 or 1 arguments. When a language supports currying, you can pass a value for A and then later pass a value for B. Haskell supports currying.

Partial application is a related technique that lets you pass an arbitrary number of arguments, perhaps in a random order if keyword arguments are supported, and then actually call the function later. If you have a function that takes one parameter, then you can partially apply it to one argument, and you are left with a function that takes no parameters that you can call at a later time. Partial application is more general than currying. It's supported in Python. However, in languages like Python, Haskell, Lisp, etc., it's always possible to create a new function that takes no parameters that simply executes another function passing a particular value to that function. Hence, it's trivial to implement partial application manually.

Lazy evaluation is a technique whereby the arguments to a function are not evaluated until they are used within the function. If you call f(1/0) in Python, you will always get a DivisionByZero exception. However, in languages such as Haskell, if you call f (1/0), as long as f never evaluates its argument, you'll won't get an error. Languages that do not have lazy evaluation are said to be "strict". Hence, Haskell is not strict.

It is an error prone situation to have a language that is not strict (i.e. it supports lazy evaluation) but also supports side effects. For instance, consider writing the expression f(print(1), print(2) in a language that wasn't strict but permitted side effects. Depending on whether or not f evaluated its parameters, 1 and 2 may or may not be printed, and you couldn't be sure which order they would be printed in. Hence, functions in Haskell are both lazy and side-effect free.

Inlining is an optimization technique that helps avoid the overhead of a function call. If a function A calls a very small function B, an optimizing compiler might take a copy of the compiled code for B and put it "inline" in the compiled code for A. This reduces the function call overhead but results in larger binary sizes.

In some situations, some compilers can aggressively inline all functions. Alternatively, the compiler may be able to put all the code for all the functions side by side and simply jump between the different parts of the code without using the stack. Some programmers use this technique manually in order to write small, but highly efficient programs. In general, the way you think calling conventions work in a language may be a simplified, idealistic way of how they actually work. In the case of inlining, the function call completely disappears.

Continuation passing style is a technique whereby all function calls can be transformed in a way that makes tail call optimization possible. Consider the following function:
def sum(n):
if n == 0:
return 0
else:
return n + sum(n - 1)
This function cannot natively take advantage of tail call optimization because after it recurses, the return value must be added to n. It's easy to change this to a function that takes advantage of tail call optimization manually:
def sum(n, _accum=0):
if n == 0:
return _accum
else:
return sum(n - 1, _accum + n)
In this case, an accumulator is passed explicitly. (Remember, however, that even though I'm using Python syntax, Python does not support tail call optimization.)

Rather than passing an accumulator that contains a value, you can pass an accumulator that contains what other code needs to execute after the recursive call. Here is an example of that:
def sum(n, _callback=lambda x: x):
if n == 0:
return _callback(0)
else:
return sum(n - 1, lambda x: _callback(x + n))
By the way, if you find that code difficult to understand, don't feel bad! It took me more than half an hour to write it, despite the fact that I had seen the technique before! That goes to show you that switching to continuation passing style isn't easy. However, it can be accomplished via a mechanical (i.e. "just follow the steps") process, and there are languages that can do it for you automatically. Scala has a compiler flag that turns on continuation passing style so that the JVM behaves differently (conceptually it uses a lot of anonymous functions that exist on the heap rather than return-to addresses that exist on the stack).

Returning to C, you might think that the calling conventions of C were decided upon years ago, and the only variations would have to do with compiler optimizations. Surprisingly, that is not the case. For instance, AMD helped define the new calling conventions for System V on AMD64. In fact, the calling convention are neither C nor AMD64 dependent. It's entirely defined by the operating systems application binary interface standard. Hence, the calling conventions for AMD64 systems are slightly different under Windows (specifically in relation to what registers are used to pass variables). See here and here for more information.

So if you're thinking about implementing a new language, and you wonder what's in a subroutine, you make the call!

19 comments:

Rotund said...

Paragraph 3 actually has a fairly minor problem in that you did not describe the calling conventions of a language like C but instead a language like Pascal. In C, the caller is in charge of cleaning arguments off the stack. This is required to properly handle vargs.

Geoff said...

Excellent article! It was a lot of fun rapidly going over different ways to call.

AReallyGoodName said...

"AMD64 defined a new set of calling conventions for C"

This isn't entirely true. AMD helped define the new calling conventions for System V on AMD64, not for AMD64 or C in general.

In fact calling convention is neither C language or AMD64 hardware dependent. It's entirely defined by the Operating Systems application binary interface standard.

In fact, the calling conventions for AMD64 systems is slightly different under Windows. Specifically in relation to what registers are used to pass variables.
http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_Calling_Conventions

drew said...

This was really fun to read, thanks!

Patrick said...

JJ, nice overview of the topology of subroutines! It was a good way to start the day.

Hope you are well.

-Patrick

Anonymous said...

You're wrong about Objective-C and Smalltalk. Neither of these really has a first-class support for keyword arguments. A method whose call looks like "createSphereWithMass: 1.0 size: 2.0" doesn't have keyword arguments "mass" and "size". It's just an arity 2 method called "createSphereWithMass:size:".

It means that, for example, you can't shuffle arguments around (as it would be possible with first-class named arguments).

Shannon -jj Behrens said...

Rotund, thanks for the correction. I've updated the text.

Anonymous said...

Am I completely incorrect in saying:
Some C compilers allow you to choose the subroutine calling mechanism when writing the function. Thus allowing calls to C functions to be made from any language for which the subroutine calling mechanism is known in the form of a dll.

Shannon -jj Behrens said...

AReallyGoodName, thanks for the correction. I've updated the paragraph to include your text.

Shannon -jj Behrens said...

> You're wrong about Objective-C and Smalltalk. Neither of these really has a first-class support for keyword arguments. A method whose call looks like "createSphereWithMass: 1.0 size: 2.0" doesn't have keyword arguments "mass" and "size". It's just an arity 2 method called "createSphereWithMass:size:".

Thanks for the correction. I've updated the text.

Shannon -jj Behrens said...

> Am I completely incorrect in saying: Some C compilers allow you to choose the subroutine calling mechanism when writing the function. Thus allowing calls to C functions to be made from any language for which the subroutine calling mechanism is known in the form of a dll.

I don't know.

Anonymous said...

(Smalltalk/Obj-C)

Still a bit iffy! :D

Basically, on the declaration side you have a selector and parameter names (e.g. doStuffWithFoo: fooParam bar: barParam). Selector name is “public” — callers will have to use it to call the method, but parameter names are private to your code (just as argument names in most other languages). What you use on the caller side is just the selector; it doesn't even have to look like parameter names at all (you can have a method like "asdasd:dfdf:efef:").

Shannon -jj Behrens said...

> Still a bit iffy! :D

Can you suggest some replacement text?

Anonymous said...

> Can you suggest some replacement text?

Why even mention it at all? It doesn't really add any flexibility in passing arguments, just provides a better naming system.

Shannon -jj Behrens said...

> Why even mention it at all? It doesn't really add any flexibility in passing arguments, just provides a better naming system.

Okay, it's gone.

Anonymous said...

In Ruby, “named arguments” are already a hash. There is no conversion.

Shannon -jj Behrens said...

> In Ruby, “named arguments” are already a hash. There is no conversion.

I've clarified the text. Thanks.

dan aronson said...

JJ, that was the best article from you that I've read. Really informative and interesting.

Shannon -jj Behrens said...

> JJ, that was the best article from you that I've read. Really informative and interesting.

Thanks, Dan. I had it banging around in the back of my mind for a few years, and it took being unemployed for a couple months to finally find enough time to write it.

Happy Hacking!