This section requires a ton of expansion (actually the expansion will be my thesis....).
Tail recursion
We implement tail recursion using the "mini interpreter" technique widely documented in just about every paper about implementing functional languages (and probably invented by Guy Steele).
Everything that is callable from Haskell implements the Code interface
public interface Code {
public Code exec();
}
exec returns the next Code object to be executed, or null if we're returning for real. You make a tail call by simply returning the Code you want to call. To make a normal call (when scrutinizing an expression with a case statement) you need to keep calling exec() in a loop until it returns null (which means we returned for real).
Code c = what_i_want_to_call;
do { c = c.exec(); } while (c != null);
// now we're done with the call
Since we need to return a single value representing what to call next we can't pass arguments to the exec() function. You could box up the arguments into a new Code objects which then unpacks them and calls the real method with those arguments but that would require an allocation for every call. Instead we pass arguments in a bunch of static variables (declared in org.haskell.ghc.rts.R).
Things that suck
- We can't make direct branches. What should be a single direct jump instruction turns into an indirect jump back to the caller (who in running the interpreter loop), a conditional branch (
while (c != null), followed by a virtual method call (memory read, and another indirect jump). And this is the best case scenario (there might be other stuff going on in the jvm under the hood). This is a lot of overhead.
- Arguments have to be passed in global variables. Accessing global variables requires memory access. I don't think there is a JIT on the planet that is smart enough to figure out that it would be smart to tie a global variable to a register.
- Reference arguments are stored in an
Object field and require a checkcast on read. This isn't that big of a deal since we have to do a checkcast anyway when doing case analysis but it still hurts.
Thunks
After a thunk is evaluated it needs to be updated with the value it evaluated to. Every updatable thunk (represented by a java object) has an ind field. When an updatable thunk is entered it checks if the ind is non-null. If so it simply returns the ind field to the caller. If not the body of the thunk is executed and the ind field is set.
public abstract class Ind extends Closure {
public Closure ind;
}
public class MyThunk extends Ind {
public Closure freeVar;
public Code exec() {
if(ind != null) { R.o0 = ind; return null; }
// closure body
freeVar = null;
ind = o0;
return null;
}
}
Things that suck
- The useless thunk class always has to stick around. In the C RTS the garbage collector is smart enough to spot these updated thunks and replace references to them to their target. We're stuck with Java's GC which can't do this.
- Because these thunks stick around for so long we have to null out all reference fields when we update them with an indirection which hurts performance.