Kevin's Tcl Optimization Note #2
Generating machine code that will tolerate recompilation
[Draft 0: 2012-11-24]
Kevin B, Kenny

The previous note in this series discussed how type inference can be
used to prepare hypothetical Tcl intermediate code for compilation to
machine code that exploits native data types. In the example procedure
in that note, there were several values that were expected to be of a
given data type, and could be optimistically compiled to presume that
data type. The note mentioned that recompilation would be necessary
in the event of arithmetic overflow. This note discusses how the
compiler might structure machine code to be prepared for such
recompilation.

REVIEW: The 'cos' procedure.

We are using as a benchmark the 'cos' procedure, which evaluates the
cosine function using a Maclaurin series:

    proc cos {x {n 15}} {
        set j 0
        set s 1.0
        set t 1.0
        for {set i 1} {$i < $n} {incr i} {
	    set t [expr {-$t * $x * $x / [incr j] / [incr j]}]
            set s [expr {$s + $t}]
        }
	return $s
    }

By type inference, we have ascertained that 's' and 't' are always
double, that 'x' is expected to be convertible to a number (which will
be promoted to double in the code where 'x' is used), that 'n' is
expected to be a native integer but may be anything, including a
general string, ande that 'i' and 'j' are expected to be native
integers but may overflow to entiers. (In practice, of course, none
of the exception cases will arise: this not merely discusses what to
do to prepare for the unlikely.)

COMPILING TO MACHINE CODE

Given the most optimistic assumptions about data types, the machine
code for 'cos' would be something like the result of compiling the C
code:

double
cos(int x, int n) // caller will provide the default value of 15
{
    int j = 0;
    double s = 1.0;
    double t = 1.0;
    for (i = 0; i < n; ++i) {
        t = -t * x * x / (++j) / (++j);
	s = s + t;
    }
    return s;
}

But we need to make the pessimistic assumptions that overflow checking
and parameter type checking are needed.  How close can we come to the
desired result?

COMPILING FOR RETRY

One possible design for retry after a type exception is to structure
all native procedures as follows:

(1) All the procedures accept as input the interpreter and a
'display': a vector of all the values in the procedure that are live
at any point when the procedure must be recompiled. They also accept
an integer that gives the point at which the procedure should resume:
the point at which a procedure with an earlier failed type assumption
bailed out.

In order to accommodate type changes, the most plausible type for the
display entries is Tcl_Obj: after all, a Tcl_Obj can hold any Tcl
value.  The procedure must unpack its arguments, perform its
calculations, and, if it must be interrupted for recompilation, save
its state in the display, and return a code identifying the point at
which it stopped (which also identifies which type assumption was
violated. After recompilation, the code will be passed to the next
procedure.

If a procedure encounters an error, it returns -1. If it completes
successfully, it returns 0.

The same code, structured in this manner, would look like:

/*
 * Values in the display.  'i_retval' is the return value of the
 * procedure.
 */

enum displayIndex { i_retval, i_x, i_n, i_i, i_j, i_s, i_t };

/*
 * The 'cos' function returns 0 if successful, -1 for an error,
 * and a positive number to indicate that a type assumption was
 * violated. The number indicates the offending position in the
 * code.
 *
 * 1 - The 'n' argument was not a native int.
 *     (Recompile assuming entier, double, string according to its
 *      observed type).
 *
 * 2 - Overflow occurred in the first [incr j]
 * 3 - Overflow occurred in the second [incr j].
 *     (Recompile assuming that j is an entier)
 *
 * 4 - Overflow occurred in [incr i]
 *     (Recompile assuming that i is an entier)
 */
int
compiled_cos(Tcl_Interp* interp, Tcl_Obj display[], int resuming)
{
    /* NOTE THAT THE OBJECTS IN THE DISPLAY ARE ALLOCATED IN LINE:
     * we use display+retval, display+x, etc to address them. This
     * gives interesting memory management issues, but nothing that
     * can't be dealt with in the surrounding code. 
     */
    double x;
    int n;
    int i;
    int j;
    double s;
    double t;
    int itemp;
    int retcode;

    if (!resuming) goto start;

    /*
     * We're resuming from recompilation.
     * Unpack variables and go to the point where we left off.
     * Note that variables are left in the state they were in when
     * we suspended. Since this code is executed only once per recompilation,
     * it need not be fast.
     */

    (void) Tcl_GetDoubleFromObj(NULL, display+i_x, &x);
    (void) Tcl_GetIntFromObj(NULL, display + i_n, &n);
    (void) Tcl_GetDoubleFromObj(NULL, display + i_s, &s);
    (void) Tcl_GetDoubleFromObj(NULL, display + i_t, &t);
    (void) Tcl_GetIntFromObj(NULL, display + i_i, &i);
    (void) Tcl_GetIntFromObj(NULL, display + i_j, &j);
    switch(resuming) {

    start:

	/*
	 * Argument preparation
	 */
	
	/*
	 * x is used only as an argument to multiplication by a double,
	 * so its semantics will always be a double.  The type inference
	 * that I have discussed thus far doesn't actually describe how
	 * to ascertain this, so the ice is still a trifle thin here.
	 */
	if (Tcl_GetDoubleFromObj(interp, display+i_x, &x) != TCL_OK) {
	    return -1;
	}

	/*
	 * n should be a native int, and not too large to represent.
	 * I postulate a GetNativeInt that verifies this condition
	 */
    case 1:
	if (Tcl_GetNativeInt(NULL, display+i_n, &n) != TCL_OK) {
	    /* 
	     * We'll pack uninitialized values into the display,
	     * but that's harmless since we'll not use them when
	     * we return.
	     */
	    retcode = 1;
	    goto repack_and_return;
	}

	j = 0;
	s = 1.0;
	t = 1.0;
	i = 1;
	goto looptest;

    loop:
	t = t * x * x;

	/* 
	 * Incrementing j could overflow, and the code must be recompiled
	 * to make j an entier if it does.
	 */
    case 2:
	itemp = j + 1;
	if ((itemp ^ j) < 0 && (j ^ 1) >= 0) {
	    retcode = 2;
	    goto repack_and_return;
	}
	j = itemp;

	t = t / j;

	/*
	 * Incrementing j could overflow, and the code must be recompiled
	 * to make j an entier if it does.
	 */
    case 3:
	itemp = j + 1;
	if ((itemp ^ j) < 0 && (j ^ 1) >= 0) {
	    retcode = 3;
	    goto repack_and_return;
	}

	t = t / j;
	s = s + t;

	/*
	 * Incrementing i could overflow, and the code must be recompiled
	 * to make i an entier if it does.
	 */
     case 4:
	itemp = i + 1;
	if ((itemp ^ i) < 0 && (i ^ 1) >= 0) {
	    retcode = 4;
	    goto repack_and_return;
	}

    looptest:
	if (i < n) goto loop;

	Tcl_SetDoubleObj(display+i_retval, s);
	return 0;

    repack_and_return:
	Tcl_SetIntObj(display+i_i, i);
	Tcl_SetIntObj(display+i_j, j);
	Tcl_SetDoubleObj(display+i_s, s);
	Tcl_SetDoubleObj(display+i_t, t);
	return retcode;
    }

}

A careful examination will show that the procedure, with the checks
removed, has the same overall structure as the C function above.
It includes code that extracts values from the display on startup,
and code that stores them and returns a code if assumptions are
violated.

On initial entry, the procedure will be entered with resuming=0, and
in the ordinary case it will simply run through to exit in the
'looptest' block.  If it encounters an unexpected data type, it will
go to 'repack_and_return', save the state before the stumble, and
return a code indicating where to resume.

A recompiled procedure will have the same basic structure, but with
data types changed. It will be re-entered with 'resuming' designating
the point where things left off, and the values in the display
containing the content of the local variables. The 'switch' statement
will dispatch to the correct point, and execution will simply
continue.

SIDE NOTES

The code above isn't as good as it could be: it was an attempt at
hand compiling something similar to what a compiler might emit without
a few extra optimizations.  One important "peephole" optimization is
in the overflow check.  The test for overflow in c = a + b:

    if ((c ^ a) < 0 && (a ^ b) >= 0) { ... }

(read: if c and a are of opposite sign, but a and b are of the same
sign) can be simplified when b is a positive constant to

    if (c < 0 && a > 0)

Another possible optimization is to preset the Tcl_Obj's with the
appropriate object types, and work with the native values directly in
the display, rather than copying to the activation record.
Experimentation will be necessary with this one. It could well prove
to be a false economy. It may be more expensive to access the display
than the stack frame, or it may defeat code in the C compiler that
places values into the processor registers.

It would, of course, be possible to omit the resumption logic in
the first version of a procedure that will be tried. It's shown here
to give the fully general structure.