Kevin's Tcl Optimization Note #2 Generating machine code that will tolerate recompilation [Draft 0: 2012-11-24] Kevin B, Kenny The previous note in this series discussed how type inference can be used to prepare hypothetical Tcl intermediate code for compilation to machine code that exploits native data types. In the example procedure in that note, there were several values that were expected to be of a given data type, and could be optimistically compiled to presume that data type. The note mentioned that recompilation would be necessary in the event of arithmetic overflow. This note discusses how the compiler might structure machine code to be prepared for such recompilation. REVIEW: The 'cos' procedure. We are using as a benchmark the 'cos' procedure, which evaluates the cosine function using a Maclaurin series: proc cos {x {n 15}} { set j 0 set s 1.0 set t 1.0 for {set i 1} {$i < $n} {incr i} { set t [expr {-$t * $x * $x / [incr j] / [incr j]}] set s [expr {$s + $t}] } return $s } By type inference, we have ascertained that 's' and 't' are always double, that 'x' is expected to be convertible to a number (which will be promoted to double in the code where 'x' is used), that 'n' is expected to be a native integer but may be anything, including a general string, ande that 'i' and 'j' are expected to be native integers but may overflow to entiers. (In practice, of course, none of the exception cases will arise: this not merely discusses what to do to prepare for the unlikely.) COMPILING TO MACHINE CODE Given the most optimistic assumptions about data types, the machine code for 'cos' would be something like the result of compiling the C code: double cos(int x, int n) // caller will provide the default value of 15 { int j = 0; double s = 1.0; double t = 1.0; for (i = 0; i < n; ++i) { t = -t * x * x / (++j) / (++j); s = s + t; } return s; } But we need to make the pessimistic assumptions that overflow checking and parameter type checking are needed. How close can we come to the desired result? COMPILING FOR RETRY One possible design for retry after a type exception is to structure all native procedures as follows: (1) All the procedures accept as input the interpreter and a 'display': a vector of all the values in the procedure that are live at any point when the procedure must be recompiled. They also accept an integer that gives the point at which the procedure should resume: the point at which a procedure with an earlier failed type assumption bailed out. In order to accommodate type changes, the most plausible type for the display entries is Tcl_Obj: after all, a Tcl_Obj can hold any Tcl value. The procedure must unpack its arguments, perform its calculations, and, if it must be interrupted for recompilation, save its state in the display, and return a code identifying the point at which it stopped (which also identifies which type assumption was violated. After recompilation, the code will be passed to the next procedure. If a procedure encounters an error, it returns -1. If it completes successfully, it returns 0. The same code, structured in this manner, would look like: /* * Values in the display. 'i_retval' is the return value of the * procedure. */ enum displayIndex { i_retval, i_x, i_n, i_i, i_j, i_s, i_t }; /* * The 'cos' function returns 0 if successful, -1 for an error, * and a positive number to indicate that a type assumption was * violated. The number indicates the offending position in the * code. * * 1 - The 'n' argument was not a native int. * (Recompile assuming entier, double, string according to its * observed type). * * 2 - Overflow occurred in the first [incr j] * 3 - Overflow occurred in the second [incr j]. * (Recompile assuming that j is an entier) * * 4 - Overflow occurred in [incr i] * (Recompile assuming that i is an entier) */ int compiled_cos(Tcl_Interp* interp, Tcl_Obj display[], int resuming) { /* NOTE THAT THE OBJECTS IN THE DISPLAY ARE ALLOCATED IN LINE: * we use display+retval, display+x, etc to address them. This * gives interesting memory management issues, but nothing that * can't be dealt with in the surrounding code. */ double x; int n; int i; int j; double s; double t; int itemp; int retcode; if (!resuming) goto start; /* * We're resuming from recompilation. * Unpack variables and go to the point where we left off. * Note that variables are left in the state they were in when * we suspended. Since this code is executed only once per recompilation, * it need not be fast. */ (void) Tcl_GetDoubleFromObj(NULL, display+i_x, &x); (void) Tcl_GetIntFromObj(NULL, display + i_n, &n); (void) Tcl_GetDoubleFromObj(NULL, display + i_s, &s); (void) Tcl_GetDoubleFromObj(NULL, display + i_t, &t); (void) Tcl_GetIntFromObj(NULL, display + i_i, &i); (void) Tcl_GetIntFromObj(NULL, display + i_j, &j); switch(resuming) { start: /* * Argument preparation */ /* * x is used only as an argument to multiplication by a double, * so its semantics will always be a double. The type inference * that I have discussed thus far doesn't actually describe how * to ascertain this, so the ice is still a trifle thin here. */ if (Tcl_GetDoubleFromObj(interp, display+i_x, &x) != TCL_OK) { return -1; } /* * n should be a native int, and not too large to represent. * I postulate a GetNativeInt that verifies this condition */ case 1: if (Tcl_GetNativeInt(NULL, display+i_n, &n) != TCL_OK) { /* * We'll pack uninitialized values into the display, * but that's harmless since we'll not use them when * we return. */ retcode = 1; goto repack_and_return; } j = 0; s = 1.0; t = 1.0; i = 1; goto looptest; loop: t = t * x * x; /* * Incrementing j could overflow, and the code must be recompiled * to make j an entier if it does. */ case 2: itemp = j + 1; if ((itemp ^ j) < 0 && (j ^ 1) >= 0) { retcode = 2; goto repack_and_return; } j = itemp; t = t / j; /* * Incrementing j could overflow, and the code must be recompiled * to make j an entier if it does. */ case 3: itemp = j + 1; if ((itemp ^ j) < 0 && (j ^ 1) >= 0) { retcode = 3; goto repack_and_return; } t = t / j; s = s + t; /* * Incrementing i could overflow, and the code must be recompiled * to make i an entier if it does. */ case 4: itemp = i + 1; if ((itemp ^ i) < 0 && (i ^ 1) >= 0) { retcode = 4; goto repack_and_return; } looptest: if (i < n) goto loop; Tcl_SetDoubleObj(display+i_retval, s); return 0; repack_and_return: Tcl_SetIntObj(display+i_i, i); Tcl_SetIntObj(display+i_j, j); Tcl_SetDoubleObj(display+i_s, s); Tcl_SetDoubleObj(display+i_t, t); return retcode; } } A careful examination will show that the procedure, with the checks removed, has the same overall structure as the C function above. It includes code that extracts values from the display on startup, and code that stores them and returns a code if assumptions are violated. On initial entry, the procedure will be entered with resuming=0, and in the ordinary case it will simply run through to exit in the 'looptest' block. If it encounters an unexpected data type, it will go to 'repack_and_return', save the state before the stumble, and return a code indicating where to resume. A recompiled procedure will have the same basic structure, but with data types changed. It will be re-entered with 'resuming' designating the point where things left off, and the values in the display containing the content of the local variables. The 'switch' statement will dispatch to the correct point, and execution will simply continue. SIDE NOTES The code above isn't as good as it could be: it was an attempt at hand compiling something similar to what a compiler might emit without a few extra optimizations. One important "peephole" optimization is in the overflow check. The test for overflow in c = a + b: if ((c ^ a) < 0 && (a ^ b) >= 0) { ... } (read: if c and a are of opposite sign, but a and b are of the same sign) can be simplified when b is a positive constant to if (c < 0 && a > 0) Another possible optimization is to preset the Tcl_Obj's with the appropriate object types, and work with the native values directly in the display, rather than copying to the activation record. Experimentation will be necessary with this one. It could well prove to be a false economy. It may be more expensive to access the display than the stack frame, or it may defeat code in the C compiler that places values into the processor registers. It would, of course, be possible to omit the resumption logic in the first version of a procedure that will be tried. It's shown here to give the fully general structure.