On top of the frames, there's register rotation, a feature that helps loop unrolling more than parameter passing. With rotation, Itanium can shift up to 96 of its general-purpose registers (the first 32 are still fixed and global) by one or more apparent positions. Why? So that successive iterations of a loop that hammers on the same register(s) time after time can all be dispatched and executed at once without stepping on each other. Each iteration of the loop actually targets different physical registers, allowing them all to be in flight at once. If this sounds a lot like register renaming, it is. Itanium's register-rotation feature is less generic than all-purpose register renaming like Athlon's, so it's easier to implement and faster to execute. Chip-wide register renaming like Athlon's adds gobs of multiplexers, adders, and routing, one of the big drawbacks of a massively out-of-order machine. On a smaller scale, ARM used this trick with its ill-fated Piccolo DSP coprocessor. At the high end, Cydrome also used this technique, a favorite feature that Cydrome alumnus and Itanium team member Bob Rau apparently brought with him.
Frames and rotation help up to a point, but eventually even Itanium runs out of registers. When that happens, we're back to pushing and popping registers on and off the stack. Where Itanium differs from SPARC is that Intel makes it automatic. Itanium's register stack engine (RSE) is a circuit within the processor that oversees spilling registers to the stack when the register file overflows and filling them back when it underflows, invisibly to software. SPARC, in contrast, raises a fault that must be handled in software. The RSE is more complicated than you might think. It has to handle any kind of memory problem, page fault, exception, or error without bothering the processor. In Itanium, the RSE stalls the processor to do its work. In future IA-64 implementations, spilling and filling will probably be handled more elegantly in the background.
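To make the spill-and-fill idea concrete, here is a toy bookkeeping model in Python. It is only a sketch under loud assumptions: the class and attribute names are hypothetical, frames are reduced to a register count, and the real engine tracks considerably more state than this. It shows registers from the oldest frames going out to memory when a call overflows the 96 stacked registers, and coming back when a return uncovers a frame that is no longer fully resident.

```python
class RegisterStackModel:
    """Toy model of automatic register spill/fill (hypothetical names)."""

    def __init__(self, physical_size=96):
        self.physical_size = physical_size  # stacked physical registers
        self.frames = []     # sizes of live frames, oldest first
        self.in_memory = 0   # registers of the oldest frames now in memory
        self.spilled = 0     # running total of registers written out
        self.filled = 0      # running total of registers read back

    def call(self, frame_size):
        """Allocate a new frame; spill the oldest registers on overflow."""
        self.frames.append(frame_size)
        resident = sum(self.frames) - self.in_memory
        overflow = resident - self.physical_size
        if overflow > 0:
            self.in_memory += overflow
            self.spilled += overflow

    def ret(self):
        """Drop the newest frame; refill the caller's frame on underflow."""
        self.frames.pop()
        older = sum(self.frames[:-1])      # everything below the caller
        missing = self.in_memory - older   # caller registers still in memory
        if missing > 0:
            self.in_memory -= missing
            self.filled += missing


# Five nested calls of 40 registers each need 200 registers in total,
# so some must be spilled, and the same number filled on the way back.
model = RegisterStackModel()
for _ in range(5):
    model.call(40)
for _ in range(5):
    model.ret()
```

In this model the calling code never sees the overflow: `call` and `ret` do the bookkeeping themselves, which is the point of contrast with a SPARC-style fault handler written in software.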
So IA-64 has two levels of indirection for its own registers: the logical-to-virtual mapping of the frames and the virtual-to-physical mapping of the rotation. All this means that programs usually aren't accessing the physical registers they think they are, but that's nothing new to high-end microprocessors. Arcane as it seems, this method still uses less hardware trickery than the full register renaming of Athlon, Pentium III, or P4.
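The two levels of indirection can be sketched in a few lines of Python. This is a simplified illustration, not Itanium's actual renaming logic: `frame_base`, `rotate_size`, and `rotate_base` are hypothetical names for state the processor would keep internally, and the order in which the two steps are composed is an assumption of the sketch.

```python
NUM_STATIC = 32    # r0-r31: fixed and global, never remapped
NUM_STACKED = 96   # r32-r127: subject to frames and rotation


def physical_register(reg, frame_base, rotate_size, rotate_base):
    """Map a register number as written in the program to a physical slot."""
    if reg < NUM_STATIC:
        return reg  # static registers map to themselves
    offset = reg - NUM_STATIC  # position within the current frame
    if offset < rotate_size:
        # Rotation: shift the apparent position, wrapping around
        # inside the rotating part of the frame.
        offset = (offset + rotate_base) % rotate_size
    # Frame: slide the frame's window over the stacked registers.
    return NUM_STATIC + (frame_base + offset) % NUM_STACKED


# The same name, r32, lands on a different physical register in each
# loop iteration as the rotation base steps by one.
slots = [physical_register(32, frame_base=10, rotate_size=8, rotate_base=-i)
         for i in range(3)]
```

With these made-up parameters, `slots` comes out as `[42, 49, 48]`: three iterations all write "r32" yet touch three different physical registers. The value that iteration 0 wrote to r32 is also what iteration 1 reads as r33, since `physical_register(33, 10, 8, -1)` is 42 again.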