64-Bit CPUs: What You Need to Know - Page 7

Instruction Set

As we mentioned, IA-64 is an enhanced VLIW architecture, so its concept of "instruction" is a little different from that of, say, Pentium or Alpha. With IA-64, there are instructions, there are bundles, and there are groups. Get your notepads ready.

Instructions are 41 bits long. Yup - say goodbye to powers of two. It takes 7 bits to specify one of 128 general-purpose (or floating-point) registers, so two source-operand fields and a destination field eat up 21 bits right there, before you even get to the opcode. Another 6 bits select one of 64 predicate registers for predication (which we discuss in more detail below), if any. That leaves 14 bits for the opcode and everything else.
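If you want to check the bit budget yourself, here is a minimal sketch that uses only the figures quoted above. The variable names are ours, not Intel's, and the "remaining" figure is simply what is left after the register and predicate fields, not a statement about how Intel actually carves up those bits.

```python
# Bit budget of one 41-bit IA-64 instruction, from the numbers in the text.
INSTRUCTION_BITS = 41
REGISTER_COUNT = 128      # general-purpose (or floating-point) registers
REGISTER_FIELDS = 3       # two source operands plus one destination
PREDICATE_BITS = 6        # selects one of 64 predicates

# Bits needed to name one register out of 128.
register_field_bits = (REGISTER_COUNT - 1).bit_length()

# Bits consumed by all three register fields together.
register_bits = REGISTER_FIELDS * register_field_bits

# Whatever is left over for the opcode and any other fields.
remaining_bits = INSTRUCTION_BITS - register_bits - PREDICATE_BITS

print(register_field_bits, register_bits, remaining_bits)
```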

Instructions are delivered to the processor in "bundles." Bundles are 128 bits: three 41-bit instructions (making 123 bits), plus one 5-bit template, which we'll get to in a minute. Still with us? Then there are instruction groups, which are collections of instructions that can theoretically all execute at once. The instruction groups are the compiler's way of showing the processor which instructions can be dispatched simultaneously without dependencies or interlocks. It's the responsibility of the compiler to get this right; the processor doesn't check. Groups can be of any arbitrary length, from one lonely instruction up to millions of instructions that can (hypothetically, at least) all run at once without interfering with each other. A bit in the template identifies the end of a group.
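To make the 5 + 3 x 41 = 128 arithmetic concrete, here is a rough sketch of packing and unpacking a bundle as one big integer. The function names are hypothetical, and the layout (template in the lowest five bits, followed by slots 0, 1, and 2) is an assumption for illustration; check Intel's architecture manual before relying on the exact bit positions.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOTS_PER_BUNDLE = 3
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer.

    Assumed layout: template in bits 0-4, then slot 0, slot 1, slot 2.
    """
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


# Round trip with made-up instruction bit patterns.
example = pack_bundle(0x10, (0x123, 0x456, 0x789))
print(unpack_bundle(example))
```

With every field filled with ones, the packed value occupies exactly 128 bits, which is the whole point of the odd 41-bit instruction size.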

A bundle is not a group. That is, IA-64 instructions are physically packaged into 128-bit bundles because that's deemed the minimum width for an IA-64 processor's bus and decode circuitry. (Itanium dispatches two bundles, or 256 bits, at once.) A bundle just happens to hold three complete instructions. But logically, instructions can be grouped in any arbitrary amount, and it's the groups that determine how instructions interrelate to one another.
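The bundle-versus-group distinction is easier to see with a toy example. The sketch below is purely illustrative: the instruction names and stop positions are made up, and it ignores the real restrictions on where a template allows a stop. It slices the same instruction stream two ways, physically into bundles of three and logically into groups ending at each stop.

```python
def into_bundles(instructions, size=3):
    """Physical packaging: fixed chunks of three instructions."""
    return [instructions[i:i + size] for i in range(0, len(instructions), size)]


def into_groups(instructions, stops):
    """Logical packaging: a group ends after each index listed in `stops`."""
    groups, current = [], []
    for index, instruction in enumerate(instructions):
        current.append(instruction)
        if index in stops:
            groups.append(current)
            current = []
    if current:  # trailing instructions with no stop after them
        groups.append(current)
    return groups


# Hypothetical six-instruction stream with stops after i0 and i4.
stream = ["i0", "i1", "i2", "i3", "i4", "i5"]
stops = {0, 4}

print(into_bundles(stream))
print(into_groups(stream, stops))
```

Here the middle group (i1 through i4) straddles both bundles, which is exactly the situation the paragraph above describes.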

All IA-64 instructions fall into one of four categories: integer, load/store, floating-point, and branch operations. These categories are significant in how they map onto the chip's hardware resources. Different IA-64 implementations (Itanium, McKinley, etc.) might have different hardware resources, but all will do their best to dispatch all the instructions in a group at once. And we'll see IA-64 compilers capable of optimizing binaries for different IA-64 processors too.

It's hard not to think that Intel's institutionalized taste for baroque, ungainly, not to mention bizarre, instruction-set features crept in here somewhere. With so much elegance going for it, IA-64 falls down in the evening gown competition. First, IA-64 opcodes are not unique - they're reused up to four times. In other words, the same 41-bit pattern decodes into four completely different and unrelated operations depending on whether it's sent to the integer unit, the floating-point unit, the memory unit, or the branch unit. A C++ programmer would call this overloading. An assembly programmer would call it nuts. You'd think that Itanium's designers would have been satisfied with 2^41 different opcodes, but no…

The second eccentric feature, which is related to the first, explains how Itanium avoids confusing these identical-but-different opcodes (a process serious engineers call disambiguation). The five-bit template at the start of every 128-bit bundle helps route the three-instruction payload to the correct execution units. Those of you who are good at binary arithmetic are thinking, "wait a minute… five bits isn't enough." And you'd be right--if you weren't designing Itanium. Rather than tagging each of the three instructions with its associated execution unit, or just extending the instruction width, IA-64 uses these five bits to define one of 24 different "templates" for an instruction bundle (the other eight combinations are reserved). A template spells out how the three instructions are arranged in a bundle, and where the end of the logical group is, if any. And yes, you're right again, 24 templates is not enough to define all possible combinations of integer, FP, branch, and memory operations within a bundle, as well as the presence of a group's logical stop. Deal with it.
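For the binary-arithmetic crowd, here is a back-of-the-envelope count of why five bits fall short. It is only a sketch under two stated assumptions: any of the four unit types could in principle sit in any slot, and a stop could in principle follow any slot. Neither is a claim about what IA-64 actually permits.

```python
UNIT_TYPES = ("integer", "memory", "floating-point", "branch")
SLOTS_PER_BUNDLE = 3
TEMPLATE_BITS = 5
DEFINED_TEMPLATES = 24   # figure quoted in the text

# Every way to assign one of four unit types to each of three slots.
slot_arrangements = len(UNIT_TYPES) ** SLOTS_PER_BUNDLE

# Assumption: a stop may or may not follow each slot independently.
stop_patterns = 2 ** SLOTS_PER_BUNDLE

# Total combinations a fully general encoding would have to distinguish.
wanted = slot_arrangements * stop_patterns

# What a 5-bit field can encode, and how many of those codes are unused.
encodable = 2 ** TEMPLATE_BITS
reserved = encodable - DEFINED_TEMPLATES

# Bits a fully general template field would need.
bits_needed = (wanted - 1).bit_length()

print(slot_arrangements, wanted, encodable, reserved, bits_needed)
```

Under those assumptions a fully general scheme needs nine bits where IA-64 spends five, so the template list has to be a hand-picked subset.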

[Figure: the 24 defined templates]

You'll notice that it's impossible to have an FP instruction as the first instruction of a bundle, and that load/store instructions are not allowed at the end. You can't have two FP instructions in a bundle, yet you can have three branch instructions bundled together. This is not as counterproductive as it sounds--as long as two of the branches are conditional and evaluate false, they do no harm other than wasting space.
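Those observations can be checked mechanically. The list below is not taken from this article. It is our recollection of the slot patterns in Intel's published IA-64 documentation, so treat it as an assumption and verify it against the manual. In it, M is memory (load/store), I is integer, F is floating-point, B is branch, a semicolon marks a mid-bundle stop, and L and X are the two halves of a long-immediate form that the four-category summary above glosses over. Each pattern comes in two versions, with and without a stop at the end of the bundle.

```python
# Assumed base templates (recalled from Intel's documentation; verify before use).
BASE_TEMPLATES = [
    "MII", "MI;I", "MLX", "MMI", "M;MI", "MFI",
    "MMF", "MIB", "MBB", "BBB", "MMB", "MFB",
]

# Drop the stop markers to get just the three slot letters.
slot_patterns = [template.replace(";", "") for template in BASE_TEMPLATES]

# Each base template exists with and without a stop after the last slot.
total_templates = len(BASE_TEMPLATES) * 2

fp_never_first = all(pattern[0] != "F" for pattern in slot_patterns)
memory_never_last = all(pattern[2] != "M" for pattern in slot_patterns)
at_most_one_fp = all(pattern.count("F") <= 1 for pattern in slot_patterns)
three_branches_allowed = "BBB" in slot_patterns

print(total_templates, fp_never_first, memory_never_last,
      at_most_one_fp, three_branches_allowed)
```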