The Good Stuff

By Jim Turley  |  Posted 2002-02-11

As we mentioned, IA-64 is an enhanced VLIW architecture, so its concept of "instruction" is a little different from that of, say, Pentium or Alpha. With IA-64, there are instructions, there are bundles, and there are groups. Get your notepads ready.
Instructions are 41 bits long. Yup - say goodbye to powers of two. It takes 7 bits to specify one of 128 general-purpose (or floating-point) registers, so two source-operand fields and a destination field eat up 21 bits right there, before you even get to the opcode. Another 6 bits select one of the 64 predicates (which we discuss in more detail below), if any.
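The bit budget described above can be tallied in a few lines. This is only a sketch of the arithmetic in the text; the constant and function names are ours, and real IA-64 instruction formats divide the leftover bits differently from one instruction type to the next.

```python
# Field-width arithmetic for a 41-bit IA-64 instruction, using the figures
# quoted in the article. Names are illustrative, not taken from any manual.
INSTRUCTION_BITS = 41
REGISTER_COUNT = 128      # general-purpose (or floating-point) registers
PREDICATE_COUNT = 64      # predicates selectable by an instruction
REGISTER_OPERANDS = 3     # two sources plus one destination


def bits_to_select(choices: int) -> int:
    """Number of bits needed to pick one item out of `choices`."""
    return (choices - 1).bit_length()


register_field_bits = bits_to_select(REGISTER_COUNT)      # 7
predicate_field_bits = bits_to_select(PREDICATE_COUNT)    # 6
operand_bits = REGISTER_OPERANDS * register_field_bits    # 21
remaining_bits = INSTRUCTION_BITS - operand_bits - predicate_field_bits

print(f"register operands: {operand_bits} bits")
print(f"predicate field:   {predicate_field_bits} bits")
print(f"left for opcode and everything else: {remaining_bits} bits")
```

With three register fields and a predicate field accounted for, 14 of the 41 bits remain for the opcode and any other fields.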
Instructions are delivered to the processor in "bundles." Bundles are 128 bits: three 41-bit instructions (making 123 bits), plus one 5-bit template, which we'll get to in a minute. Still with us?
Then there are instruction groups, which are collections of instructions that can theoretically all execute at once. The instruction groups are the compiler's way of showing the processor which instructions can be dispatched simultaneously without dependencies or interlocks. It's the responsibility of the compiler to get this right; the processor doesn't check. Groups can be of any arbitrary length, from one lonely instruction up to millions of instructions that can (hypothetically, at least) all run at once without interfering with each other. A bit in the template identifies the end of a group.
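To make the 5 + 3 x 41 = 128 packing concrete, here is a minimal sketch that packs and unpacks a bundle as a plain Python integer. It assumes the template sits in the lowest five bits with the three instruction slots stacked above it; the function names are ours, and the slot values in the usage lines are arbitrary numbers, not real encodings.

```python
# Sketch of a 128-bit bundle: a 5-bit template plus three 41-bit slots.
# Assumption: template in bits 0-4, slot 0 directly above it, and so on.
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Combine a template and three instruction slots into one integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


# Usage: round-trip a bundle holding three arbitrary bit patterns.
bundle = pack_bundle(0x10, (1, 2, 3))
template, slots = unpack_bundle(bundle)
print(hex(template), slots, bundle.bit_length() <= 128)
```

Note that nothing in the bundle itself says how long a group is; that information lives in the template, which is why the bundle boundary and the group boundary are independent.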
A bundle is not a group. That is, IA-64 instructions are physically packaged into 128-bit bundles because that's deemed the minimum width for an IA-64 processor's bus and decode circuitry. (Itanium dispatches two bundles, or 256 bits, at once.) A bundle just happens to hold three complete instructions. But logically, instructions can be grouped in any arbitrary amount, and it's the groups that determine how instructions interrelate to one another.
All IA-64 instructions fall into one of four categories: integer, load/store, floating-point, and branch operations. These categories are significant in how they map onto the chip's hardware resources. Different IA-64 implementations (Itanium, McKinley, etc.) might have different hardware resources, but all will do their best to dispatch all the instructions in a group at once. And we'll see IA-64 compilers capable of optimizing binaries for different IA-64 processors too.
It's hard not to think that Intel's institutionalized taste for baroque and ungainly, not to mention bizarre, instruction-set features crept in here somewhere. With so much elegance going for it, IA-64 falls down in the evening gown competition.
First, IA-64 opcodes are not unique - they're reused up to four times. In other words, the same 41-bit pattern decodes into four completely different and unrelated operations depending on whether it's sent to the integer unit, the floating-point unit, the memory unit, or the branch unit. A C++ programmer would call this overloading. An assembly programmer would call it nuts. You'd think that Itanium's designers would have been satisfied with 2^41 different opcodes, but no…
The second eccentric feature, which is related to the first, explains how Itanium avoids confusing these identical-but-different opcodes (a process serious engineers call disambiguation). The five-bit template at the start of every 128-bit bundle helps route the three-instruction payload to the correct execution units.
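The overloading idea can be sketched as a decoder that keeps one table per execution unit, so the same bit pattern means something different depending on where the template routes it. Everything in this example is hypothetical: the opcode value 0x08 and the operation descriptions are placeholders invented for illustration, not actual IA-64 encodings.

```python
# Hypothetical per-unit decode tables. The opcode value and the operation
# descriptions are placeholders, NOT real IA-64 encodings.
DECODE_TABLES = {
    "I": {0x08: "an integer operation"},
    "M": {0x08: "a load/store operation"},
    "F": {0x08: "a floating-point operation"},
    "B": {0x08: "a branch operation"},
}


def decode(unit, opcode):
    """Look up an opcode in the table belonging to the given unit."""
    return DECODE_TABLES[unit][opcode]


# Usage: one bit pattern, four unrelated meanings.
for unit in ("I", "M", "F", "B"):
    print(unit, "->", decode(unit, 0x08))
```

The lookup only works because the caller already knows the unit, which is exactly the information the template has to supply.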
Those of you who are good at binary arithmetic are thinking, "wait a minute… five bits isn't enough." And you'd be right--if you weren't designing Itanium. Rather than tagging each of the three instructions with its associated execution unit, or just extending the instruction width, IA-64 uses these five bits to define one of 24 different "templates" for an instruction bundle (the other eight combinations are reserved). A template spells out how the three instructions are arranged in a bundle, and where the end of the logical group is, if any. And yes, you're right again, 24 templates is not enough to define all possible combinations of integer, FP, branch, and memory operations within a bundle, as well as the presence of a group's logical stop. Deal with it.
[Figure: 24 defined templates]
You'll notice that it's impossible to have an FP instruction as the first instruction of a bundle, and that load/store instructions are not allowed at the end. You can't have two FP instructions in a bundle, yet you can have three branch instructions bundled together. This is not as counterproductive as it sounds--as long as two of the branches are conditional and evaluate false, they do no harm other than wasting space.
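The "five bits isn't enough" complaint is easy to check by counting. The sketch below enumerates every way to assign the four instruction categories to three slots, then applies only the three restrictions stated above. The one-letter unit codes and the helper name are ours; the actual template table is not reproduced here.

```python
from itertools import product

# Counting argument for the 5-bit template, using figures from the article.
UNIT_TYPES = ("I", "M", "F", "B")   # integer, load/store, FP, branch
SLOTS_PER_BUNDLE = 3
TEMPLATE_BITS = 5
DEFINED_TEMPLATES = 24

# Every way to assign one of four unit types to each of three slots.
arrangements = list(product(UNIT_TYPES, repeat=SLOTS_PER_BUNDLE))
encodings = 2 ** TEMPLATE_BITS
reserved = encodings - DEFINED_TEMPLATES
bits_for_units_alone = (len(arrangements) - 1).bit_length()


def allowed_by_stated_rules(bundle):
    """Apply only the three restrictions the article spells out."""
    no_fp_first = bundle[0] != "F"
    no_memory_last = bundle[-1] != "M"
    at_most_one_fp = bundle.count("F") <= 1
    return no_fp_first and no_memory_last and at_most_one_fp


candidates = [b for b in arrangements if allowed_by_stated_rules(b)]

print(f"{len(arrangements)} arrangements need {bits_for_units_alone} bits")
print(f"{encodings} encodings, {DEFINED_TEMPLATES} defined, {reserved} reserved")
print(f"{len(candidates)} arrangements survive the three stated rules")
```

Tagging the units alone would take six bits, before any bit is spent on marking the end of a group. Even after the three stated restrictions, more arrangements remain than there are defined templates, so the real table must rule out further combinations.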

Jim Turley is a semiconductor industry analyst, editor, and presenter working in Silicon Valley. Focus technologies are 32-bit microprocessors and semiconductor intellectual property (SIP).

Most recently Jim was the Senior Vice President of Strategy & Technology at ARC International plc (LSE:ARK), where he set the Company's strategic direction and guided its technical developments at five locations worldwide. With headquarters in London (UK) and development centers in New Hampshire, Canada, and California, ARC International is an innovative leader in the semiconductor IP (intellectual property) industry.

Previously, Jim was senior analyst for MicroDesign Resources (a unit of Cahners/Reed Elsevier) as well as the Senior Editor of the prestigious industry journal Microprocessor Report (a three-time winner of the Computer Press Award), and Editor-in-Chief of Embedded Processor Watch. He also hosted and directed the yearly Embedded Processor Forum conference, the industry's annual showcase for new microprocessors. As an analyst and editor, Turley consulted with leading semiconductor firms, providing informed advice on technology trends and market requirements, and was often called on to participate in new product reviews, strategy sessions, and technology development for large semiconductor companies.

Turley is the author of six popular books including Advanced 80386 Programming Techniques, the best-selling PCs Made Easy and others published by McGraw-Hill and Academic Press. He's served as technical editor for several of McGraw-Hill's computer and programming books. In addition, he was a regular technology columnist for Embedded System Programming, Computer Design, and Supermicro magazines, and contributed articles to dozens more. Earlier in his career, Turley held engineering or marketing positions at Adept Technology, Force Computers, TeleVideo, and other high-technology firms in Europe and the United States.

Turley has created and presented numerous seminars and training sessions around the world covering technology trends and the competitive microprocessor market. He is also a well-known speaker at industry events such as the Embedded Systems Conference and Microprocessor Forum, is frequently quoted in the Wall Street Journal, New York Times, USA Today, San Francisco Chronicle, and San Jose Mercury News, and has appeared frequently on television, radio, and Internet broadcasts. Jim volunteers for Recording for the Blind and recently earned his amateur auto-racing license. He has a talented and stunningly attractive wife, two overachieving children, an apparently brain-damaged dog, and an opossum living under the house.

Jim can be contacted at or by calling (408) 226-8086.

For additional information, visit

