The definitions of chip, processor, and die become somewhat clouded with Itanium. The first IA-64 "chip" is really a metal-cased cartridge, somewhat like the Pentium II modules of yore. The cartridge - which is mechanically incompatible with anything ever seen before - contains at least five chips: the processor itself and four cache SRAMs. The first- and second-level caches (L1 and L2) really are on the same die as the processor; the L3 cache occupies those four SRAMs, which are off-chip but on-module. Got it?
Then there's the PAL. PAL is Intel's "processor abstraction layer," a flash ROM inside the cartridge that, in Intel's words, "… maintain[s] a single software interface for multiple implementations of the processor silicon steppings." Sounds like a "fudge ROM" for hiding, tweaking, or patching imperfections in a processor that may not entirely live up to its data-book specification.
The whole thing weighs in at about 325 million transistors: 25 million for the processor chip (including L1 and L2 caches) and about 75 million for each of the four L3 cache chips. We'll toss in the PAL for free. If 25 million transistors seems like a lot, remember that Pentium III has 24 million and Pentium 4 has 42 million. For a high-end 64-bit processor, Itanium is looking positively dinky.
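The tally above is simple arithmetic. Here is a minimal sketch in Python that uses only the approximate figures quoted in the text; the constant and function names are illustrative, not anything from Intel.

```python
# Approximate transistor counts quoted in the text above.
PROCESSOR_DIE = 25_000_000   # processor die, including L1 and L2 caches
L3_SRAM_CHIP = 75_000_000    # each off-die L3 cache SRAM
L3_CHIP_COUNT = 4            # number of L3 SRAMs in the cartridge


def cartridge_transistors(processor=PROCESSOR_DIE,
                          sram=L3_SRAM_CHIP,
                          sram_count=L3_CHIP_COUNT):
    """Total for the cartridge, ignoring the PAL flash ROM as the text does."""
    return processor + sram * sram_count


total = cartridge_transistors()
l3_share = (L3_SRAM_CHIP * L3_CHIP_COUNT) / total

print(f"Total: {total / 1e6:.0f} million transistors")
print(f"L3 cache share of the cartridge: {l3_share:.0%}")
```

On these figures, well over nine-tenths of the cartridge's transistors sit in the L3 SRAMs rather than in the processor die.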
You know what else is big? Itanium's code footprint. Poor code density is a hallmark of VLIW designs, and although IA-64 makes some improvements, as we mentioned, it's no exception to the rule. With no (public) code to look at, it's hard to be sure, but educated estimates pin Itanium's code size at about one-third bigger than other 64-bit RISCs and double the size of Pentium binaries.
Poor code density means lots of disk space, but that's not a big deal for high-end systems. It also means a smaller effective cache size, which in turn reduces cache-hit rates. Again, no big deal, because caches can always be made bigger. But cache bandwidth is hard to improve, and that may be the real bottleneck for IA-64 processors. That's why Itanium's first two levels of cache are on the processor die itself and the L3 cache is very nearby on the same module.
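To put a rough number on "effective cache size," here is a back-of-the-envelope sketch in Python. It takes the code-size estimates quoted above at face value and assumes, simplistically, that a cache holds proportionally less useful code as binaries grow. The 16 KB instruction-cache size is a hypothetical value chosen only for illustration, and the function name is made up for this example.

```python
from fractions import Fraction

# Code-size estimates from the text: IA-64 binaries are roughly one-third
# larger than other 64-bit RISC code and twice the size of Pentium code.
EXPANSION_VS_RISC = Fraction(4, 3)
EXPANSION_VS_PENTIUM = Fraction(2, 1)

# Hypothetical instruction-cache size, chosen only for illustration.
ICACHE_BYTES = 16 * 1024


def equivalent_cache_bytes(cache_bytes, expansion):
    """Cache size a denser instruction set would need to hold the same code."""
    return Fraction(cache_bytes) / expansion


vs_risc = equivalent_cache_bytes(ICACHE_BYTES, EXPANSION_VS_RISC)
vs_pentium = equivalent_cache_bytes(ICACHE_BYTES, EXPANSION_VS_PENTIUM)

print(f"vs. 64-bit RISC: {float(vs_risc) / 1024:.0f} KB equivalent")
print(f"vs. Pentium:     {float(vs_pentium) / 1024:.0f} KB equivalent")
```

Under these assumptions, a 16 KB cache full of IA-64 code holds about as much program as 12 KB of typical 64-bit RISC code or 8 KB of Pentium code. Real hit rates depend on the workload, so treat this as a proportion, not a prediction.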
Outside the Box
The 128-bit bus between the Itanium die and its L3 caches is contained entirely within the cartridge; it's never exposed to the outside. Itanium's external bus is 64 bits wide, and this is its only connection with the outside world, main memory, or other processors. Up to four processors can share this bus. Beyond that, Intel has a bridge chip that allows four-processor clusters to talk to each other.
It's a pretty pedestrian bus as these things go. It has none of the exotic interprocessor communications that Hammer has (as we'll study in our next segment), nor is it even very fast at 2.1 GB/second of maximum bandwidth, compared with 3.2 GB/second for Pentium 4 or 3.6 GB/second for MIPS. It's also a doomed, dead-end bus: McKinley will have a completely different interface.
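The bandwidth figures can be cross-checked with a little arithmetic. The sketch below, in Python, uses only the numbers quoted in the text (a 64-bit external bus and the three peak bandwidths) and assumes 1 GB means 10^9 bytes. The function and variable names are illustrative.

```python
# Figures quoted in the text; 1 GB is assumed to mean 10**9 bytes.
BUS_WIDTH_BITS = 64
PEAK_GB_PER_S = {"Itanium": 2.1, "Pentium 4": 3.2, "MIPS": 3.6}


def implied_transfers_per_second(gb_per_s, width_bits):
    """Transfers per second needed to reach a peak bandwidth on a given bus width."""
    bytes_per_transfer = width_bits // 8
    return gb_per_s * 1e9 / bytes_per_transfer


itanium_rate = implied_transfers_per_second(PEAK_GB_PER_S["Itanium"],
                                            BUS_WIDTH_BITS)

# Itanium's peak bandwidth as a fraction of each competitor's.
relative = {
    name: PEAK_GB_PER_S["Itanium"] / bandwidth
    for name, bandwidth in PEAK_GB_PER_S.items()
    if name != "Itanium"
}

print(f"Implied rate: {itanium_rate / 1e6:.1f} million transfers/second")
for name, ratio in relative.items():
    print(f"Itanium peak vs. {name}: {ratio:.0%}")
```

At 8 bytes per transfer, 2.1 GB/second works out to a little over 260 million transfers per second, and Itanium's peak comes to roughly two-thirds of Pentium 4's and under 60 percent of the MIPS figure.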