Inside Intel Larrabee – CPU-GPU Convergence
Intel presents Larrabee as the convergence of CPU and GPU (graphics processing unit) design. Pushed to the limits of performance, CPUs are adding cores and computing power. Driven by demand for higher-quality graphics and data-parallel programming, GPUs are being tapped for more general computing tasks.
Inside Intel Larrabee – Larrabee Key Differences from Typical GPUs
Each Larrabee core is a complete x86 core, with context switching and pre-emptive multitasking, virtual memory and page swapping, and fully coherent caches at all levels of the hierarchy. Communication between blocks is efficient: a ring bus provides full interprocessor communication, the Level 1 and Level 2 caches are low-latency and high-bandwidth, and synchronization between cores and caches is fast. Fixed-function logic doesn’t get in the way: there is no back-end blender between the cores and memory, and no rasterization logic between the vertex and pixel stages. The result is flexible load balancing and general functionality.
Inside Intel Larrabee – Larrabee Processor Block Diagram
Cores communicate over a wide ring bus, which gives fast access to memory and fixed-function blocks and fast access for cache coherency. The L2 cache is partitioned among the cores, which provides high aggregate bandwidth and allows data replication and sharing.
Inside Intel Larrabee – Larrabee x86 Chip Block Diagram
Each core has separate scalar and vector units with separate registers. The scalar unit is an in-order x86 core; the vector unit performs 16 32-bit operations per clock. Execution pipelines are short, with fast access from the L1 cache and a direct connection to each core’s subset of the L2 cache. Prefetch instructions load the L1 and L2 caches.
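The "16 32-bit ops/clock" figure can be turned into a rough peak-throughput estimate. The sketch below is only back-of-the-envelope arithmetic: the core count and clock speed passed in are hypothetical placeholders, not figures from Intel, and counting a fused multiply-add as two floating-point operations per lane per clock is an assumption.

```python
def peak_gflops(cores, clock_ghz, lanes=16, flops_per_lane=2):
    """Rough single-precision peak in GFLOPS.

    lanes=16 comes from the slide (16 32-bit operations per clock).
    flops_per_lane=2 ASSUMES one fused multiply-add per lane per clock,
    counted as two floating-point operations.
    cores and clock_ghz are hypothetical inputs, not published numbers.
    """
    return cores * lanes * flops_per_lane * clock_ghz


# Hypothetical configuration: 32 cores at 1.0 GHz.
estimate = peak_gflops(cores=32, clock_ghz=1.0)
```

Real throughput would depend on memory bandwidth, instruction mix and how well the workload fills all 16 lanes, none of which this estimate models.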
Inside Intel Larrabee – Larrabee Vector Unit Block Diagram
The instruction set is vector-complete: scatter/gather handles vector loads and stores, and mask registers select which lanes to write, allowing data-parallel flow control. This makes it possible to map a separate execution kernel to each VPU lane. Vector instructions support fast reads from the L1 cache; numeric type conversion and data replication while reading from memory; lane rearrangement on register read; fused multiply-add (three arguments); and Int32, Float32 and Float64 data.
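To make the role of mask registers and scatter/gather concrete, here is a minimal emulation in plain Python. It is a sketch of the general technique (predicated execution over 16 lanes), not Larrabee's actual instruction set; every function name is hypothetical, and the multiply-add is an ordinary `x * y + z` rather than a true single-rounding fused operation.

```python
LANES = 16  # the slides give 16 32-bit operations per clock


def fmadd(a, b, c):
    """Lane-wise a * b + c (three arguments, emulated, not truly fused)."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def masked_write(dest, src, mask):
    """Write src into dest only in lanes whose mask bit is set."""
    return [s if m else d for d, s, m in zip(dest, src, mask)]


def gather(memory, indices):
    """Vector load: one element per lane from arbitrary addresses."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values, mask):
    """Vector store: write only the lanes enabled by the mask."""
    for i, v, m in zip(indices, values, mask):
        if m:
            memory[i] = v


def kernel(x, scale, bias):
    """Per lane: y = x * scale + bias if x >= 0, else y = -x.

    Both sides of the branch are computed for every lane; the mask
    decides which result each lane keeps.
    """
    mask = [v >= 0 for v in x]
    else_val = [-v for v in x]
    then_val = fmadd(x, scale, bias)
    return masked_write(else_val, then_val, mask)


x = [float(i - 8) for i in range(LANES)]
y = kernel(x, [2.0] * LANES, [1.0] * LANES)
```

Each lane here behaves like an independent instance of the same small program, which is the sense in which a separate execution kernel can be mapped to each lane: divergent branches are handled by masking writes rather than by jumping.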
Inside Intel Larrabee – More Cores = More Scalability
Intel claims that performance in the games “F.E.A.R.,” “Half-Life 2: Episode Two” and “Gears of War” improves as it adds more cores to Larrabee.
Inside Intel Larrabee – Transparency Example with Sorting
Sorting allows artists to distinguish details on foreground and background images. The small dragon was drawn first, so its wing appears behind the large dragon. The rendered pixels are sorted to show the larger dragon’s wing correctly.
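The effect of sorting rendered pixels can be sketched for a single pixel. This is a generic illustration of depth-sorted alpha blending, assuming each translucent surface contributes a (depth, color, alpha) fragment; the function names and the sample colors are hypothetical and nothing here is taken from Intel's renderer.

```python
def blend_over(dst, src_rgb, alpha):
    """Composite a translucent source color over the accumulated color."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src_rgb, dst))


def resolve_unsorted(fragments, background):
    """Blend fragments in submission (draw) order."""
    color = background
    for _depth, rgb, alpha in fragments:
        color = blend_over(color, rgb, alpha)
    return color


def resolve_sorted(fragments, background):
    """Blend fragments back to front (larger depth = farther away)."""
    ordered = sorted(fragments, key=lambda f: f[0], reverse=True)
    return resolve_unsorted(ordered, background)


# The nearer red surface happens to be drawn before the farther blue one.
fragments = [
    (1.0, (1.0, 0.0, 0.0), 0.5),  # near, red, half transparent
    (2.0, (0.0, 0.0, 1.0), 0.5),  # far, blue, half transparent
]
black = (0.0, 0.0, 0.0)

wrong = resolve_unsorted(fragments, black)  # (0.25, 0.0, 0.5): far surface dominates
right = resolve_sorted(fragments, black)    # (0.5, 0.0, 0.25): near surface dominates
```

Without the sort, whichever surface is drawn last ends up on top regardless of its distance, which is the kind of error the slide describes.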
Inside Intel Larrabee – Transparency Example with Fog
The use of multiple layers allows artists to correctly render detail even in the background. Without multiple layers, the fog is not accurately applied to the small dragon seen through the wing. With multiple layers, the fog affects the small dragon correctly, even when seen through the wing.
Inside Intel Larrabee – Shadows Using Irregular Z-Buffer
Shadows help us see spatial relationships.
Inside Intel Larrabee – Shadow Map vs. Irregular Z-Buffer
Jagged edge artifacts are evident without the use of an irregular Z-buffer. With the irregular Z-buffer, clean edge details are rendered.
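One way to see where the jagged edges come from is a deliberately simplified one-dimensional model. It assumes a light shining straight down, an occluder covering the interval from 0 to `occluder_end`, and receiver points at arbitrary positions in [0, 1). The regular shadow map samples occlusion once per texel center, while the irregular Z-buffer style routine buckets the receiver points by cell and then tests each one at its exact position. All names and numbers are hypothetical, and this is not Intel's implementation.

```python
def shadow_map_lookup(occluder_end, resolution, x):
    """Regular shadow map: occlusion is sampled at texel centers only."""
    texels = [((i + 0.5) / resolution) < occluder_end for i in range(resolution)]
    return texels[min(int(x * resolution), resolution - 1)]


def irregular_zbuffer(occluder_end, resolution, samples):
    """Test every receiver sample at its exact light-space position."""
    # Bucket sample indices by the light-space cell they fall into.
    cells = [[] for _ in range(resolution)]
    for idx, x in enumerate(samples):
        cells[min(int(x * resolution), resolution - 1)].append(idx)

    shadowed = [False] * len(samples)
    for cell_index, cell in enumerate(cells):
        # Visit only cells the occluder overlaps (cell starts before its end).
        if cell_index / resolution < occluder_end:
            for idx in cell:
                shadowed[idx] = samples[idx] < occluder_end
    return shadowed


samples = [0.10, 0.30, 0.36, 0.40, 0.80]
occluder_end = 0.37
resolution = 4

from_map = [shadow_map_lookup(occluder_end, resolution, x) for x in samples]
from_izb = irregular_zbuffer(occluder_end, resolution, samples)
# from_map -> [True, False, False, False, False]  (edge snapped to a texel boundary)
# from_izb -> [True, True, True, False, False]    (edge at the true position)
```

In this toy setup the regular map moves the shadow edge to the nearest texel boundary, so points at 0.30 and 0.36 are wrongly lit; in two dimensions that snapping shows up as stair-stepped shadow edges.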