Intel Architecture Day reveals new Architectures, HPC-AI and Client Computing
At Intel Architecture Day 2021, Raja Koduri, senior vice president and general manager of the Accelerated Computing Systems and Graphics Group at Intel Corporation, outlined Intel’s new architectures: two x86 CPU cores, two data center SoCs, two discrete GPUs and a revolutionary multicore performance hybrid architecture for client.
Architecture is the alchemy of hardware and software. It blends the best transistors for a given engine, connects them through advanced packaging, integrates high-bandwidth, low-power caches, and equips them with high-capacity, high-bandwidth memories and low-latency scalable interconnects for hybrid computing clusters in a package, while also ensuring that all software accelerates seamlessly. Each year at Intel Architecture Day, Intel’s architects look forward to disclosing the architectural innovations behind imminent products, and this year’s event was the most exciting yet.
Intel unveiled its biggest shifts in Intel® architectures in a generation. This includes the first in-depth look at Alder Lake, the first performance hybrid architecture with two new generations of x86 cores and the intelligent Intel® Thread Director workload scheduler; Sapphire Rapids, Intel’s new standard-setting data center architecture with the new Performance-core and various accelerator engines; the new discrete gaming graphics processing unit (GPU) architecture; new infrastructure processing units (IPUs); and Ponte Vecchio, Intel’s tour-de-force data center GPU architecture with the company’s highest-ever compute density.
These architectural breakthroughs set the stage for the next era of leadership products, starting soon with Alder Lake. The breakthroughs disclosed today also demonstrate how architecture will satisfy the crushing demand for more compute performance as workloads from the desktop to the data center become larger, more complex and more diverse than ever.
The Efficient-core is a highly scalable x86 microarchitecture that addresses compute requirements across the entire spectrum of customers’ needs, from low-power mobile applications to many-core microservices.
Compared with Skylake, Intel’s most prolific CPU microarchitecture, the Efficient-core delivers 40% more single-threaded performance at the same power, or the same performance while consuming less than 40% of the power.
For throughput performance, four Efficient-cores deliver 80% more performance while still consuming less power than two Skylake cores running four threads, or the same throughput performance while consuming 80% less power.
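The Skylake comparisons above can be restated as performance-per-watt ratios. The sketch below is only arithmetic on the quoted percentages; the helper name perf_per_watt_gain is hypothetical, and the "less than 40% of the power" claim is treated as exactly 40%, so that result is a lower bound.

```python
def perf_per_watt_gain(relative_perf: float, relative_power: float) -> float:
    """Performance-per-watt relative to a baseline of 1.0 performance at 1.0 power."""
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_perf / relative_power


# Single-threaded, same power: 40% more performance than Skylake.
st_same_power = perf_per_watt_gain(1.40, 1.00)

# Single-threaded, same performance: assumed to use exactly 40% of the power
# (the text says "less than 40%", so the true ratio is higher).
st_same_perf = perf_per_watt_gain(1.00, 0.40)

# Throughput, same performance: four Efficient-cores use 80% less power
# than two Skylake cores running four threads.
mt_same_perf = perf_per_watt_gain(1.00, 0.20)

print(f"single-thread, same power:       {st_same_power:.2f}x")
print(f"single-thread, same performance: {st_same_perf:.2f}x or better")
print(f"throughput, same performance:    {mt_same_perf:.2f}x")
```

Read this way, the same-performance claims describe a much larger efficiency gain than the same-power claims, which is typical when a core is run lower on its voltage and frequency curve.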
The Performance-core is not only the highest-performing CPU core Intel has ever built; it also delivers a step function in CPU architecture performance that will drive the next decade of compute.
It was designed as a wider, deeper and smarter architecture to expose more parallelism, increase execution parallelism, reduce latency and increase general purpose performance. It also helps support large data and large code footprint applications.
Performance-core provides a geomean improvement of about 19% across a wide range of workloads over the current 11th Gen Intel® Core™ architecture (Cypress Cove core) at the same frequency.
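A geomean figure such as the one above is the geometric mean of per-workload speedup ratios, which keeps a single outlier workload from dominating the average. The following is a minimal sketch of that calculation; the workload speedups are made-up placeholders, not Intel's measurements, and the function name geomean is an assumption for illustration.

```python
import math
from typing import Sequence


def geomean(ratios: Sequence[float]) -> float:
    """Geometric mean of positive ratios, computed in log space for stability."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-workload speedups (new core time vs. old core time at the
# same frequency). These values are illustrative only.
speedups = [1.05, 1.12, 1.19, 1.27, 1.35]

improvement_pct = (geomean(speedups) - 1.0) * 100.0
print(f"geomean improvement: {improvement_pct:.1f}%")
```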
Targeted for data center processors and for the evolving trends in machine learning, Performance-core brings dedicated hardware, including Intel’s new Advanced Matrix Extensions (AMX), to perform matrix multiplication operations for an order-of-magnitude performance gain: a nearly 8x increase in artificial intelligence acceleration. This is architected for software ease of use, leveraging the x86 programming model.
Intel Thread Director
Intel’s unique approach to scheduling was developed to ensure Efficient-cores and Performance-cores work seamlessly together, dynamically and intelligently assigning workloads from the start and optimizing the system for maximum real-world performance and efficiency.
With intelligence built directly into the core, Intel Thread Director works seamlessly with the operating system to place the right thread on the right core at the right time.
Reinventing the multicore architecture, Alder Lake will be Intel’s first performance hybrid architecture with the new Intel Thread Director.
This is Intel’s most intelligent client system-on-chip (SoC) architecture, featuring a combination of Efficient-cores and Performance-cores, scaling from ultra-mobile to desktop, and leading the industry transition with multiple industry-leading I/O and memory technologies.
Products based on Alder Lake will begin shipping this year.
Xe HPG and Alchemist SoC
Xe HPG is a new discrete graphics microarchitecture designed to scale to enthusiast-class performance for gaming and creation workloads.
The Xe HPG microarchitecture features a new Xe-core, a compute-focused programmable and scalable element, and full support for DirectX 12 Ultimate. New matrix engines inside the Xe-cores (referred to as Xe Matrix eXtensions, XMX) accelerate artificial intelligence workloads such as XeSS, a novel upscaling technology that enables high-performance and high-fidelity gaming.
Xe HPG-based Alchemist SoCs (formerly code-named DG2) will be coming to market in the first quarter of 2022 under the new Intel® Arc™ brand.
Combining Intel’s Performance-cores with new accelerator engines, Sapphire Rapids sets the standard for next-generation data center processors.
At the heart of Sapphire Rapids is a tiled, modular SoC architecture that delivers significant scalability while still maintaining the benefits of a monolithic CPU interface thanks to Intel’s EMIB multi-die interconnect packaging technology and advanced mesh architecture.
Infrastructure Processing Unit
Intel introduced Mount Evans, its first dedicated ASIC-based IPU, along with Oak Springs Canyon, a new FPGA-based IPU reference platform.
With an Intel IPU-based architecture, cloud service providers (CSPs) can maximize data center revenue by offloading infrastructure tasks from CPUs to IPUs, which allows them to rent 100 percent of their server CPUs to customers.
Xe HPC, Ponte Vecchio
The most complex SoC Intel has ever built and a great example of its IDM 2.0 strategy in action, Ponte Vecchio takes advantage of several advanced semiconductor processes, Intel’s EMIB technology and its Foveros 3D packaging. With this product, Intel is bringing to life its moonshot project: a 100 billion-transistor device that delivers industry-leading FLOPs and compute density to accelerate artificial intelligence, high performance computing and advanced analytics workloads.
At Architecture Day, Intel showed that its early Ponte Vecchio silicon is already demonstrating leadership performance, setting an industry record in both inference and training throughput on a popular AI benchmark.
A0 silicon is already providing greater than 45 TFLOPS of FP32 throughput, greater than 5 TBps of memory fabric bandwidth and greater than 2 TBps of connectivity bandwidth. Like the other Xe architectures, Ponte Vecchio will be enabled by oneAPI, an open, standards-based, cross-architecture and cross-vendor unified software stack.
Looking ahead, Intel faces a massive demand for compute – potentially a 1,000x need by 2025. That 1,000-times boost in four years is Moore’s Law to the power of five.
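The "Moore’s Law to the power of five" comparison can be checked with a few lines of arithmetic. The sketch below assumes the common reading of Moore’s Law as a doubling every two years; that doubling period and the function name moores_law_gain are assumptions for illustration, not figures from the event.

```python
def moores_law_gain(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor after `years`, assuming one doubling per doubling period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return 2.0 ** (years / doubling_period_years)


# Assumed two-year doubling: four years gives two doublings.
four_year_gain = moores_law_gain(4)

# Raising that four-year gain to the fifth power lands close to 1,000x.
to_the_fifth = four_year_gain ** 5

print(f"Moore's Law over four years: {four_year_gain:.0f}x")
print(f"raised to the fifth power:   {to_the_fifth:.0f}x")
```

Under that assumption, four years of Moore’s Law scaling yields 4x, and 4 to the fifth power is 1,024, which is where the roughly 1,000x figure comes from.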
As CEO Pat Gelsinger, himself an architect, stated at Architecture Day: “We face daunting compute challenges that can only be solved through revolutionary architectures and platforms … Our talented architects and engineers made possible all this technology magic.”
The world is counting on architects and engineers to solve the most difficult computational problems and to enrich people’s lives. Intel’s strategy and execution are accelerating to meet these demands at a torrid pace.