El Capitan supercomputer: AMD flagship sinks Intel Aurora with 44,544 MI300A APUs

Image: HPE

The El Capitan supercomputer, equipped with AMD Instinct MI300A APUs, delivers 72% more performance than Intel’s Aurora while consuming 9 MW less power. The result is a system that is more than twice as efficient (GFLOPS/watt), built from a total of 11,039,616 “cores”, of which only about a million are CPU cores.

1st place straight away

From start to finish, this is once again a masterful success for the partners behind the project. And one that was almost expected, as those involved joked during Sunday’s press briefing. Ultimately, the biggest hurdles were approvals, not hardware. In the end, everyone is visibly proud, including AMD CEO Lisa Su.

HPE cabinets equipped with the already widely used HPE Slingshot supercomputer networking solution (version 11) were the first to be delivered a few months ago, which meant that the entire network could be configured before the first compute nodes (the servers that provide the computing power) were delivered. HPE is also on a roll lately: all of the top three supercomputers are built by this manufacturer in a very similar configuration.

El Capitan for the Top500 unveiling in November 2024 (Image: HPE)

El Capitan has three offshoots

Nevertheless, El Capitan is unique. But unique does not mean entirely alone: places 10, 20 and 49 of the new Top500 list from November 2024 are, so to speak, small offshoots of the large system, with identical hardware but at a smaller scale.

Tuolumne in 10th place, for example, is an open system that will also be used for open research, while El Capitan will disappear behind closed doors in a few months to devote itself entirely to the American nuclear deterrent. The system was purpose-built for this task and is hosted at the Lawrence Livermore National Laboratory (LLNL) under the direction of the National Nuclear Security Administration (NNSA).

Number games in the millions

A total of 11,136 nodes are now in use, with 44,544 AMD Instinct MI300A APUs installed, four per node. The clock speed of the processor cores is comparatively low at 1.8 GHz.

Of the 11,039,616 cores listed in the Top500, 9,988,224 are GPU compute units (CUs). At the 228 CDNA 3 CUs per APU specified by AMD, that corresponds to 43,808 active APUs in the system. With 24 CPU cores per APU, these contribute 1,051,392 CPU cores, which together with the GPU compute units add up exactly to the listed total. The system also has over 5.4 petabytes of main memory. Each APU offers 128 GB of HBM3, which comes to 1,024 GB per dual-node blade; this, too, works out for 43,808 active APUs.
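The core arithmetic above can be checked in a few lines. The following is a minimal sketch that uses only the figures quoted in the article; the constant and function names are chosen here for illustration. It derives the number of active APUs from the GPU compute-unit total and then the CPU cores per APU from what remains.

```python
# Cross-check of the Top500 core arithmetic, using only figures from the article.
TOTAL_CORES = 11_039_616   # total "cores" listed in the Top500
GPU_CUS = 9_988_224        # GPU compute units counted among those cores
CUS_PER_APU = 228          # CDNA 3 CUs per MI300A, as specified by AMD
INSTALLED_APUS = 44_544    # 11,136 nodes x 4 APUs per node


def active_apus(gpu_cus: int, cus_per_apu: int) -> int:
    """Return the APU count implied by the GPU CU total.

    Raises ValueError if the total is not a whole multiple of CUs per APU.
    """
    apus, remainder = divmod(gpu_cus, cus_per_apu)
    if remainder:
        raise ValueError("GPU CU total is not a multiple of CUs per APU")
    return apus


apus = active_apus(GPU_CUS, CUS_PER_APU)
cpu_cores = TOTAL_CORES - GPU_CUS       # everything that is not a GPU CU
cpu_cores_per_apu = cpu_cores / apus    # should be a whole number

print(f"active APUs:       {apus:,}")
print(f"CPU cores:         {cpu_cores:,}")
print(f"CPU cores per APU: {cpu_cores_per_apu:g}")
print(f"APUs not counted:  {INSTALLED_APUS - apus:,}")
```

The division comes out even in both steps, which is what makes the 43,808 figure plausible. The difference to the 44,544 installed APUs is simply the number of APUs that do not appear in the listed core count.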

Ultimately, El Capitan achieves 1.742 ExaFLOPS (Rmax) out of a theoretical maximum (Rpeak) of 2.746 ExaFLOPS. That is already 63 percent of the peak, which can hardly ever be reached in full anyway. The former number 1, Frontier, also built on AMD hardware, now reaches 66% of its peak value: it currently stands at 1.353 ExaFLOPS, up from 1.1 ExaFLOPS at its debut in 2022.

Measured against the specifications, Frontier thus sits at around 66% of its theoretical peak and El Capitan at around 63% at its debut. This puts both well ahead of number 3, Intel’s Aurora, which still stands at 1.012 ExaFLOPS against a theoretical peak of 1.98 ExaFLOPS. The gap between peak and practically usable performance therefore remains significantly larger at Intel. Intel’s original plan to once again deliver the world’s fastest supercomputer has not worked out at all so far.
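The share of peak performance that a system actually reaches in Linpack is simply Rmax divided by Rpeak. A minimal sketch with the two pairs of values given in the text (the helper name `hpl_efficiency` is chosen here for illustration; Frontier is left out because its Rpeak is not quoted in the article):

```python
# Rmax and Rpeak in ExaFLOPS, as quoted in the article.
SYSTEMS = {
    "El Capitan": (1.742, 2.746),
    "Aurora": (1.012, 1.98),
}


def hpl_efficiency(rmax: float, rpeak: float) -> float:
    """Return the fraction of theoretical peak achieved in Linpack."""
    if rpeak <= 0:
        raise ValueError("Rpeak must be positive")
    return rmax / rpeak


for name, (rmax, rpeak) in SYSTEMS.items():
    print(f"{name}: {hpl_efficiency(rmax, rpeak):.1%} of peak")
```

With these inputs, El Capitan lands at a little over 63% and Aurora at roughly 51%, which illustrates the larger gap on the Intel side.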

El Capitan for the Top500 unveiling in November 2024 (Image: HPE)

Not only fast, but also efficient

Efficiency is not neglected in the AMD systems, quite the contrary. El Capitan also scores with a relatively “low” power consumption of 29.58 megawatts. That is 5 MW more than Frontier, but the system is also slightly more efficient: the GFLOPS/watt value is 58.89 for El Capitan and 54.98 for Frontier. Both are very well positioned, but they do not quite match the smaller Instinct systems and Grace Hopper solutions, which score above the 60 mark. The top two AMD systems are more than twice as efficient as Intel’s Aurora, which draws 38.69 megawatts for approximately one ExaFLOPS and thus achieves an efficiency value of only 26.15. According to the operator, El Capitan also ended up ranking quite well in price/performance, despite a price tag in the high triple-digit millions.

LLNL also plans to keep operating top-tier supercomputers in the future. The next system will probably be an exascale solution; talking about zettascale right away is not helpful, and that is probably still too far off. But smaller systems are also being considered; for many applications, they are simply better suited than one huge supercomputer.
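The efficiency values can be reproduced from Rmax and power draw. The sketch below uses the rounded figures from the article (function and variable names are illustrative), so the last decimal can differ slightly from the official list, which is presumably based on unrounded power measurements.

```python
def gflops_per_watt(rmax_exaflops: float, power_megawatts: float) -> float:
    """Convert ExaFLOPS and megawatts into GFLOPS per watt."""
    gflops = rmax_exaflops * 1e9   # 1 ExaFLOPS = 1e9 GFLOPS
    watts = power_megawatts * 1e6  # 1 MW = 1e6 W
    return gflops / watts


# Rmax (ExaFLOPS) and power (MW) as quoted in the article.
el_capitan = gflops_per_watt(1.742, 29.58)
aurora = gflops_per_watt(1.012, 38.69)

performance_lead = 1.742 / 1.012 - 1   # El Capitan's Rmax lead over Aurora
power_saved_mw = 38.69 - 29.58         # how much less power El Capitan draws

print(f"El Capitan: {el_capitan:.2f} GFLOPS/W")
print(f"Aurora:     {aurora:.2f} GFLOPS/W")
print(f"Efficiency ratio: {el_capitan / aurora:.2f}x")
print(f"Performance lead: {performance_lead:.0%}, power saved: {power_saved_mw:.2f} MW")
```

The same two inputs per system also yield the headline figures: roughly 72% more performance at about 9 MW less power, and an efficiency ratio above 2.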

Now we continue to optimize

El Capitan will be further optimized in the coming weeks and months, and there will likely be another Linpack run whose result may also be entered in the Top500 ranking. This could yield even higher performance, after which the system will go behind closed doors to fulfill its national security tasks.

Techconseil received information for this article from HPE and Top500 under NDA. The only condition was the earliest permitted publication date.
