80-core ARM CPU to bring lower power, higher density to a rack near you

 

ARM CPU vendor Ampere announced an 80-core CPU called the Altra on Tuesday. If the core count didn’t clue you in already, the Altra is aimed at data-center computing rather than home or even typical business needs. The Altra’s 80 cores do not offer hyperthreading, so 80 cores here means 80 threads as well.

Before we go into too much detail about the Altra, which is currently sampling but is not yet generally available and has no third-party benchmarks, it’s instructive to look back at its little sibling, the 32-core eMAG 8180.

Before Altra, there was (and is) eMAG

  • Running ApacheBench vs. Nginx gives us the closest thing to a “general-purpose” performance comparison. The eMAG runs about half as fast as the competition here, but note its much narrower error bars.

  • Redis is a key-value store, similar to memcached but more complex, and not natively multi-threaded. Multiple Redis instances run simultaneously to produce the “multi-threaded” results here.

  • Memcached is a key-value store, typically used to cache database query results (raw or post-processed), with an extremely memory-focused workload.

The Altra is not Ampere’s first entry into data-center ARM computing. Its previous processor, the eMAG 8180, is a 32-core part running at up to 3.3GHz turbo. The eMAG 8180 is available in packet.net’s c2.large.arm package, in the form of Lenovo’s ThinkSystem HR330A 1U single-socket systems.

Kinvolk, a Berlin-based Linux development company, benchmarked a single-socket eMAG 8180 system extensively, comparing it to a single-socket AMD Epyc 7401P (24c/48t) and a dual-socket Xeon Gold 5120 (28c/56t total).

Kinvolk’s eMAG performance benchmarks are well worth a look, because so far there are no real-world (let alone third-party) benchmarks of the Altra. In Kinvolk’s testing, the eMAG 8180 excelled at synthetic workloads heavy on memory I/O, but it struggled in some compiler and network-intensive workloads. For the most part, it beat the dual-socket Intel Xeon system and held its own against the single-socket AMD Epyc system.

Benchmarks that don’t rely much on memory I/O, such as Nginx throughput measured by ApacheBench, are more of a challenge for the ARM-based eMAG; there, its performance may drop to half that of its x86_64 competition. But it’s worth noting the much narrower error bars in most cases. True to Ampere’s claims, the eMAG’s non-SMT architecture delivers more consistent results than AMD’s and Intel’s.

It’s also worth noting that, when we’re talking about full-on data-center builds (which is what Ampere’s designs are intended for), raw per-socket performance isn’t everything. The workloads in large data centers tend to scale out horizontally on a massive scale, which makes density more important than raw performance per socket or per thread. Density, in turn, is limited by power and cooling: the less each socket draws, the more sockets fit in a rack’s power budget. The eMAG 8180 is a 125W TDP part, versus the Epyc 7401P’s 170W and the dual Xeon Gold 5120’s total 210W.
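To put those TDP figures on a common footing, here is a rough Python sketch that divides each system’s rated TDP by its core and thread counts. The TDP and core/thread numbers are the ones quoted above; the dictionary layout and the `watts_per` helper are illustrative names, not anything from Kinvolk or Ampere. TDP is a rated ceiling rather than measured draw, so treat the output as a back-of-the-envelope comparison only.

```python
# TDP and core counts are the figures quoted in the article.
# Thread counts follow from the 24c/48t and 28c/56t notes; the eMAG has no SMT.
SYSTEMS = {
    "eMAG 8180 (1 socket)": {"tdp_w": 125, "cores": 32, "threads": 32},
    "Epyc 7401P (1 socket)": {"tdp_w": 170, "cores": 24, "threads": 48},
    "Xeon Gold 5120 (2 sockets)": {"tdp_w": 210, "cores": 28, "threads": 56},
}


def watts_per(unit: str, spec: dict) -> float:
    """Return rated TDP divided by the number of cores or threads."""
    return spec["tdp_w"] / spec[unit]


for name, spec in SYSTEMS.items():
    per_core = watts_per("cores", spec)
    per_thread = watts_per("threads", spec)
    print(f"{name}: {per_core:.2f} W/core, {per_thread:.2f} W/thread")
```

Per physical core, the eMAG’s rated power is roughly half that of either x86_64 system. Counting hardware threads instead narrows the gap considerably, which is one reason TDP per core is only a rough proxy for density.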

What we can expect from the 80-core Altra

  • Note that these are “projected” performance numbers, and the Epyc and Xeon results have been artificially derated to compensate for the Altra using GCC instead of a CPU-optimized compiler.

  • Everything here is still both “projected” and artificially derated, to “normalize” the AMD and Intel parts to a theoretical performance level with no CPU-optimized compiler.

  • The TDP per CPU core is lower for Altra than for Epyc or Xeon, meaning more cores per 12.5kW rack.

  • Rack density is the killer metric for very large data-center applications, and Ampere claims the Altra will lead on that metric.

  • The Total Cost of Ownership referenced here is generated by an Ampere-proprietary calculator, and we’re not entirely sure of its figures. Makes for a pretty column chart, though.

  • Always read the fine print. (Three slides.)

Like the eMAG, the Altra does not offer simultaneous multithreading (SMT), so its 80 cores mean 80 threads. Unlike the eMAG, the Altra is designed for either single- or dual-socket operation, so we can expect to see 160-core Altra-powered systems later in 2020. We know that there will be multiple SKUs, with a TDP range the data sheet specifies as 45W to 210W, but we don’t know their individual details.

The fine print in Ampere’s slide deck lists 80 cores and 180W for the Altra under test, not 210W. This may imply adjustable thermal configurations similar to what one might see in laptop CPUs, but at this point it’s just too soon to tell. The company claims the highest rack density in the industry, at up to 3,500 cores per rack, presumably with dual-socket builds of the 80-core SKU.
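As a sanity check on the density claim, the sketch below compares a CPU-only power ceiling with the per-core power budget the 3,500-core figure would imply. It assumes the claim refers to the 12.5kW rack shown in Ampere’s slides and to the 180W, 80-core part; neither pairing is confirmed by Ampere, and the function names are illustrative.

```python
RACK_BUDGET_W = 12_500          # rack power budget from Ampere's slides (assumed to apply here)
CORES_PER_SOCKET = 80           # Altra core count
CPU_TDP_W = 180                 # TDP of the Altra under test, per the fine print
CLAIMED_CORES_PER_RACK = 3_500  # Ampere's "up to" density claim


def cpu_only_core_ceiling(rack_w: int, cpu_w: int, cores: int) -> int:
    """Cores that would fit if CPUs were the only power draw (an unrealistic upper bound)."""
    return (rack_w // cpu_w) * cores


def watts_per_core(total_w: float, cores: int) -> float:
    """Average power available per core for a given total budget."""
    return total_w / cores


ceiling = cpu_only_core_ceiling(RACK_BUDGET_W, CPU_TDP_W, CORES_PER_SOCKET)
cpu_share = watts_per_core(CPU_TDP_W, CORES_PER_SOCKET)
rack_share = watts_per_core(RACK_BUDGET_W, CLAIMED_CORES_PER_RACK)

print(f"CPU-only ceiling: {ceiling} cores")
print(f"CPU TDP per core: {cpu_share:.2f} W")
print(f"Rack budget per claimed core: {rack_share:.2f} W")
```

Under these assumptions, the claimed density sits well below the CPU-only ceiling, leaving a bit over a watt per core for memory, storage, networking, and power-conversion losses. Whether that margin is realistic is exactly the kind of thing third-party testing would need to confirm.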

It’s important to note that these performance numbers are thin at best. Not only are they Ampere-internal, they’re “projected,” not measured. Further, the AMD and Intel numbers have been artificially reduced: the Altra results come from binaries compiled with GCC, while the AMD and Intel results were generated with CPU-optimized compilers. To compensate, Ampere scaled the Epyc down to 83.5 percent of its actual score and the Xeon down to 76 percent.
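To see how much that derating can move a headline comparison, here is a small Python sketch. The two factors are the ones Ampere disclosed; the function names and the 1.10 example ratio are hypothetical and are not taken from Ampere’s slides.

```python
# Derating factors disclosed in Ampere's presentation.
DERATE_FACTOR = {"Epyc": 0.835, "Xeon": 0.76}


def derate(score: float, vendor: str) -> float:
    """Scale a competitor's score down the way the slides do."""
    return score * DERATE_FACTOR[vendor]


def restore(derated_score: float, vendor: str) -> float:
    """Undo the derating to recover the competitor's original score."""
    return derated_score / DERATE_FACTOR[vendor]


def ratio_vs_full_score(ratio_vs_derated: float, vendor: str) -> float:
    """Convert an 'Altra / derated competitor' ratio into 'Altra / full competitor'."""
    return ratio_vs_derated * DERATE_FACTOR[vendor]


# Hypothetical example: a chart showing the Altra 10 percent ahead of a derated Epyc.
hypothetical_lead = 1.10
print(f"Against the full Epyc score: {ratio_vs_full_score(hypothetical_lead, 'Epyc'):.4f}")
```

In this made-up case, a 10 percent lead over the derated Epyc becomes a deficit of about 8 percent against the Epyc’s original score, which is why the fine print matters when reading the charts.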

This isn’t sketchy, exactly. It’s a fairly common industry practice, and Ampere disclosed it clearly enough in the presentation. But it’s likely not what many people would expect. We should also point out that the only performance numbers given here are SPECrate 2017_int_base, which is an extremely narrow integer-math benchmark.

We would be a lot more skeptical of these numbers if there weren’t far more comprehensive third-party benchmarks available for Ampere’s earlier eMAG 8180 ARM CPU. Thankfully, there are. Given the independent evaluations of that earlier but similar chip, it seems reasonable not to expect major surprises in the Altra’s floating-point performance, let alone its multi-threaded memory I/O.

Conclusions

It looks like Ampere’s Altra, which is currently sampling and expected to hit retail availability later in 2020, will get significant traction in some data centers. The platform offers notable benefits in running costs, with more cores and, typically, more performance both per watt and per rack.

With that said, we don’t expect the Altra, or any other ARM platform, to be the data-center darling of 2020 or even 2021. There’s plenty of platform inertia behind the x86_64 architecture that data-center operators will be loath to overcome. AMD’s Epyc in particular is close enough on Altra’s biggest selling points, power and rack density, that we don’t yet see many data centers giving up x86_64’s frequently higher general-purpose performance and the comfort level of more traditional designs.
