Anandtech Reviews the Calxeda ECX-1000: “Calxeda’s ECX-1000 server node is revolutionary technology”
I’d like to point everyone to a great review of the Calxeda-powered Boston Viridis box by Anandtech that just went live. First of all, big thanks to Johan De Gelas over at Anandtech and Wannes De Smet at SizingServers for doing a top-notch job pulling together an in-depth review of our gear, as well as to the team at Boston Limited for taking care of the hardware. Since we launched the ECX-1000 we’ve been beating the streets to get real results and metrics into customers’ hands and to show that the technology delivers as promised. With quotes like “Calxeda really did it”, “nothing short of remarkable”, and “revolutionary technology”, we’re all excited to see these results posted on a site like Anandtech.
Written by Shawn Kaplan, General Manager – Financial Services, TELX
Advances in multi-core computing have enabled far greater compute densities, to the point that nearly all datacenter racks run out of available power long before they run out of physical space. Traditional High Performance Computing (HPC) x86 clusters can consume upwards of 400 W per rack unit (U), which means that a typical datacenter rack on a 5 kW to 8 kW circuit can be maxed out with as little as 1/4 to 1/2 of its space filled. Many of today’s forward-thinking IT leaders are asking, “Why can’t I have both extremely dense computing and better power efficiency?”
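To put numbers on that density squeeze, here is a quick back-of-envelope sketch using the 400 W/U and 5 kW to 8 kW figures above; the 42U rack size is an assumption for illustration, not something the post specifies:

```python
# Back-of-envelope rack density math. The 400 W/U and 5-8 kW figures come from
# the paragraph above; the 42U rack size is an illustrative assumption.
RACK_UNITS = 42
WATTS_PER_U = 400  # dense HPC x86 gear

for circuit_kw in (5, 8):
    power_limited_u = (circuit_kw * 1000) / WATTS_PER_U
    print(f"{circuit_kw} kW circuit: power-limited to {power_limited_u:.1f}U, "
          f"about {power_limited_u / RACK_UNITS:.0%} of a 42U rack")

# 5 kW circuit: power-limited to 12.5U, about 30% of a 42U rack
# 8 kW circuit: power-limited to 20.0U, about 48% of a 42U rack
```

In other words, the power circuit strands roughly half to three quarters of the rack before a single additional server can be installed.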
Remember how smoothly Apple transitioned from PowerPC chips to x86 back in the mid-2000s? Customers hardly noticed, because all their software “just worked” on a completely different ISA, thanks to some clever software built by Transitive, a small UK-based company since acquired by IBM. Well, emulation doesn’t solve all the world’s problems, and critical applications will of course need to go native for maximum performance. But this approach can be very helpful with the CAO, or Computer Aided Other: the ancillary but important applications, tools, and utilities that are so pervasive in a datacenter.
Below is an excerpt from the EE Times article, ARM Gets Weapon in Server Battle Vs. Intel.
Russian engineers are developing software to run x86 programs on ARM-based servers. If successful, the software could help lower one of the biggest barriers ARM SoC makers face getting their chips adopted as alternatives to Intel x86 processors that dominate today’s server market.
Elbrus Technologies has developed emulation software that delivers 40 percent of current x86 performance. The company believes it could reach 80 percent native x86 performance or greater by the end of 2014. Analysts and ARM execs described the code as a significant, but limited option.
A growing list of companies, including Applied Micro, Calxeda, Cavium, Marvell, Nvidia, and Samsung, aim to replace Intel CPUs with ARM SoCs that pack more functions and consume less power. One of their biggest hurdles is that their chips do not support the wealth of server software that runs on the x86.
The Elbrus emulation code could help lower that barrier. The team will present a paper on its work at the ARM TechCon in Santa Clara, Calif., Oct. 30-Nov. 1.
The team’s software uses 1 Mbyte of memory. “What is more exciting is the fact that the memory footprint will have weak dependence on the number of applications that are being run in emulation mode,” Anatoly Konukhov, a member of the Elbrus team, said in an e-mail exchange.
The team has developed a binary translator that acts as an emulator, and plans to create an optimization process for it.
“Currently, we are creating a binary translator which allows us to run applications,” Konukhov said. “Implementation of an optimization process will start in parallel later this year–we’re expecting both parts be ready in the end of 2014.”
Work on the software started in 2010. Last summer, Elbrus got $1.3 million in funding from the Russian investment fund Skolkovo and from MCST, a veteran Russian processor and software developer. MCST is also providing developers for the [Elbrus] project.

Emulation is typically used when the new architecture has higher performance than the old one, which is not the case, at least today, when moving from the x86 to ARM. “By the time this software is out in 2014 you could see chips using ARM’s V8, 64-bit architecture,” Krewell noted. “That said, you will lose some of the power efficiency of ARM when doing emulation,” Krewell said. “Once you lose 20 or more percent of efficiency, you put ARM on par with an x86,” he added.

Emulation “isn’t the ideal approach for all situations,” said Ian Ferguson, director for server systems and ecosystem at ARM. “For example, I expect native apps to be the main solution for Web 2.0 companies that write their own code in high level languages, but in some areas of enterprise servers and embedded computing emulation might be interesting,” he said.
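Krewell’s efficiency argument is easy to put numbers on. A minimal sketch, using the 40 and 80 percent translator-efficiency figures quoted above plus an assumed, purely illustrative native perf-per-watt advantage for ARM (the article gives no absolute power figures):

```python
# Rough model of Krewell's point: emulation overhead eats into ARM's efficiency
# advantage. The 40% / 80% translator efficiencies come from the article; the
# 1.25x native perf-per-watt advantage is an illustrative assumption.
ASSUMED_NATIVE_PERF_PER_WATT_ADVANTAGE = 1.25

for translator_efficiency in (0.40, 0.80):
    effective = ASSUMED_NATIVE_PERF_PER_WATT_ADVANTAGE * translator_efficiency
    verdict = "at or above x86" if effective >= 1.0 else "behind x86"
    print(f"emulation at {translator_efficiency:.0%} of native x86 speed: "
          f"effective perf/W is {effective:.2f}x an x86 ({verdict})")
```

Under those assumptions, a translator running at 40 percent of native speed gives away the whole advantage, while one at 80 percent only just breaks even, which is exactly the “on par” caution Krewell raises.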
We spent a lot of time at various tradeshows around the world in June and the #1 question we were asked was “when can I get my hands on a Calxeda-based server?” I am happy to tell you the wait is over.
We have been working with Boston Limited in the UK, a highly respected solution provider, for about a year to bring to market an excellent Proof of Concept (POC) platform called “Viridis”. Boston currently has about 20 customers lined up for beta testing and a pipeline of hundreds of others interested in evaluating the platform. Boston is taking orders now from users in Europe, Asia, and the US, with shipments beginning later this month.
The Register published a great article today highlighting the features of the Boston Viridis platform:
Boston Viridis is a perfect option for those users who want to port their code, run benchmarks, and optimize their workloads for ARM. This highly configurable solution allows users to create their ideal initial testing environments with options ranging from 4 to 48 Calxeda EnergyCore server nodes in a 2U form factor.
We look forward to working with Boston and other systems providers to enable the market with Calxeda-based POCs. Stay tuned over the coming months as we share the success stories users are having with Calxeda EnergyCore-based solutions.
The acronym “SoC” generally stands for “System on a Chip”. But with SoCs entering the server space, it is also taking on a new meaning: “Server on a Chip”. An SoC is a large-scale integration of processor cores, memory controllers, on-chip and off-chip memories, peripheral controllers, accelerators, and custom IP (intellectual property) for specific applications and uses. As Moore’s Law continues, chip process geometries shrink, allowing more transistors to fit in the same area of silicon. Traditionally, server processors have used this new real estate to add more cores. But for certain applications, there are better alternatives than simply adding more cores.
Increasing integration in an SoC brings a number of benefits including:
- Higher performance – significantly faster and wider internal busses compared to those found in a multi-chip or multi-board solution.
- Lower power – a wider range of power optimization techniques can be employed in SoCs, including power gating, changing bus speeds depending upon utilization, dynamic voltage and frequency scaling (DVFS) of processor cores and peripherals, multiple power domains, and a number of others (see the sketch after this list). Additionally, having peripherals on chip avoids power-hungry PHYs (analog drivers that need to drive signals between chips and boards).
- Higher density – fewer components to buy, consume power, and fail.
- Deeper integration – coupling peripheral controllers and fabric interconnect technologies directly to the SoC enables advantages that cannot normally be achieved when everything has to go through standard bridges like PCIe.
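To see why dynamic voltage and frequency scaling is such a big lever, here is a minimal sketch of the classic CMOS dynamic-power relationship (power scales with switched capacitance, voltage squared, and frequency); the capacitance and operating points are illustrative, not ECX-1000 specifications:

```python
# Classic CMOS dynamic-power relationship: P_dyn ~ C_eff * V^2 * f.
# The operating points below are illustrative, not Calxeda specifications.
def dynamic_power_watts(c_eff_farads, volts, freq_hz):
    return c_eff_farads * volts ** 2 * freq_hz

C_EFF = 1e-9  # effective switched capacitance, illustrative

nominal = dynamic_power_watts(C_EFF, volts=1.1, freq_hz=1.4e9)
scaled  = dynamic_power_watts(C_EFF, volts=0.9, freq_hz=0.8e9)  # DVFS-reduced point

print(f"nominal: {nominal:.2f} W, scaled: {scaled:.2f} W "
      f"({1 - scaled / nominal:.0%} less dynamic power)")
```

Because voltage enters squared, lowering voltage and frequency together cuts dynamic power much faster than it cuts clock speed, which is why fine-grained DVFS across cores and peripherals pays off.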
Let’s stop and consider the components we will typically find in a standard rack-optimized volume server:
- One or two processor chips, often with integrated memory controllers.
- One or two chips for processor chipsets providing a range of functions like Southbridge peripherals and PCIe.
- A PCIe-connected Ethernet NIC, either a discrete chip or a PCIe card. In today’s volume servers, this is typically one or two 1 Gb Ethernet interfaces.
- A PCIe-connected SATA controller, again either a discrete chip or a PCIe card.
- A controller chip for an SD card and/or USB.
- An optional, extra-cost BMC (baseboard management controller) providing out-of-band system management.
So, now with the availability of a purpose-built ARM® server SoC, how does this change? Everything in the laundry list above gets integrated onto a single, low-power die. For example, let’s take a look at the Calxeda EnergyCore ECX-1000 series of SoCs. In each chip, we find:
- A quad-core Cortex A9 CPU, configured for server workloads.
- The largest L2 cache that you’ll find on an ARM server: 4 MB with ECC.
- A server class memory subsystem including a wide, high-performance 72-bit DDR3/3L memory controller, also including ECC.
- Integrated peripheral controllers that have direct DMA interfaces to the internal SoC busses without the PCIe overhead. Standard server peripheral controllers like multiple-lanes of SATA, multiple Ethernet controllers (both 1 Gb and 10 Gb), even an SD/eMMC controller for local boot or scratchpad storage, are all integrated on-chip.
- If your server needs to connect to devices that are not integrated, there are four dual-mode PCIe controllers, supporting both root-complex and target modes, in both x4 and x8 configurations.
- Instead of an optional (and expensive) BMC, management is built into every chip: a sophisticated server management engine that provides both in-band and out-of-band IPMI/DCMI system management interfaces along with dynamic power and fabric management.
- A deeply integrated, power and performance-optimized fabric interconnect, which we’ll talk about in a future blog entry.
- And all of this is designed with performance-, power-, and cost-optimized servers in mind, delivering industry-leading performance/watt and performance/watt/$.
With all the typical server components integrated onto a single chip, you can build a server by “just adding power and DRAM”. And even that is made easy for our customers with a card-level reference design of four EnergyCore SoCs, power regulators, DRAM, and fabric interconnect.
For the last several years, SoCs have been used in embedded systems and mobile devices for the same reasons and benefits discussed above. The server industry is now applying those same lessons to its own domain. No matter what the design looks like, a better-integrated, power-optimized Server-on-a-Chip is needed for the scale-out, cluster demands of our Internet generation.
When comparing fruit, everyone knows not to compare apples to, say, an orange or, god forbid, a cumquat. The same applies to chips. See this nice article, then come back and read on…
Nice job, Dell. Ditto, Intel! Now, you might think, “oh wow! A 20 watt Intel server! ARM’s lead certainly didn’t last long; Calxeda is toast!” A sub-20 watt Xeon is indeed an accomplishment; Intel is a great company and knows what they are doing. But be careful when comparing our 3.8 (ok, call it 4) watt ECX-1000 to a Xeon. On the surface, we consume 1/5th the power. Not bad! But the story runs deeper than that. Let’s dissect the fruit and see what’s inside.
Xeon is not an SoC (more on that in another blog post). It is a multi-core processor, like the Cortex A9 from ARM. It does have some integrated I/O (PCIe 3.0, to be precise). But it does not have Ethernet, much less five 10 Gigabit Ethernet ports. It does not have SATA controllers. It does not have an integrated BMC for processor management, much less fabric management and power optimization. All of these must be added as additional components, increasing the system’s BOM cost and power envelope, to match the functionality of a Calxeda ECX-1000. Xeon does have more performance per thread; probably 3-5X, in fact, depending on the workload. But remember that ARM processors for servers are NOT about single-thread performance. If you need that, buy Intel, or AMD, or IBM Power. It doesn’t matter how fast your thread or core can run if you are spending 90% of your time waiting for I/O. And that is exactly the problem people have with traditional architectures today when dealing with data-intensive computing such as Hadoop.
What really matters is the total power and cost of a CLUSTER for a particular workload, not of a single processor, or even an SoC. A Calxeda server node consumes only 5 watts, complete with DRAM, at 100% utilization, and only 0.5 watts at idle. (And don’t forget about memory, which can consume as much as 1 watt per gigabyte in traditional servers!)
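To make that concrete, here is a simple sketch of cluster-level power using only the per-node numbers above; the 48-node count matches a fully populated 2U Viridis chassis, and the utilization mix is an illustrative assumption:

```python
# Cluster-level power estimate using the per-node figures in the text:
# ~5 W per node (DRAM included) at full load, ~0.5 W at idle.
# The 48-node count reflects a 2U chassis; the utilization mix is illustrative.
NODE_ACTIVE_W = 5.0
NODE_IDLE_W = 0.5

def cluster_power_watts(total_nodes, active_fraction):
    active_nodes = total_nodes * active_fraction
    idle_nodes = total_nodes - active_nodes
    return active_nodes * NODE_ACTIVE_W + idle_nodes * NODE_IDLE_W

for utilization in (1.0, 0.5, 0.0):
    print(f"48 nodes at {utilization:.0%} active: "
          f"{cluster_power_watts(48, utilization):.0f} W")

# 48 nodes at 100% active: 240 W
# 48 nodes at  50% active: 132 W
# 48 nodes at   0% active:  24 W
```

That is an entire 192-core cluster, at full tilt, drawing less than a single heavily configured traditional 2-socket server.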
So, always be sure to check your fruit carefully!
Richard Fichera of Forrester Research was one of the first to see the potential of ARM in the datacenter, and he takes note of today’s milestone.
This week, Calxeda is showing a live Calxeda cluster running Ubuntu 12.04 LTS on real EnergyCore hardware at the Ubuntu Developer and Cloud Summit events in Oakland, CA. This is not an FPGA demo. This is the real deal on real silicon: quad-core, with 4 MB cache, a secure management engine, and Calxeda’s fabric, all up and running.
Ubuntu 12.04, with support from Canonical, is the first Linux distribution with full support for ARM as a first-tier server architecture. Incorporating OpenStack’s cloud management infrastructure, Ubuntu 12.04 is designed to support the world’s largest cloud environments, where Ubuntu enjoys commanding market share today.
After months of discussion, debate, claims, and counterclaims, the industry can now begin a fact-based dialog about Calxeda-based servers. What applications are appropriate? Are they fast enough? How much can they really save large internet and IT shops? Do they really consume only 5 watts each? In other words, this new category of technology is moving beyond PowerPoints and on to proof points. OK, we will still pepper the market with pretty presentations, but at least they will contain real benchmarks and measurements made on real systems. We will begin communicating benchmark results on calxeda.com soon.
So, back to Oakland… Running Ubuntu 12.04, we are demonstrating a standard LAMP stack (running Calxeda’s website) along with other popular web frameworks such as node.js and Ruby on Rails, provisioning of OpenStack Nova compute instances, and even Canonical’s Metal-as-a-Service bare-metal provisioning. The cluster we are running is a Calxeda EnergyCard prototype in a 2U chassis that supports up to 48 quad-core nodes at under 300 watts, with up to 24 SATA drives. For more information about UDS, please see http://uds.ubuntu.com/. Remote participation for UDS is available at http://uds.ubuntu.com/community/remote-participation/.
While exciting to see, this demo really shows just how easy it is to move modern software over to Calxeda and Ubuntu. Literally, it all just worked. The code came up without any modifications. Just load and go.
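To give a flavor of why it all just worked: code written against portable runtimes and high-level languages never mentions the instruction set at all. Here is a trivial, hypothetical sketch (not taken from the demo) of the kind of service that runs unmodified on ARM or x86:

```python
# Trivial, hypothetical example: high-level code like this never references the
# instruction set, so the same file runs unmodified on ARM or x86 nodes.
import platform
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = ("Hello from %s / %s\n" % (platform.machine(), platform.system())).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Reports e.g. "armv7l" on an EnergyCore node, "x86_64" on a typical laptop.
    HTTPServer(("", 8080), Hello).serve_forever()
```

The interpreter and the distribution’s packages handle the architecture; the application simply loads and runs.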
The Linux community will see immediate benefits from such a server for building Linux kernels and distributions. A complete build of the Ubuntu 12.04 kernel took less than an hour to compile on a single node, about a quarter of the time required by current ARM build platforms. With a larger Calxeda cluster, a full build of the entire distro will take hours instead of weeks.
Now that Calxeda EnergyCore has been seen in the wild, you can expect more sightings at a variety of industry events, and end-user shipments will begin over the next 4-8 weeks. Volume shipments are expected to begin early this fall from HP and other system vendors. Be sure to check our website frequently for updates.
Who said that hardware is boring? Let the fun, and games, begin!