Back in June, Calxeda published web-serving benchmarks that claimed a significant performance-per-watt advantage over x86-based servers. Using ApacheBench, a single 5.26 watt Calxeda EnergyCore server delivered 5500 transactions per second (TPS), compared with 6950 TPS from an Intel E3-1240 (102 watt TDP), which saturated the network. About two months later, Intel gave its response in an interview with Timothy Prickett Morgan at The Register.
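As a rough illustration of the headline comparison, the snippet below divides each system's TPS by its quoted power figure. It is a naive sketch: the Intel number is the processor's TDP rating rather than measured wall power, and the variable and function names are ours, not Calxeda's or Intel's.

```python
# Naive performance-per-watt from the headline figures quoted above.
# Caveat: 102 W is the E3-1240's TDP (a thermal rating), not measured power,
# so this ratio is only a rough first-order comparison.

def tps_per_watt(tps: float, watts: float) -> float:
    """Transactions per second delivered per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tps / watts

calxeda = tps_per_watt(5500, 5.26)  # single EnergyCore server
intel = tps_per_watt(6950, 102)     # E3-1240 at 102 W TDP, network-saturated

print(f"Calxeda: {calxeda:.1f} TPS/W")     # Calxeda: 1045.6 TPS/W
print(f"Intel:   {intel:.1f} TPS/W")       # Intel:   68.1 TPS/W
print(f"Ratio:   {calxeda / intel:.1f}x")  # Ratio:   15.3x
```

This raw ratio is not the same as the 4-5X figure discussed below, which comes from Intel's own calculations; it simply shows how far apart the two headline numbers are before any wall-power or infrastructure adjustments.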
You have to hand it to Intel: they make really fast processors, which are the right choice when maximum compute performance is needed. But Intel's argument misses the very reason why Extremely Efficient Servers are a promising trend: by right-sizing the compute, memory, and networking infrastructure to real workload requirements, one can save a great deal of money and power. Intel's response is classic PC-server-era thinking: use a faster CPU, then feed it like a goose being force-fed for foie gras. In this case, Intel added a 10Gb Ethernet port to try to close the gap. But if 5000 transactions per second is all your website needs, or if you use load balancing to handle peak loads above normal usage, Calxeda is dramatically more efficient. That is the point.
It is a bit surprising that Intel went to these lengths when Intel's own math shows that Calxeda maintains a 4-5X performance-per-watt advantage over the solution most websites would use. Apparently not satisfied, Intel then upped the ante and added expensive 10Gb network infrastructure to keep its uber-fast processor busy. Even in this configuration, Calxeda is still some 30% more efficient than the significantly more expensive* 10Gb Ivy Bridge solution. But small and medium websites rarely use or need a 10Gb Ethernet port; a 1Gb interface is usually sufficient for typical demand. Moreover, Intel's proposed alternative would require two 10Gb top-of-rack switch (TORS) ports in addition to the three NICs (two for data, one for management). Those TORS ports alone could add 10-15 watts per server to the 10Gb solution, power that was not included in Intel's math. But hey, it won the benchmark (well, almost)!
Calxeda is focused on providing energy-efficient solutions for real-world problems, and we believe that bigger and faster is not always better. Leaner and cleaner can be less expensive and far less power-hungry, lowering costs for real-world workloads, which can be highly variable. Which is more representative of your real-world environment? You be the judge.
* Based on comparing the servers without disks to isolate server power, and adding 1 watt to each 5.26 watt Calxeda node to estimate wall power, assuming a modest 24 nodes per chassis sharing the power supply and fans. Note that each Intel server equipped as Intel suggests would require a PCI expansion card with 10Gb NICs, plus switch ports: two for data and one for management. These are costly additions to the Ivy Bridge server ($700 per two ports, plus the required 10Gb TORS ports), and of course they consume even more power. We are still optimizing our platform, and Calxeda will publish a slew of benchmarks and wall-power measurements in the coming weeks.
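The footnote's wall-power adjustment can be sketched as simple arithmetic. This assumes, as the footnote does, a flat 1 W per node for the node's share of the chassis power supply and fans and 24 nodes per chassis; the constant and function names are illustrative only, and the chassis total is just the per-node estimate multiplied out.

```python
# Sketch of the footnote's wall-power estimate for a Calxeda chassis.
# Assumptions (taken from the footnote): +1 W per node for its share of the
# shared power supply and fans, and 24 nodes per chassis.

NODE_WATTS = 5.26         # per-node server power, without disks
SHARED_OVERHEAD_W = 1.0   # assumed per-node share of PSU and fans
NODES_PER_CHASSIS = 24    # "modest" chassis size assumed in the footnote
NODE_TPS = 5500           # ApacheBench result for a single node

def estimated_wall_watts(node_watts: float, overhead_w: float) -> float:
    """Estimated wall power for one node, including its share of shared infrastructure."""
    return node_watts + overhead_w

per_node_w = estimated_wall_watts(NODE_WATTS, SHARED_OVERHEAD_W)
chassis_w = per_node_w * NODES_PER_CHASSIS

print(f"Per-node wall power: {per_node_w:.2f} W")
print(f"Per-node TPS/W:      {NODE_TPS / per_node_w:.1f}")
print(f"Chassis wall power:  {chassis_w:.2f} W for {NODES_PER_CHASSIS} nodes")
```

Folding the shared overhead into each node keeps the comparison per server, which matches how the Intel side is counted; it does not model the extra NICs or TORS ports on the Intel side, which would only add to that server's power.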
This is an interesting back-and-forth between Calxeda and Intel on Apache's ab; watching benchmarksmanship is always entertaining.
One thing that is clear in state-of-the-art data-centric applications is the crucial role of sophisticated local network interconnects, such as those between compute nodes in Big Data environments. This is somewhat in contrast to transactional HTTP requests, which track well with thin-threading capabilities. One can certainly architect Big Data solutions by adding a plethora of 10Gb Ethernet cards to "traditional" PC-server architectures, but the amount of network infrastructure those architectures require (simply because of their bus designs) adds tremendously to the cost, size, and power requirements of such solutions. It is not surprising that so much of the discussion in The Register's piece concerned network upgrades, even in a web-server environment.
Timothy Prickett Morgan at The Register calls for more traditional benchmarks such as SPEC CPU2006. However, Hadoop and similar emerging Big Data environments currently benefit less from higher CPU clock speeds per se than from efficient on-die network interconnects.
Hi David! Nice to hear your “voice” again. Thanks for the thoughtful comments.
Where are the LinkedIn and Facebook Like links, so I can promote this blog more easily?