Apache Benchmarks for Calxeda’s 5-Watt Web Server

It’s the middle of June, which means we’re smack in the middle of trade show and conference season for the IT industry. We were at Computex in Taipei two weeks ago, and this week we’re participating in the International Supercomputing Conference in Hamburg and GigaOM’s Structure conference in San Francisco. In fact, our CEO, Barry Evans, is on a panel discussing fabric technologies and their role in the evolution of datacenters. Should be a good one!

The hectic season hasn’t stopped us from moving forward with what everyone is really waiting for: benchmarks! I’m happy to share some preliminary performance and power-consumption results for those of you looking for more efficient web servers.

The Setup

For all your benchmark junkies out there, here are some of the details of our web server benchmark rig.

  • Hardware:
    • Single Calxeda EnergyCore ECX-1000 @ 1.1 GHz
    • 4 GB of DDR3L-1066 memory
    • One 1Gb Ethernet network port
    • One 250 GB SATA 7200rpm HDD
  • Software:
    • Ubuntu Server v12.04 (3.2 kernel)
    • Apache Server v2.4.2
    • ApacheBench v2.3 (16k request size)
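The post doesn’t say exactly how ApacheBench was invoked. As a minimal sketch, assuming `ab` is installed and on PATH and that the server hosts a hypothetical 16 KiB static file at a placeholder URL, the Python below builds an `ab` command line (`-n` total requests, `-c` concurrent clients) and extracts the mean “Requests per second” figure from ab’s text report. The concurrency of 100 is an assumption; the one-million-request count is borrowed from press coverage quoted in the trackbacks, not from this post.

```python
import re
import shlex
import subprocess

# Assumptions, not from the post: concurrency level and target URL.
DEFAULT_REQUESTS = 1_000_000   # count reported in press coverage of these results
DEFAULT_CONCURRENCY = 100      # hypothetical


def ab_command(url: str, requests: int = DEFAULT_REQUESTS,
               concurrency: int = DEFAULT_CONCURRENCY) -> list[str]:
    """Build an ApacheBench command line: -n total requests, -c concurrent clients."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


# ab reports a line like: "Requests per second:    1234.56 [#/sec] (mean)"
_RPS_LINE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def parse_rps(ab_output: str) -> float:
    """Extract the mean requests/second value from ab's text report."""
    match = _RPS_LINE.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line in ab output")
    return float(match.group(1))


def run_ab(url: str, **kwargs) -> float:
    """Run ab against `url` and return its mean requests/second."""
    result = subprocess.run(ab_command(url, **kwargs),
                            capture_output=True, text=True, check=True)
    return parse_rps(result.stdout)


if __name__ == "__main__":
    # Placeholder host and file name; only prints the command, does not run it.
    print(shlex.join(ab_command("http://server-under-test/file16k.html")))
```

Parsing the report rather than reimplementing the load generator keeps the sketch tied to the tool the post actually used.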

Power was sampled every 2 seconds and averaged across the duration of each benchmark run. Power supply overhead and hard drive power consumption were not included in these measurements; the SoC and DDR memory were measured together, and both are included.
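To make the averaging concrete, here is a minimal sketch assuming evenly spaced samples, each standing for one 2-second interval; the post doesn’t describe the meter or its interface, and the sample values below are hypothetical, not Calxeda’s readings. Because a watt is one joule per second, dividing average power by requests per second gives energy per request.

```python
from statistics import fmean

SAMPLE_INTERVAL_S = 2.0  # the post samples power every 2 seconds


def average_power(samples_w: list[float]) -> float:
    """Mean of evenly spaced power samples (watts) over one benchmark run."""
    if not samples_w:
        raise ValueError("need at least one power sample")
    return fmean(samples_w)


def run_energy_joules(samples_w: list[float],
                      interval_s: float = SAMPLE_INTERVAL_S) -> float:
    """Approximate run energy as mean power times run length (one interval per sample)."""
    return average_power(samples_w) * interval_s * len(samples_w)


def joules_per_request(avg_power_w: float, requests_per_second: float) -> float:
    """W / (req/s) = (J/s) / (req/s) = joules per request."""
    return avg_power_w / requests_per_second


if __name__ == "__main__":
    samples = [5.1, 5.3, 5.2, 5.2]  # hypothetical readings, for illustration only
    avg = average_power(samples)
    print(f"average {avg:.2f} W, {run_energy_joules(samples):.1f} J over the run")
    print(f"{joules_per_request(avg, 5500) * 1000:.2f} mJ per request at 5,500 req/s")
```

At roughly 5,500 requests per second and a little over 5 watts, this works out to about one millijoule per request.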

The Results

Below are the results from our ApacheBench tests. The table shows both performance (requests per second) and power consumption (watts), alongside an Intel Xeon-based platform for comparison.

A few details and interesting points about the table above:

  1. At 100% CPU utilization across all four cores, the EnergyCore server handles ~5,500 requests per second while consuming only a little over 5 watts.
  2. The EnergyCore power figure is actual power, measured directly on real hardware. The Intel (Sandy Bridge) figure is based on published TDP values for the CPU and I/O chipset, plus an estimate for DDR memory. Unfortunately, at the time of this blog post, we didn’t have a way to measure the Intel system’s actual power consumption at the same level of detail.
  3. Even if we reduced the Intel Xeon’s TDP numbers by 30%, the EnergyCore solution would still hold a significant performance-per-watt advantage (greater than 10X).
  4. The Sandy Bridge system saturated its single 1Gb NIC at less than 15% CPU utilization. (While it’s possible to add more network adapters, most data center customers we’ve spoken with don’t add extra NICs to web servers, for a variety of reasons.) This is a classic example of where Calxeda can deliver superior value: workloads for which “brawny cores” simply deliver more horsepower than the rest of the platform and infrastructure can consume.
  5. We still have some fine-tuning to do to bring power consumption down, but the beautiful thing is that we’re now only trying to squeeze out the last few hundred milliwatts.
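The arithmetic behind points 1, 3, and 4 can be sketched as below. The EnergyCore inputs (~5,500 req/s, with 5.0 W standing in for “a little over 5 watts”) come from the post; reading “16k request size” as a 16 KiB response body is an assumption, and no Intel figures are reproduced here because the table isn’t available in this text.

```python
# Illustrative arithmetic only; 16 KiB responses are an assumed reading of "16k request size".

GIGABIT_BPS = 1_000_000_000  # raw line rate of one 1Gb Ethernet port


def perf_per_watt(requests_per_second: float, watts: float) -> float:
    """Requests per second delivered for each watt consumed."""
    return requests_per_second / watts


def min_uncut_advantage(advantage_after_cut: float, power_cut: float) -> float:
    """Cutting the competitor's power by `power_cut` (e.g. 0.30) multiplies its
    perf/W by 1 / (1 - power_cut), so our ratio shrinks by (1 - power_cut).
    An advantage still above `advantage_after_cut` after the cut therefore
    implies the uncut advantage exceeded advantage_after_cut / (1 - power_cut)."""
    return advantage_after_cut / (1.0 - power_cut)


def payload_bps(requests_per_second: float, response_bytes: int) -> float:
    """Response-body bits per second; ignores HTTP headers and TCP/IP/Ethernet framing."""
    return requests_per_second * response_bytes * 8


def link_limited_rps(link_bps: float, response_bytes: int) -> float:
    """Upper bound on req/s before response bodies alone fill the link."""
    return link_bps / (response_bytes * 8)


if __name__ == "__main__":
    response_bytes = 16 * 1024  # assumption: 16 KiB static file
    print(f"EnergyCore at 5.0 W: {perf_per_watt(5500, 5.0):.0f} req/s per watt")
    print(f">10X after a 30% cut implies >{min_uncut_advantage(10, 0.30):.1f}X before it")
    share = payload_bps(5500, response_bytes) / GIGABIT_BPS
    print(f"5,500 req/s is {share:.0%} of 1 Gb/s in payload alone")
    print(f"link ceiling: about {link_limited_rps(GIGABIT_BPS, response_bytes):.0f} req/s")
```

Under these assumptions, 5,500 req/s is about 721 Mb/s of payload (roughly 72% of the link before protocol overhead), and a single 1Gb port caps any server at about 7,600 such requests per second regardless of CPU, which is consistent with point 4.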

The Five-Watt Web Server

For a single server, the advantages and energy savings may not seem obvious, and they can be a bit hard to grasp. But once you start looking at larger, hyper-scale datacenters, you quickly realize the impact of this new technology. At our product launch last November, together with HP, we shook up the industry by estimating a 63% lower 3-year TCO for the Redstone platform. Based on our final performance and existing power measurements, I’m happy to report that we will easily meet those earlier projections. In fact, while still preliminary, our models now indicate that our advantage improves to a 77% reduction in total cost of ownership, a significant value for a data center of any size.
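One way to read the jump from a 63% to a 77% TCO reduction is as relative cost. This small sketch uses only those two percentages from the post; the helper names are illustrative and say nothing about how the underlying TCO model was built.

```python
# Converts TCO-reduction fractions into relative costs; only 0.63 and 0.77 come from the post.


def remaining_cost(reduction: float) -> float:
    """A fractional TCO reduction r leaves (1 - r) of the baseline cost."""
    return 1.0 - reduction


def baseline_multiple(reduction: float) -> float:
    """How many times larger the baseline TCO is than the reduced TCO."""
    return 1.0 / remaining_cost(reduction)


def further_cut(old_reduction: float, new_reduction: float) -> float:
    """Fractional drop in the remaining cost when the reduction improves."""
    return 1.0 - remaining_cost(new_reduction) / remaining_cost(old_reduction)


if __name__ == "__main__":
    for r in (0.63, 0.77):
        print(f"{r:.0%} reduction -> {remaining_cost(r):.0%} of baseline "
              f"(baseline is {baseline_multiple(r):.1f}x)")
    print(f"63% -> 77%: remaining cost falls a further {further_cut(0.63, 0.77):.0%}")
```

Read this way, the revised estimate means the remaining cost drops from 37% to 23% of the baseline, a further cut of roughly 38% relative to the original projection.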

This is just the first of many benchmarks to come. We are actively characterizing additional workloads (such as Cassandra, Hadoop, Memcached, and Graph500), and if these initial numbers are any indication of what’s to come, the industry is surely in for a fun ride!

Comments

  1. Interesting. Would you like to try Monkey to check the benchmark and power consumption?

    http://monkey-project.com/benchmarks/raspberry_pi_monkey_nginx

  2. Kevin Croft says:

    John, these results are impressive; however, I noticed a couple of points that can hopefully be improved upon in a second round.

    First, a more representative measure of overall system power should be used instead of the maximum CPU TDP, given that the system was only 15% loaded. Even the hypothetical 30% reduction you offered isn’t sufficient.

    Please see: http://www.dell.com/downloads/global/products/pedge/en/R210-II_1P_E3-1240_250W_Energy_Star_Data_Sheet.pdf for the wall-power usage of an actual Dell server with an E3-1240 and 2 hard disks, idling at 43 watts at the plug. In this benchmark, we know the E3-1240 ran at 15% CPU usage with only one hard disk, so my guess is that it also consumes around 43 watts.

    Second, the E3-1240 was saddled with 16GB of memory instead of an apples-to-apples 4GB. You played fair on hard disks, so please do so for memory.

    Third, this comparison was actually a gigabit-per-watt comparison, because the test was not CPU-bound. If you really want to compare web-content performance per watt between ARM and Intel CPUs, then fully load all 4 cores by serving CPU-bound dynamic content, not static files.

    My recommendation for a Round Two – if you wish to show it:

    1) Serve dynamic content that fully saturates the CPUs (all cores & threads)
    2) Measure power at the wall for both systems, not hypotheticals or spec’d maximums.
    3) Install the same amount of memory in both systems, like you’ve done for hard disks
    4) Use the same version of Linux kernel, apache & PHP (if possible)
    5) Customizations to maximize performance are fine, so long as similar efforts are put into both systems and they are disclosed (i.e., CFLAGS, apache.conf, php.ini, etc.). Otherwise, stick to defaults and mention it.

    Regards,
    Kevin

    • John Mao says:

      Kevin, thanks for the constructive feedback. A lot of points you make are very valid. Let me try to provide some additional commentary.

      1) The point of this was to provide an early preview of the types of Performance/Watt numbers you could expect from an EnergyCore SOC.

      2) We did the performance testing on an Intel system in-house. While we can measure the wall power, it wouldn’t be a fair comparison with how we are measuring a single EnergyCore SOC. (It would obviously skew poorly for the Intel system.) Also, if you look at the types of systems being designed around ARM-based SOCs (see the HP Redstone Platform), they are all extremely dense, which makes wall-power comparisons even more difficult.

      3) The point we are trying to communicate, however, is that the efficiency of the Calxeda EnergyCore processor is multiple orders of magnitude better than the comparison. Feel free to verify this yourself (it’s pretty simple math), but even if you were to cut our Intel system’s power estimate to 43W (which, as your link shows, is the idle power of the R210), our Performance Per Watt advantage would still be nearly 5X.

      4) Point well taken on the CPU-bound and equivalent memory constraints. We will take that into account for our next round of benchmarks.

      5) Both systems were using the same exact Linux kernel, Apache, and PHP versions, with default settings.

      Probably not the “re-run everything right now” answer you were looking for, but hope it provides some insight into our thought process.

      • Kevin Croft says:

        Thanks for the response John,

        1) The test certainly showed the Performance/Watt numbers for the EnergyCore SOC; however, the representation of the Intel system’s Performance/Watt is where the test can be improved, because as it stands the test measures network throughput per watt, not CPU performance per watt.

        2) Thanks for the explanation. I understand that typically you’d have hundreds of these SOCs in a single rack fed by a common backplane (power/network/IO), where you gain efficiencies of scale. In that case, you could prorate the power supply overhead for a single SOC, perhaps one or two extra watts, and then add in the power of the hard disk (say 6 watts).

        So in total, the SOC is maybe around 12 watts including the hard disk.

        3) This would let you take the actual wall reading from the E3-1240, which you’re probably seeing in the 40-watt range.

        4) You mentioned the efficiency of the Calxeda EnergyCore processor is “multiple orders of magnitude better” than the compared Intel system. I think the most important point is that this benchmark didn’t measure the full efficiency of the Intel system, so the test is flawed and no comparison should be drawn from it. (A minor point: orders of magnitude are powers of 10; one order of magnitude is 10x, two orders of magnitude is 100x.)

        5) My hunch, and the challenge, is to run a truly CPU-bound benchmark using all threads/cores on both the EnergyCore SOC and the E3-1240, and to measure their full power at the plug (or the prorated amount from the backplane). If you don’t want hard-disk power to worsen the EnergyCore’s power figure, run both systems from a USB-stick OS or an NFS server.

        5) “Both systems were using the same exact Linux kernel, Apache, and PHP versions, with default settings.” Excellent.

        I appreciate the test that you have done. It shows you can get excellent efficiency in an overall low-power package. The big question is: can the bigger, less-efficient-when-idle system dish out more performance per watt when spun up to its maximum potential?

        Answering this is crucial for installations where the machines are running 24/7 at full capacity.

      • Andreas says:

        Another suggestion: it would be interesting to see you use an SSD too. With 7 ms for Intel, it looks like HDD access time is a bottleneck here.

  3. I wish someone would produce a reasonably priced Mini-ITX, or smaller, serious ARM server for those who don’t want to wait for HP or Dell to distribute it down beyond blades.

    • We have made some Intel E3-1260L desktop Mini-ITX servers that simply scream. It is unfortunate that this test is not really apples to apples. Our boxes are super efficient and powerful with SSDs, and it is hard for me to believe an ARM can beat them. I run multiple virtual machines daily for demos and have blasted a single CriKit MicroServer with 21,000 constant web users; the CPU is minimally busy, even when one of the Gigabit NICs is saturated, and I can tell you the overall system uses significantly less than its 100-watt power supply provides. As mentioned above, tests should max out the system and be measured at the wall. Anything else is kinda lame and obviously biased.

      • Karl Freund says:

        Hi PJ,
        If the CPU is minimally busy, then you are wasting a lot of power. That’s the point of the benchmark, in spite of its admitted shortcomings. A Xeon server AT IDLE consumes a lot of power; at full bore, we consume 5 watts. Now, if power isn’t an issue, then ARM is probably not interesting. But for large-scale environments, power is THE issue.

        Let me know if USMicro would be interested in exploring further. We are always looking for new partners!

        Karl

    • Karl Freund says:

      Bob, we would love to help the development community with such a product. We have a design on the drawing boards, just need to get the cost down, and find the right partner who would pick it up and make it available. Do you think people would want this to be a single node, or a small (2-4 node) cluster?

  4. The world would still be very happy and excited if you had done a more accurate measurement of the Intel system. With a more power-efficient 40W TDP Xeon and only 4GB of RAM, the system’s power usage would probably be less than 40W under those loads. Combined with a higher RPS rate, we are talking about a 6–7x difference in power usage. That is still impressive, but you would have earned much more credit.

  5. John Mao says:

    Thanks, everyone, for the feedback. I hope everyone can bear with us as we take on this new frontier together. We’re clearly no longer just comparing x86 systems to one another; there are efficiency implications that extend beyond the chips, chipsets, and memory. We will soon be looking at chassis (and rack-level) comparisons, which are the only real way to do a more “fair” comparison of throughput/performance and energy. We never thought we’d get the kind of interest this post received, but it clearly shows there are lots of folks who are as interested in this as we are! In the meantime, if anyone knows where to get a good “efficient” Intel server with an optimized power supply (and even with DDR3L), please let me know. We do want to do fairer comparisons if possible…hardware permitting.

    • I think this post mostly shows just how hard benchmarking is.

      I agree with most of the comments about the Intel CPU used; it’s about a year old, and is a high-performance rather than low-power version. The “right” CPU might be something like an Intel E3-1220LV2 [1], with a max TDP of 17W. I believe the Calxeda server will still win (and I desperately want a decent standalone ARM server to test, not a massive cluster; SMBs pay for electricity too!).

      However, the bigger issue is letting the network card be the bottleneck. That seems an odd decision to make, and it pretty much makes every other component of the system irrelevant.

      I think the new versions of the Dell PowerEdge R210 II can be shipped with the E3-1220L v2, so should be much more power efficient, if you just want a “standard” rack mount server to compare against.

      1 – http://ark.intel.com/products/65735/Intel-Xeon-Processor-E3-1220LV2-%283M-Cache-2_30-GHz%29

  6. Take my money, I want one!

Trackbacks

  1. [...] has released the results of ApacheBench benchmark comparing their ARM-based EnergyCore solution to an Intel Xeon server in order to [...]

  2. [...] Calxeda has published impressive benchmarks on their ARM-based server, a machine that only consumes 5 Watts! [...]

  3. [...]  |  Calxeda (Arm Servers)  | Email this | Comments Categories: Electronics « BBQ Guru releases [...]

  4. [...]  |  Calxeda (Arm Servers)  | Email this | Comments Engadget Tags: benchmarks, Calxeda, chips, claim, [...]

  5. [...]  |  Calxeda (Arm Servers)  | Email this | Comments Be Sociable, Share! Tweet [...]

  6. [...]  |  Calxeda (Arm Servers)  | Email this | Comments ARM-based, Calxeda, ECX, engadget, intel xeon, [...]

  7. [...]  |  Calxeda (Arm Servers)  | Email this | Comments Read this article: Calxeda benchmarks claim that its server chips are [...]

  8. [...] systems. To back up its claims, the developer published the results of one of its benchmarks, clearly [...]

  9. [...] use ARM processors in server systems. To show these were not empty words, the company published the results of one of its tests, which clearly demonstrate a crushing four-core [...]

  10. [...] Also of note Calxeda has posted ApacheBench numbers using their new chips. That can be found here. [...]

  11. [...] results, posted to the armservers.com Website, indicated that with the EnergyCore system, for 1 million requests, the server averaged 5,500 [...]

  12. [...] Visit site: Apache Benchmarks for Calxeda's 5-Watt Web Server – ARM … [...]

  13. [...] were the results? According to the ARM Servers website, out of one million requests, the EnergyCore system was able to process about 5,500 per [...]

  14. [...] likable little processors are known for being very frugal in terms of power consumption and, if you take a look at this benchmark, you will see how they can also be very efficient at processing [...]

  15. [...] in June, Calxeda published web-serving benchmarks that claimed a significant advantage in performance per watt over x86-based servers. Using [...]

  16. [...] likable little processors are known for being very frugal in terms of power consumption and, if you take a look at this benchmark, you will see how they can also be very efficient at processing content [...]
