BLAS and carbon footprint
Thursday, 07 August 2008 10:21

OK, so the subject line is a little tongue-in-cheek. The purpose of this post is to discuss BLAS libraries and power consumption, and just how much power consumption varies among the Big Four BLAS Libraries (BFBL):

  • MKL
  • ACML
  • GotoBLAS
  • ATLAS

Over the past few years, we've been using HPL as a sort of "worst-case scenario" for stress-testing hardware. Mostly, this is born of practicality: HPL lets you easily tune the amount of memory consumed, making it trivial to put the CPU and RAM under 100% load. That elusive "100% load state" has uncovered more hardware bugs in processors, motherboards and memory controllers than we can keep track of.
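To make the memory-tuning point concrete, here is a minimal sketch of how an HPL problem size could be picked for a given amount of RAM. HPL factors a dense N x N matrix of 8-byte doubles, so N follows from the memory budget. The function name, the 80% memory fraction and the block size of 128 are illustrative assumptions, not values taken from our internal tooling.

```python
import math

GIB = 1024 ** 3  # assumption: treat "GB" of RAM as GiB


def hpl_problem_size(total_ram_bytes, mem_fraction=0.8, block_size=128):
    """Return the largest N (a multiple of block_size) such that an
    N x N matrix of 8-byte doubles fits in mem_fraction of RAM.

    mem_fraction and block_size are hypothetical defaults; real runs
    would tune both per machine.
    """
    usable_bytes = int(total_ram_bytes * mem_fraction)
    max_elements = usable_bytes // 8          # 8 bytes per double
    n = math.isqrt(max_elements)              # side of the largest square matrix
    return (n // block_size) * block_size     # round down to a whole block


if __name__ == "__main__":
    for label, ram_gib in (("16 GB host", 16), ("8 GB host", 8)):
        print(label, "-> N =", hpl_problem_size(ram_gib * GIB))
```

Raising mem_fraction pushes the run closer to filling RAM; leaving some headroom for the OS is the usual reason not to go all the way to 1.0.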

A few years ago, when processor manufacturers started including things like PowerNow!, SpeedStep and various other dynamic power-saving features, we found that the published TDP numbers could only be used as a rule of thumb. We needed numbers that represented what would actually happen on HPC workloads. In an effort to burn in hardware before delivering it to customers--again looking at the worst case--we instituted a little internal software project to keep our BLAS-enabled implementations as high-load as possible. We can network boot a little initrd that goes directly into HPL tuned for the specific host architecture.

To give you an idea of how much variance there is among the BLAS libraries, I want to share some of our latest findings, which show some of the largest differences we've ever seen (don't compare AMD to Intel here):

  • AMD:
    • ATLAS: 325 watts
    • ACML: 351 watts
    • GotoBLAS: 378 watts
  • Intel:
    • ATLAS: 378 watts
    • MKL: 405 watts

The AMD test hardware is a dual-socket motherboard with 2x 2354s @ 2.3GHz, unganged memory controllers, 16 GB of DDR2-667 and 1 SATA HDD. The Intel hardware is a dual-socket Stoakley (1600MHz FSB) motherboard with 2x Harpertowns (45nm) @ 3.0GHz, 8 GB of 800MHz 1.8v FB-DIMMs and 1 SATA HDD.

The measurements were taken with a "Watt's Up? Pro" power meter and represent the upper quartile (i.e. these are not "spikes"). The AMD system was tested with an 85% efficient 1U power supply and the Intel system with a 65% efficient 2U power supply (note that this amplifies the variance from AMD-GotoBLAS to MKL somewhat and makes comparing AMD to Intel pointless for this discussion).
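To see why the power supplies muddy the cross-vendor comparison, here is a rough sketch that converts the wall readings into an estimated load on the DC side of the supply. It assumes each supply's efficiency is a single constant (85% and 65%), which is a simplification since real efficiency changes with load; the function name is hypothetical.

```python
def dc_load(wall_watts, efficiency):
    """Estimate the power delivered to the components, given the power
    drawn at the wall and an assumed constant PSU efficiency."""
    return wall_watts * efficiency


AMD_EFFICIENCY = 0.85    # 1U supply, figure from the post
INTEL_EFFICIENCY = 0.65  # 2U supply, figure from the post

# Wall-side spread between the lowest and highest library on each box.
amd_wall_delta = 378 - 325     # GotoBLAS vs. ATLAS
intel_wall_delta = 405 - 378   # MKL vs. ATLAS

amd_dc_delta = dc_load(amd_wall_delta, AMD_EFFICIENCY)
intel_dc_delta = dc_load(intel_wall_delta, INTEL_EFFICIENCY)

if __name__ == "__main__":
    print(f"AMD:   {amd_wall_delta} W at the wall ~ {amd_dc_delta:.2f} W at the components")
    print(f"Intel: {intel_wall_delta} W at the wall ~ {intel_dc_delta:.2f} W at the components")
```

Under that assumption, the less efficient supply inflates every watt the Intel components draw by a larger factor at the wall, which is why only comparisons within one platform are meaningful here.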

What I think is discussion-worthy here is that the measured differences between BLAS libraries are larger than ever before, due--perhaps--to increasingly aggressive power-saving features in the host silicon.

We released a public version of "breakin", the tool we use to put systems under load. We can't distribute the BLAS libraries in source form, so developers would have to obtain those themselves. We did get licensing permission to distribute a pre-compiled distribution that includes MKL and ACML. We put up ISOs and a tarball.

