The Exascale Computing Benchmark

NetPIPE

A Network Protocol Independent Performance Evaluator

Flops are irrelevant. Data matters, and data movement matters. Let's imagine for a minute that we had a perfect quantum computing FPU that could execute 25 exaflops, took up 1 cubic millimeter, and was powered by Brownian motion. We'd have all the Exascale computing issues solved, and the chemists, physicists, and climate modelers would start battling it out for time on the machine, right?

Not so fast... To get a human the result of those 25 exaflops, we have to interact with and observe the results. We do that in a pretty much classical Newtonian macro-world, in which, as far as we can tell, information transfer is accomplished with discrete digital electrical or optical signals.

Well, if we take a step up out of CS theory, or maybe just go back to the Turing machine, we observe that those electrical and optical signals take energy in the transmitter (watts = current x voltage) and dissipate some of that energy as heat in the transmission media and in the receiver. So really, the natural benchmark of our exaflop quantum FPU is not flops, but bits/second. But if we are talking about an exaflop in 1 cubic millimeter, we had best go find some mechanical engineers who know practical stuff about thermodynamics and heat transfer, because the surface of this cubic millimeter exaflop is going to be lit up like a quasar getting the data in and out.

But I don't need to go all sci-fi on you to prove a point. The latest processors from AMD and Intel modulate the CPU clock to stay within the thermal envelope of the silicon package. Someone's going to bring up GPUs, but we have many other practical problems getting data in and out of GPUs, and the fast ones run really hot. I will bet you that the computing *system* that has the best bits/joule is the one that will have the highest bits/second (measured at the FPU) and, in turn, the highest machoFLOPS.
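To put a rough number on the cubic millimeter picture, here is a back-of-the-envelope sketch in Python. The 25 exaflops and the 1 mm cube come from the thought experiment above; the bits moved per flop and the energy per bit are pure assumptions picked for illustration, not measurements of any real device.

```python
# Rough sketch: power flux through the faces of a 1 mm cube that has to
# move its operands in and out. The two ASSUMPTION values are made up.

CUBE_EDGE_M = 1e-3          # 1 mm edge, from the thought experiment
FLOPS = 25e18               # 25 exaflops, from the thought experiment
BITS_MOVED_PER_FLOP = 64    # ASSUMPTION: one 64-bit word crosses the surface per flop
JOULES_PER_BIT = 1e-12      # ASSUMPTION: 1 pJ of I/O energy per bit


def surface_power_density(flops, bits_per_flop, joules_per_bit, edge_m):
    """Return (bits/s, watts, W/m^2) for a cube with the given edge length."""
    bits_per_second = flops * bits_per_flop
    watts = bits_per_second * joules_per_bit   # J/bit * bit/s = J/s = W
    surface_m2 = 6 * edge_m ** 2               # six square faces
    return bits_per_second, watts, watts / surface_m2


bps, watts, flux = surface_power_density(
    FLOPS, BITS_MOVED_PER_FLOP, JOULES_PER_BIT, CUBE_EDGE_M
)
print(f"{bps:.3g} bits/s, {watts:.3g} W, {flux:.3g} W/m^2")
```

Change the two assumed constants by a few orders of magnitude in either direction and the conclusion stays the same: the I/O power, not the arithmetic, sets the limit.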

So let's just forget the machoFLOPS, and evaluate the system on bits per joule for efficiency, and bits per second for peak capability... Hmm. What benchmark should we use? Might I offer a suggestion? Or write a few other ones. Or just report FLOPS in terms of bits/joule. Or exaflop/megawatt-hour or something.
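The unit conversions behind those metrics are simple enough to sketch in Python. The function names and the example numbers (a 100 Gbit/s link drawing 5 W, a 1 exaflop/s machine drawing 20 MW) are hypothetical and only there to exercise the arithmetic.

```python
JOULES_PER_MEGAWATT_HOUR = 3.6e9   # 1e6 W * 3600 s


def bits_per_joule(bits_per_second, watts):
    """Efficiency metric: bits moved per joule spent (1 W = 1 J/s)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return bits_per_second / watts


def exaflop_per_megawatt_hour(flops_per_second, watts):
    """Units of 1e18 floating point operations completed per MWh of energy."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    flop_per_joule = flops_per_second / watts
    return flop_per_joule * JOULES_PER_MEGAWATT_HOUR / 1e18


# Hypothetical inputs, not measurements of any real system.
print(bits_per_joule(100e9, 5.0), "bits/joule")
print(exaflop_per_megawatt_hour(1e18, 20e6), "exaflop/MWh")
```

Bits per second is what a tool like NetPIPE reports directly; the joules have to come from a separate power measurement over the same interval.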

The Bitcoin corollary

SHA-256 hashing performance (Gigahash/sec, Gigahash/watt, and Hash/Joule) is limited not by process technology, but by the IR drop of the circuit board, pad mounts, solder, interposer, wire bonds, and silicon metal layers between the power converter and the active transistors. Once you get to around 40-28 nm process tech, around half the power gets dissipated in the electron traffic jam on the way to the transistors. (This is based on a completely non-scientific gut reaction to the way a bunch of bitcoin miner ASIC/system vendors take some estimate of theoretical performance from their chip design and then end up halving it because they forgot to calculate 15 levels of IR losses.) More scientific results to come when efabless makes me an open-source hardware bitcoin ASIC chip.
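A minimal sketch of that IR-drop bookkeeping, in Python, treating the delivery path as a chain of series resistances. The stage names come from the list above; every resistance, the supply voltage, and the load current are made-up values chosen so the loss lands near the "around half" gut estimate, not data from any real miner.

```python
# ASSUMPTION: hypothetical series resistance of each stage, in ohms.
STAGE_OHMS = {
    "circuit board": 1.0e-3,
    "pad mounts": 0.5e-3,
    "solder": 0.5e-3,
    "interposer": 0.5e-3,
    "wire bonds": 0.5e-3,
    "silicon metal layers": 1.0e-3,
}


def delivered_fraction(supply_volts, load_amps, stage_ohms):
    """Fraction of converter output power that reaches the transistors."""
    total_ohms = sum(stage_ohms.values())
    ir_drop = load_amps * total_ohms          # V = I * R across the whole chain
    if ir_drop >= supply_volts:
        raise ValueError("IR drop exceeds supply voltage")
    # Same current everywhere in a series path, so the power ratio
    # equals the voltage ratio.
    return (supply_volts - ir_drop) / supply_volts


# ASSUMPTION: 0.8 V at the converter output, 100 A of load current.
frac = delivered_fraction(0.8, 100.0, STAGE_OHMS)
print(f"{frac:.0%} of the power reaches the active transistors")
```

Each stage looks harmless alone, which is presumably how it gets left out of the performance estimate; it is the sum that hurts.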

Some interesting links on power densities

Other things written by Troy