EEMBC Brings Embedded Benchmarking out of the Pits

By: Richard G. Russell - www.io.com/~richardr

March 2000


Competition in the embedded microprocessor and microcontroller market is fierce. For any given application, there are often several processors from which a system designer may select. Performance is often a key factor when selecting a processor for an embedded application. Due to time-to-market demands, system complexity, and the sheer number of available products, the selection process has become more difficult and more critical than ever before. Performance is not the only selection criterion, but it is often one of the most important and often the most controversial. Fortunately, the EEMBC now has new benchmarking suites specifically designed for evaluating microprocessors, microcontrollers, and DSPs for embedded systems.

Benchmarking for PCs and workstations (just called workstations here) usually consists of finding the best performing system from a set of systems all at a similar price point. Generally, workstation benchmarks are all relative – system A is 10% faster than system B. While some people could argue the point, there are few useful empirical or absolute measures of workstation performance. The goal is simply to find the fastest system.

While workstation processor performance is important, it is usually a low priority when designing a new product. For a new workstation, there are rarely more than two or three applicable processor choices, as price/performance points are well known. The system price point, and to a lesser extent the target market, determines the processor. The goal is to select the highest clock rate processor that fits in the desired system bill of materials cost. For example, a PC at a given price point may have an 800MHz AMD Athlon in a given quarter, while the next quarter it may contain a 1GHz AMD Athlon.

Workstation benchmarking tools are readily available, well understood, and widely accepted. Workstation operating environments and hardware are also very predictable and well understood. Furthermore, workstation designers generally only need to consider two operating system categories, namely Win32 and a few Unix variants.

The Pitfalls of Embedded Processor Benchmarking

Unlike workstation processors, processors for embedded systems are extremely difficult to benchmark reliably. The benchmarking goal is also very different – it is crucial to find a processor which has enough performance for the application at the lowest price point! For most embedded applications, it makes no sense to buy the fastest processor.

The primary goal of many embedded systems designers is to minimize cost for a given performance level. This is in contrast to maximizing performance at a specific cost point. Most applications must have the lowest-cost processor that will do the job. Other goals include minimized power consumption or physical size, but it’s rare that optimizing for cost is not a first or second priority.
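
The two selection goals can be contrasted in a few lines of Python. This is only a sketch: the `Candidate` record, both function names, and any part names, prices, or scores fed to them are assumptions made for illustration, not data from any vendor.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str         # hypothetical part name
    unit_cost: float  # price per unit, in any consistent currency
    score: float      # measured application throughput (higher is better)


def cheapest_adequate(candidates: Sequence[Candidate],
                      required_score: float) -> Optional[Candidate]:
    """Embedded-style goal: lowest cost among parts that are fast enough."""
    adequate = [c for c in candidates if c.score >= required_score]
    return min(adequate, key=lambda c: c.unit_cost) if adequate else None


def fastest_within_budget(candidates: Sequence[Candidate],
                          budget: float) -> Optional[Candidate]:
    """Workstation-style goal: highest score among parts that fit the budget."""
    affordable = [c for c in candidates if c.unit_cost <= budget]
    return max(affordable, key=lambda c: c.score) if affordable else None
```

Both functions return `None` when no candidate qualifies. Either one is only as trustworthy as the `score` values, which is why the quality of the benchmark matters so much.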

Most system designers and engineers turn to benchmarks to help answer the performance question. Unfortunately, they face several problems when trying to benchmark a processor for an embedded system. First, there have never been good benchmarks for embedded systems. Most embedded benchmarking has been done using synthetic workstation benchmarks such as SPEC or Dhrystone MIPS. These benchmarks measure a specific aspect of processor performance such as integer and floating-point calculations. Such synthetic benchmarks relate poorly to application performance. This poses a problem for embedded system designers – how do they know how a processor will perform in the target application?

Take a midrange router for example. If a vendor rates its processor at 36 Dhrystone MIPS, then how many Ethernet frames can it route per second? There is no way to tell. Even processors with a comparable MIPS rating will perform very differently in this application. For example, a 16-bit and a 32-bit processor could have very similar MIPS ratings. However, the 32-bit processor will generally perform much better in a router because of better pipelining and a wider data path. But even this may not be true! What if the 16-bit processor had special microcode to perform IP address lookups? Such a processor might outperform a more expensive 32-bit device at a higher clock rate.

Many synthetic benchmarks target specific aspects of CPU performance, such as floating-point or integer performance. Such benchmarks only allow a coarse relative comparison between the performance of different processors. The results mean even less for processors with very different architectures. Synthetic measurements have little bearing on how well a processor would perform in an embedded application such as a router, cell phone, or Voice over IP gateway.

Another problem is that porting and running a set of benchmarks on multiple architectures can be extremely time-consuming. Each architecture has different development tools and physical hardware platforms. Embedded systems also have a wide variety of operating systems. There are five or more common embedded operating system products, and over 50% of embedded system designs use a proprietary executive or OS in favor of a commercial RTOS product. Furthermore, how does one port a benchmark that requires a Unix or Win32 operating system to be present? It’s difficult enough to port to one target system, let alone three or four with different hardware architectures and embedded firmware APIs.

Benchmarking results are frequently controversial. Poorly reported or obviously inflated bench-marketing results can cause uproar in an entire industry. Remember the Linux/NT benchmarking controversy? After years of dubious benchmarking reports, many people have become downright suspicious of benchmark results and have difficulty believing that vendors honestly derived their results. Daniel Mann, a benchmarking expert from AMD, has written that “there are lies, damn lies and benchmarks”. The mistrust of benchmarking results is just another in a long list of problems faced by systems designers and engineers.

An additional problem is that there are often multiple processor architectures which are well suited to any given embedded application. This is rarely, if ever, a problem for workstation designers. Entire workstation companies are built around a single processor architecture; x86 and SPARC are good examples. In contrast, it is often difficult to narrow down the architecture for an embedded product. Take a mid-range network router as an example: the AMD x86, QED MIPS, Motorola PowerPC, and ARM9E are very divergent architectures, but they are all viable for this application.

All of these problems, combined with the rapid pace of development of new embedded processors, have placed embedded system designers in a difficult position: performance is a key factor in selecting a microprocessor, but the tools are not available to empirically determine reliable, repeatable, believable, and more importantly, comparable performance statistics. In the past, misleading numbers have been published based on synthetic workstation benchmarks. This has exacerbated the problem and led to a great deal of skepticism about past embedded benchmarks, as customers and the press can rarely reproduce the results.

The EDN Embedded Microprocessor Benchmark Consortium

The EDN Embedded Microprocessor Benchmark Consortium was founded to solve these problems. The EEMBC is a group of over 30 companies that design and manufacture microprocessors, microcontrollers, and DSPs for the embedded market. The EEMBC has developed a standard set of benchmarks designed to reliably measure application performance and allow system designers to make direct comparisons between very different processors. Instead of focusing on synthetic algorithms, the EEMBC benchmarks implement over thirty core algorithms used in the telecommunications, networking, automotive, industrial automation, consumer electronics, and office automation markets.

The consortium developed these benchmarks with some specific goals in mind. First, the benchmarks are very portable and run on a wide variety of platforms from small 8- and 16-bit processors to 32- and 64-bit monsters. Second, they accurately reflect how a customer’s developed application will perform on an embedded system. Third, an independent testing laboratory certifies all publicly available results from all members of the consortium. These characteristics solve the long-standing problems associated with benchmarking embedded processor systems.

Unlike synthetic processor benchmarks, the EEMBC benchmarks provide quantitative measurements of how a processor performs when it is running real-world algorithms and tasks. The EEMBC uses an absolute measure of benchmark loop iterations per second for all benchmarks. This allows a designer to directly compare any two processors.
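
As a rough illustration of an iterations-per-second score, the following Python sketch runs a kernel a fixed number of times and divides the count by the elapsed time. The function name `iterations_per_second`, its parameters, and the injectable `clock` are assumptions made for this example; they are not part of the EEMBC suite, whose kernels run on the target processor itself.

```python
import time
from typing import Callable


def iterations_per_second(kernel: Callable[[], None],
                          iterations: int,
                          clock: Callable[[], float] = time.perf_counter) -> float:
    """Run `kernel` `iterations` times and return completed iterations per second.

    `clock` must return the current time in seconds; it is a parameter so
    that a target-specific (or simulated) timer can be substituted.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = clock()
    for _ in range(iterations):
        kernel()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time too small to measure; use more iterations")
    return iterations / elapsed
```

Because the result is an absolute rate rather than a ratio against a reference machine, two scores for the same kernel can be compared directly, provided both runs executed the same work per iteration.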

Of course, a single benchmark cannot tell the whole performance story – a broad suite of benchmarks is needed to paint a complete picture of a processor's performance. To develop this suite, the EEMBC took a divide-and-conquer approach, targeting specific applications and markets. In the fall of 1997, the EEMBC established five subcommittees to focus on specific application areas. The five subcommittees are automotive/industrial, consumer products, networking, office automation, and telecommunications applications. Each of the subcommittees reports to the EEMBC board of directors and to EEMBC President Markus Levy. Each member company has one voting board member, and the President does not vote.

Industry experts from the member companies chair these subcommittees. The chairs have considerable knowledge of the targeted applications. Specialists from five or six of the member companies then staffed each subcommittee. This approach provided for a balanced and even-handed design and implementation of the benchmarks.

Over the next year and a half, the subcommittees defined and developed an entire suite of portable benchmark kernels. This proved to be a challenging task. Not only did the benchmarks have to be representative of the way system designers implemented applications, the benchmarks also had to be portable to the processors from all the semiconductor vendors, as well as the plethora of compilers for each of the architectures.

The Benchmarks

The EEMBC developed the benchmark kernels with one overriding characteristic in mind – they had to be representative of how applications functioned in the real world. To meet this goal, each of the five EEMBC subcommittees developed a suite of benchmarks that reflected the actual algorithms, techniques, and functions used in real-world applications.

For example, the telecommunications subcommittee recognized that the Fast Fourier Transform (FFT) was a fundamental algorithm used in a wide variety of products from cell phones to ADSL modems. Instead of developing a synthetic benchmark that only measured something like floating-point operations per second, Russ Riven with Analog Devices, the subcommittee chair, drove the definition and development of a true FFT benchmark.
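
To give a sense of the work an FFT kernel performs, here is a textbook recursive radix-2 Cooley-Tukey transform in Python. It is a minimal sketch for illustration only; this article does not describe how the EEMBC kernel itself is implemented, and the function name `fft` is an assumption.

```python
import cmath
from typing import List, Sequence


def fft(samples: Sequence[complex]) -> List[complex]:
    """Discrete Fourier transform of a power-of-two-length sequence."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    # Transform the even- and odd-indexed samples separately.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # The twiddle factor rotates the odd half before it is combined.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

The inner loop is dominated by complex multiply-and-add operations and strided memory access, which is the kind of mixed workload that a single operations-per-second figure does not capture.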

The networking subcommittee also focused on a popular task – routing. Chaired by Paul Cobb of QED, this group developed a comprehensive routing benchmark which actually performs packet parsing and routing table lookups for a realistic set of IP frames. This benchmark is particularly effective as it can use more than one size of routing table. This avoids the one-size-fits-all approach of synthetic benchmarks.
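
A routing table lookup can be sketched as a longest-prefix match: among all table entries whose prefix contains the destination address, the most specific one wins. The Python example below uses a simple linear scan over the standard `ipaddress` module. The function names `build_table` and `lookup` and the next-hop labels are assumptions for illustration; the data structure used by the EEMBC benchmark is not described in this article.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]  # (prefix, next-hop label)


def build_table(entries: Iterable[Tuple[str, str]]) -> List[Route]:
    """Parse ("a.b.c.d/len", next_hop) pairs into a routing table."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in entries]


def lookup(table: List[Route], destination: str) -> Optional[str]:
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best: Optional[Route] = None
    for network, hop in table:
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, hop)
    return best[1] if best else None
```

The cost of the scan grows with the number of entries, which suggests why running the same benchmark against more than one table size is informative: a processor with a small cache may do well on a short table and poorly on a long one.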

Table lookups and interpolation are frequently implemented algorithms used in automotive and industrial control applications. EEMBC’s Automotive/Industrial group, chaired by Nigel Allison from Motorola, felt this was particularly important because these algorithms are often executed thousands of times per second. A processor that implemented them most efficiently would certainly be well suited for controlling a modern engine or drive train.
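
The table lookup and interpolation technique can be sketched as follows: a calibration table stores output values at a few breakpoints, and values between breakpoints are estimated by linear interpolation. The function name `interpolate` and the clamping behavior at the table ends are assumptions chosen for this example, not details of the EEMBC kernel.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Linearly interpolate y at x from a table with strictly ascending xs.

    Inputs outside the table are clamped to the first or last entry.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have equal length of at least 2")
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    # Index of the breakpoint at or just below x.
    i = bisect_right(xs, x) - 1
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

Each call costs a search, a few subtractions, one multiply, and one divide. When a control loop makes thousands of such calls per second, the efficiency of those few operations dominates.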

The Office Automation subcommittee, chaired by Dominic McCarthy of SandCraft Inc., decided to benchmark a fundamental algorithm used in almost every printer – rendering of Bezier curves. Bitmap rotation, another office automation benchmark, is also a universal function implemented by almost any application that has a rasterized display or printing engine. Designers of these cost-sensitive products can now directly measure how well a processor will handle printing and rasterizing tasks, allowing them to bring products to market more quickly.
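
For readers unfamiliar with Bezier rendering, the sketch below evaluates a cubic Bezier curve with de Casteljau's algorithm, a standard method based on repeated linear interpolation between control points. Whether the EEMBC kernel uses this method is not stated in this article, and the names `lerp`, `cubic_bezier_point`, and `flatten` are assumptions for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lerp(p: Point, q: Point, t: float) -> Point:
    """Point a fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def cubic_bezier_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    # First level: interpolate between adjacent control points.
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    # Second level, then the final point on the curve.
    d, e = lerp(a, b, t), lerp(b, c, t)
    return lerp(d, e, t)


def flatten(p0: Point, p1: Point, p2: Point, p3: Point, segments: int) -> List[Point]:
    """Approximate the curve by segments + 1 evenly spaced sample points."""
    return [cubic_bezier_point(p0, p1, p2, p3, i / segments)
            for i in range(segments + 1)]
```

A renderer would typically connect the sample points from `flatten` with straight line segments before rasterizing them.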

Digital cameras are one class of consumer electronics that is particularly difficult to design. Processors for these devices must be physically small, very low power, and have enough horsepower to quickly compress and decompress JPEG images. This is why the consumer team, chaired by Randy Henderson of IBM, chose to implement benchmarks that directly measure how well a processor handles JPEG compression and decompression.

Of course, the benchmark kernels implemented in version 1.0 of the EEMBC suite don’t reflect every possible embedded application. However, they do cover a very wide range and can easily be applied to other related applications. For example, many scientific and data-collection applications use FFT. The JPEG compression and decompression benchmarks can be applied to any device which supports a web browser. The routing and other networking benchmarks will provide a good indication of how a processor and system will handle general packet-based processing.

The EEMBC Test Harness

Portability was one of EEMBC’s major hurdles. It is a key requirement that porting the entire suite of benchmark kernels to a system be straightforward for EEMBC licensees. To deal with this issue, EEMBC developed a portable benchmarking Test Harness (TH) which runs on a wide range of target processors and platforms. The TH provides a standard benchmarking API and an interface to a host system. This API and interface are consistent across all member architectures. Alan R. Weiss, Chief Technical Officer of the EEMBC Certification Laboratories (ECL), developed the proposal for the "benchmarking test harness" application-programmer interface (API) and architecture. Richard Russell, Manager of System Software Engineering for AMD’s Embedded Processor Division, evolved this concept into a simple, yet extremely portable system, which EEMBC refers to as the Test Harness (TH). Once the Test Harness is ported to a target system, any of the EEMBC benchmarks can be run on that system. The TH is analogous to a small operating system in that it isolates the benchmark code from basic processor and platform dependencies.

EEMBC’s TH is much more than an API. It is really a system of components that Richard designed to communicate via well-defined interfaces and messages. Figure 2 shows how the components of the TH fit together.

 

The TH provides an interface with a standardized set of services between a benchmark and the target system. The TH consists of a TH Functional Layer (THFL) and a TH Adaptation Layer (THAL) that are linked together by the tool chain specific to the target processor and system. The THFL is a portable component that remains constant from system to system. The THAL provides an interface to the target system's hardware or resident software (such as a ROM monitor or an operating system). The THAL provides functions for performing I/O to a host system, handling a target system's timer (if one is available), and defining the RAM that the TH should use for storing the downloaded files, a heap, and other run-time data. The THAL also provides a configuration header file that contains several configurable parameters that control how the compiler builds the TH.
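
The split between a portable layer and a target-specific layer can be sketched in Python as follows. Every name here (`AdaptationLayer`, `FunctionalLayer`, `SimulatedTarget`, and their methods) is hypothetical and chosen only to mirror the roles described above; the real TH is built with the target's own tool chain, and its actual interfaces are not given in this article.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class AdaptationLayer(ABC):
    """Plays the role of the THAL: the only code that knows the target."""

    @abstractmethod
    def timer_ticks(self) -> int: ...

    @abstractmethod
    def ticks_per_second(self) -> int: ...

    @abstractmethod
    def send_to_host(self, text: str) -> None: ...


class FunctionalLayer:
    """Plays the role of the THFL: portable, identical on every target."""

    def __init__(self, target: AdaptationLayer) -> None:
        self._target = target

    def run(self, name: str, kernel: Callable[[], None], iterations: int) -> float:
        start = self._target.timer_ticks()
        for _ in range(iterations):
            kernel()
        ticks = self._target.timer_ticks() - start
        if ticks <= 0:
            raise ValueError("timer did not advance; use more iterations")
        rate = iterations * self._target.ticks_per_second() / ticks
        self._target.send_to_host(f"{name}: {rate:.1f} iterations/s")
        return rate


class SimulatedTarget(AdaptationLayer):
    """Fake target whose timer advances a fixed amount on every read."""

    def __init__(self, ticks_per_read: int, tick_rate: int) -> None:
        self._now = 0
        self._step = ticks_per_read
        self._rate = tick_rate
        self.host_log: List[str] = []

    def timer_ticks(self) -> int:
        now = self._now
        self._now += self._step
        return now

    def ticks_per_second(self) -> int:
        return self._rate

    def send_to_host(self, text: str) -> None:
        self.host_log.append(text)
```

In this arrangement, supporting a new platform means writing one new `AdaptationLayer` subclass; the functional layer and the benchmark kernels are left untouched.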

The Test Control Scripts (TCS), typically written in Perl by the porting engineer, communicate with the Host Control Program (HCP) to control the loading, execution, and results reporting of a test. The HCP is a command-line program that runs on the Host System and typically communicates with the target system using an RS-232 serial port and a simple protocol.

Porting the Test Harness to a platform is a very straightforward process. The port can take as little as one hour, especially if an engineer takes advantage of the EEMBC-provided template code.

The EEMBC Certification Lab

One of the most vexing problems with benchmarks is credibility: who can be believed? The EEMBC Certification Lab (ECL) can be both believed and trusted. ECL is an independent certification lab licensed by EEMBC to certify all the results developed by the member companies and other EEMBC benchmark licensees.

Unlike other organizations that report benchmark scores, all publicly released EEMBC benchmark scores must be certified by ECL. ECL does much more than just verify that the declared results can be reproduced -- ECL and EEMBC have established processes and rules which help to ensure that benchmark scores accurately and reliably reflect the true performance of the processor and system on which they are executed. ECL verifies many aspects of benchmark execution and then certifies that the reported results are accurate, reliable and repeatable.

For every platform certified, ECL performs code reviews of the benchmarks to make sure that the vendors have implemented them correctly and without changing the fundamental algorithm or function of the benchmarks. ECL also ensures that the vendor has disclosed all pertinent information about the benchmark, such as the processor measured, system platform characteristics, compiler version numbers and the compiler options used to build the benchmarks. ECL also does physical verification of the actual systems used to produce benchmark scores. One basic test is verifying that the system clock rate reported by the vendor is the one used to establish the scores.

Wrapping Up

The EEMBC benchmarks, Test Harness, and the EEMBC Certification Laboratories are valuable tools for embedded system designers. By using these application-oriented benchmarks, designers can be sure that a processor has the right performance for the job. The portable architecture of the benchmarking system allows designers to easily judge several processors from different architectures. ECL ensures that designers can trust results to be reliable and repeatable. This benefits both the semiconductor manufacturers and their customers by allowing products to be built with exactly the right price/performance ratio.


Copyright © 2000, 2001, by Richard G. Russell, All Rights Reserved