Thursday, March 29, 2007

The TPC-C database benchmark -- What does it really mean?


We explain how TPC-C works and what exactly it reports so you can interpret results

Vendors compete for database business largely on the basis of published benchmarks such as TPC-C. Yet users often do not understand very much about what goes into these benchmarks and what they mean. This article describes what the TPC-C is and how it can relate to your work. (2,000 words)

The TPC-C benchmark models the essence of a "typical" online transaction processing (OLTP) environment. One of the keys to understanding TPC-C results is the word "essence." Although the Transaction Processing Performance Council (TPC) has done a good job of defining a benchmark that emulates the fundamental components of transaction processing environments, the nature of a generalized benchmark is such that it cannot feasibly represent very many actual environments.

The TPC-C benchmark simulates a large wholesale outlet's inventory management system. The operation consists of a number of warehouses, each with about ten terminals representing point-of-sale or point-of-inquiry stations. Transactions are defined to handle a new order entry, inquire about order status, and settle payment. These three user interaction transactions are straightforward.

Two other transactions model behind-the-scenes activity at the warehouses: the stocking level inquiry and the delivery transaction. The stocking level inquiry scans a warehouse's inventory for items that are out of stock or nearly so. The delivery transaction collects a number of orders and marks them as having been delivered. One instance of either of these transactions represents much more load than an instance of the new order, order status, or payment transactions.
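
To make the flavor of these transactions concrete, here is a minimal sketch of the shape of a new-order transaction. The table names, columns, and SQLite usage are illustrative assumptions only, not the official TPC-C schema or any vendor's implementation; a real new-order transaction is far more detailed.

    # Minimal sketch of the *shape* of a TPC-C-style new-order transaction.
    # The schema and names are illustrative, not the official TPC-C schema.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stock  (item_id INTEGER, w_id INTEGER, qty INTEGER);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, w_id INTEGER);
        CREATE TABLE order_line (order_id INTEGER, item_id INTEGER, qty INTEGER);
        INSERT INTO stock VALUES (1, 1, 50), (2, 1, 40);
    """)

    def new_order(conn, w_id, lines):
        """Insert an order and its line items, decrementing stock atomically."""
        with conn:                              # one transaction: commit or roll back
            cur = conn.execute("INSERT INTO orders (w_id) VALUES (?)", (w_id,))
            order_id = cur.lastrowid
            for item_id, qty in lines:
                conn.execute(
                    "UPDATE stock SET qty = qty - ? WHERE item_id = ? AND w_id = ?",
                    (qty, item_id, w_id))
                conn.execute("INSERT INTO order_line VALUES (?, ?, ?)",
                             (order_id, item_id, qty))
        return order_id

    print(new_order(conn, 1, [(1, 5), (2, 3)]))   # -> 1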

The TPC-C is a descendant of the earlier TPC-A. In fact, changing the names of the fields in the new order transaction effectively produces a duplicate of the TPC-A transaction. Despite this clear lineage, TPC-C is far richer in functionality than its predecessors.

In addition to having many more types of transactions, TPC-C also mandates a much higher level of (simulated) user interaction. While the TPC-A application consisted of exactly one call to scanf(3) and one call to printf(3), TPC-C requires an entire application program to accept user input. The application's operation is precisely specified, to prevent subtly different interpretations of the specification from resulting in large variations in benchmark scores. The specification even mandates the appearance of the user interface screen on the terminals!

Although this is a major improvement over TPC-A, the TPC-C application still falls short of representing the typical database application. The most significant deficiency is that user input is not validated using the methods common to most applications. Most commercial applications are built with some sort of forms package, such as Windows for Data or JYACC. In addition to managing screen formats, these packages normally handle validation of input against the database. Typically, input data is validated on a field-by-field basis, as soon as the user leaves the field (for example, by pressing tab or return). In contrast, the TPC-C specification merely mandates that the input be validated; it does not specify how or when. So vendors customarily validate all input in a single batch, right before attempting to run the transaction. This reduces the number of interactions between the application and the database and saves a great deal of overhead compared with normal applications. It's certainly possible to write applications this way, but in practice this approach is taken only when performance is critical, because it requires more programming effort and can sometimes be confusing to end users.
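
The difference is easy to see in miniature. In the sketch below, the validate_* functions are stand-ins for SELECTs against the database (they are illustrative, not any real forms package's API); the counter shows how many client/server round trips each style costs.

    # Sketch contrasting field-by-field and batch input validation. The
    # validate_* functions stand in for SELECTs against the database.
    round_trips = 0

    def validate_field(value):
        """Pretend to check one field against the database (one round trip)."""
        global round_trips
        round_trips += 1
        return value is not None

    def validate_batch(fields):
        """Pretend to check all fields in one combined query (one round trip)."""
        global round_trips
        round_trips += 1
        return all(v is not None for v in fields.values())

    fields = {"customer_id": 42, "item_id": 7, "quantity": 3}

    # Style 1: forms-package style -- validate as the user leaves each field.
    for value in fields.values():
        validate_field(value)
    print("field-by-field:", round_trips, "round trips")   # -> 3

    # Style 2: TPC-C style -- one batch validation just before the transaction.
    round_trips = 0
    validate_batch(fields)
    print("batch:", round_trips, "round trip")             # -> 1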

If your applications do not do batch input validation, you'll have to aim for considerably higher performance from your system. Although the SQL code to validate input is almost always very simple compared to the transactions themselves, most applications do a tremendous amount of it. As a result, it's often wise to add a third to a half to the target system's capability if you don't have an existing system to measure.
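
As a rough illustration of that rule of thumb (the 1,000 tpm requirement here is an arbitrary example):

    # The sizing rule of thumb above, as arithmetic: if your application
    # validates input field by field rather than in batch, shop for a
    # system a third to a half larger than the raw requirement.
    measured_need = 1000                  # transactions/minute you require
    low = measured_need * (1 + 1 / 3)     # add a third
    high = measured_need * (1 + 1 / 2)    # add a half
    print(f"target capacity: {low:.0f} to {high:.0f} tpm")  # -> 1333 to 1500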

WAN considerations
Input validation is especially relevant when the clients and servers communicate over a wide area network, because SQL data is customarily transmitted over the network in a relatively inefficient form. DBMSs communicate between client and server via TCP/IP, and for a variety of complex reasons, they send each column of each row in a separate packet. A column is something like a first name or a salary, although it might be something quite large, such as a compressed photographic image. Most columns are pretty small, averaging less than 200 bytes, so the TCP/IP overhead of 48 bytes per packet becomes significant. The overhead isn't a big deal on LANs, but on the restricted bandwidth of a WAN, it can be an issue.

Even more problematic is the end-to-end round trip time on WANs. On a network such as an Ethernet, the round-trip time might be one to five milliseconds, while the same trip on a wide area network could easily take 100 times as long. When the application makes a single validation call to the DBMS, as TPC-C does, network round-trip time might not be significant. But many applications do so many round trips that the entire client/server configuration could easily miss its performance goals for this reason alone.
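
A back-of-the-envelope calculation makes both effects visible. The figures below come from this section (48 bytes of per-packet overhead, roughly 200-byte columns, millisecond LAN round trips versus WAN trips a hundred times longer); the per-transaction round-trip counts are illustrative assumptions.

    # Back-of-the-envelope WAN arithmetic using the figures above. The
    # per-transaction round-trip counts are illustrative assumptions.
    HEADER = 48                      # TCP/IP overhead per packet, in bytes
    COLUMN = 200                     # typical column payload, in bytes
    print(f"wire overhead per column: {HEADER / (HEADER + COLUMN):.0%}")  # -> 19%

    lan_rtt, wan_rtt = 0.002, 0.200  # seconds: ~2 ms LAN vs ~200 ms WAN
    for trips in (1, 50):            # batch validation vs a chatty application
        print(f"{trips:>2} round trips: LAN {trips * lan_rtt * 1000:.0f} ms, "
              f"WAN {trips * wan_rtt * 1000:.0f} ms")
    #  1 round trips: LAN 2 ms,   WAN 200 ms
    # 50 round trips: LAN 100 ms, WAN 10000 ms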

Client/server implementations
TPC-C is virtually always run in client/server mode, meaning that the reported score is for a cluster of systems. Almost universally, vendors separate the many instances of the user application from the core database system. Only the database engine itself runs on the machine whose score is reported. For example, consider a result such as "Sun Ultra Enterprise 6000, 23,143 tpm-C using 16 processors, Oracle 7.3.3, Solaris 2.6, and 11 Ultra-1/170 front-end systems." The approximately 20,000 simulated users log into one of the eleven front-end systems, and their SQL requests are sent to the Ultra Enterprise 6000 for processing.

This arrangement can have a significant bearing on the interpretation of TPC-C results. If you are trying to size a system that will run application code as well as the database engine, you'll get quite an unpleasant surprise if you rely too directly on TPC-C results. Fortunately, this sort of arrangement represents the minority of applications. The dominant database processing architecture is now client/server, in which the front-end application code runs on client systems, such as PCs or workstations, and the database server runs only the database engine itself.

The only fly in this ointment is that there are relatively few discrete client systems in most TPC-C configurations. In the example above, there are only eleven client systems, each handling more than 1,800 users. This degree of client concentration is unlikely to occur in the real world: a system supporting 20,000 users would usually be connecting to more than 10,000 different client systems. The number of client systems matters because vendors always take advantage of the limited number of client systems by using a transaction processing (TP) monitor or some other form of connection multiplexer. This optimization isn't available if your application has 1,000 client systems, each connecting once to the DBMS server; the result is 1,000 client connections on the server. TPC-C configurations, by contrast, universally use a TP monitor or similar software to reduce the number of active connections to between one and ten per client system. As a result, there are many fewer active connections on the server, making it far easier to manage.
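
The following sketch shows, in miniature, what such a multiplexer buys: requests from 1,000 simulated clients are served over just five shared connections. The queue-and-worker structure is a toy illustration, not how any particular TP monitor is implemented.

    # Toy sketch of TP-monitor-style connection concentration: requests
    # from 1,000 clients are funneled through five shared server
    # connections. This is an illustration, not any real TP monitor.
    import queue
    import threading

    N_CLIENTS, N_CONNECTIONS = 1000, 5
    requests = queue.Queue()
    done = threading.Event()

    def server_connection():
        """One pooled DBMS connection serving requests from many clients."""
        while not (done.is_set() and requests.empty()):
            try:
                client_id = requests.get(timeout=0.1)
            except queue.Empty:
                continue
            # ... the client's SQL would run on this shared connection ...
            requests.task_done()

    workers = [threading.Thread(target=server_connection)
               for _ in range(N_CONNECTIONS)]
    for w in workers:
        w.start()

    for client_id in range(N_CLIENTS):    # 1,000 clients, only 5 connections
        requests.put(client_id)

    requests.join()                       # wait until every request is served
    done.set()
    for w in workers:
        w.join()
    print(f"{N_CLIENTS} requests served over {N_CONNECTIONS} connections")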

Batch processing
Another consideration that TPC-C does not take into account is batch processing. Most real OLTP applications have at least two distinct components: an online portion that creates and processes transactions, and a batch portion that reports on the period's work. Often these batch jobs also reconcile daily activity with master databases or extract data to support related decision support processing. For example, bill processing and invoice reconciliation are tasks that are almost always handled in batch jobs.

The TPC-C is far richer than either TPC-A or TPC-B in this regard because it includes the delivery and stocking level transactions. Both of these transactions manipulate far more than the individual records associated with line items and orders; instead, they deal with groups of business transactions. However, both of these operations are quite small compared to typical batch operations. Real applications often include significant batch components. For example, the Oracle Financials application suite includes the concept of a "concurrent manager," essentially a batch processing stream used to handle large and unwieldy processing requests that are not interactive in nature.

Because batch processing requires no user interaction, it tends to consume processor and I/O resources much more quickly than online users do. It's not unusual for individual batch jobs to consume an entire processor and the attendant I/O resources. When your application has a significant batch component, TPC-C is unlikely to reflect your environment very well. Unfortunately, there isn't much you can do to extrapolate TPC-C results to reflect this workload, either.

TPC-C reporting rules
One of the curious -- and very misleading -- things about TPC-C scores is that they report only the rate of the new order transaction. The other four transactions serve only as background load to provide a context for the new order transactions. I'm not completely sure why the TPC designed the reporting rules this way, but it often confuses users of the results. The background transactions are defined to be at least 57 percent of the mix, so new orders are at most 43 percent of the work. This means that a score of 1,000 (new order) transactions per minute actually represents over 2,300 transactions (of all types) per minute. Anyone attempting to size a system "according to TPC-C" should account for the true amount of work being done in the reported runs.
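
The arithmetic is simple enough to state directly; the 43 percent ceiling is the figure from the paragraph above.

    # The reporting rule as arithmetic: new orders are at most 43 percent
    # of the mix, so a reported tpm-C score understates the total work by
    # better than a factor of two.
    NEW_ORDER_SHARE = 0.43             # ceiling on new-order share of the mix

    def total_rate(reported_tpmc):
        """All-transaction rate implied by a new-order-only score."""
        return reported_tpmc / NEW_ORDER_SHARE

    print(f"{total_rate(1000):.0f} total tpm")   # -> 2326, i.e., 'over 2,300'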

TPC-C scores in context
We've seen that the actual delivered transaction rate is somewhat more than double the reported score, and that the results are most relevant in a client/server environment. But what do these rates really mean? Let's take a look at a large-scale result, though not one of the top scores: the Ultra Enterprise 4000 using Informix 7.3 and ten Ultra-1/170 clients. The reported transaction rate is 15,461 transactions per minute, so this combination delivered about 35,956 transactions (of all five types) each minute. Servicing about 15,000 users, this appears to be a really big system. If we deflate the score by the additional 50 percent work (or so) necessary to handle real-life input validation, the score becomes 10,312 tpm-C. That's still a lot of transactions every minute!

Server consolidation
Unless multiple applications are consolidated onto a single system, most systems have no requirement for anything like this level of throughput. With multiple applications running on a single system, transaction requirements can approach these levels, especially when the applications handle very large populations of users. TPC-C does not reflect these environments at all: it uses a single application with a single database instance, and the database locking strategies are designed accordingly. When many applications are consolidated onto a single system, they ordinarily do not use a single database instance, and multiple applications almost never share databases.

The scalability of multidatabase configurations is different from that of single-database systems such as the one used in TPC-C. The scalability of a given system might be better or worse than that seen in TPC-C. It might be worse due to a variety of considerations, such as processor cache saturation or resource management issues within either the operating system or the DBMS. It might be better in a multidatabase or multi-instance configuration if the applications have suitable locking strategies, particularly when little or no data is shared between applications.

Summing it up
The TPC-C is best used for approximate comparisons between generally similar systems. Because it is a highly optimized workload (a single application, batch input validation, a client/server configuration with very few client systems, and minimal batch processing), TPC-C doesn't predict actual end-user performance as well as one might like. By keeping these common deviations from real workloads in mind, a user can plan a configuration without unrealistic expectations.

TPC-B used the same transaction as TPC-A, namely the core of the most basic ATM teller transaction. The main difference between TPC-A and TPC-B is that the latter has no think time.

The TPC-C specification mandates that a specific percentage of transactions include invalid data, forcing at least a few transaction rollbacks.
