3-Tier for Improved Database Throughput

There are several benefits to using 3-tier client/server architectures. A previous technical brief described the performance improvements possible from improved use of network bandwidth and protection from network latency limitations. This brief discusses the ways in which 3-tier architectures can be used to improve overall database throughput.

Process v. Platform Architecture

As a basis for this discussion, it is important to clarify the difference between process (software) and platform (hardware and operating system) architecture. When speaking of 3-tier architectures, I am referring to three process tiers, that is, separate software processes for presentation, logic and data access. These processes can reside on 1, 2, 3 or more physical machines. The situations in which they reside on 1 machine are typically not very interesting, so they are not considered here; those in which they reside on more than 3 machines are really special-case extensions of the 3-machine case, so they are likewise not considered. That leaves us with the following two scenarios.

[Figures: 2 Physical Tiers; 3 Physical Tiers]

As described in the other brief, the network traffic benefits apply to the 2 physical tier architecture, as well as to 3 physical tier scenarios in which the logic server and client are connected over a low-bandwidth and/or high-latency network. The database throughput increase described here is best obtained in 3 physical tier deployments.

Server Models

An understanding of server models is also important for this discussion. The brief on transactions in distributed environments outlines several remote procedure call (RPC) server models and their most important attributes with respect to distributed transaction processing. In discussing database servers, we can ignore request-oriented servers and limit ourselves to connection-oriented servers. Thread- and process-per-connection server models are typical in the RDBMS world. Client connections persist over multiple requests, but a single connection can have only one outstanding (active) request, so a client must establish multiple connections in order to have multiple requests pending.

The thread- and process-per-connection models are conceptually simple, and extremely robust implementations abound. It is not surprising they are the models of choice for the major RDBMS vendors. However, they are not without limitations. First, there is a significant overhead associated with creating a process or thread. This has an obvious cost in response time when a connection is established, and it also places a noticeable cumulative burden on the server processor as connections are continually established. Second, each process or thread requires resources, most notably memory. Starting a sufficiently large number of processes can cause paging or swapping to disk. Third, the operating system on which the server runs has to schedule and service all these processes and threads. It likely has an optimal range of these which it can handle, and when pushed beyond that range, performance can suffer.
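To make the thread-per-connection model concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not any vendor's implementation: the names connect, accept and serve_connection are hypothetical, the "request" is just a line of text that gets upper-cased, and socket.socketpair() stands in for a real network connection.

```python
import socket
import threading


def serve_connection(conn):
    """Serve one connection for its whole lifetime, one request at a time."""
    with conn, conn.makefile("r") as reader:
        for line in reader:  # blocks until the client sends its next request
            # Stand-in for real work: reply with the upper-cased request.
            conn.sendall(line.upper().encode())
    # Reaching here means the client closed the connection; the thread ends.


def accept(conn):
    """Thread-per-connection: every new connection costs one new thread."""
    thread = threading.Thread(target=serve_connection, args=(conn,))
    thread.start()
    return thread


def connect():
    """Hypothetical stand-in for a client connecting to the server.

    Returns the client's end of the connection and the server thread
    that was created to service it.
    """
    client_end, server_end = socket.socketpair()
    return client_end, accept(server_end)
```

Each call to connect() creates a thread that then sits idle whenever its client is between requests, so a thousand mostly idle clients still cost a thousand threads. That is the overhead the three limitations above describe.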

Advanced server models use thread and/or process pooling to obviate the overhead of establishing new server threads and/or processes and of servicing huge numbers of them. The pool can be created when the master server process is started, so that no additional creations are necessary to service clients. Further, by multiplexing multiple client requests to each pool member, the number of active threads or processes can be optimized once and kept constant.
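The pooled model can be sketched as follows. Again this is only an illustration under my own assumptions: WorkerPool and its methods are hypothetical names, requests are plain Python callables, and error handling is omitted (a handler that raises would end its worker thread).

```python
import queue
import threading


class WorkerPool:
    """Fixed-size pool of worker threads, all created once at startup."""

    def __init__(self, size):
        self._requests = queue.Queue()
        self._workers = [
            threading.Thread(target=self._serve, daemon=True)
            for _ in range(size)
        ]
        for worker in self._workers:
            worker.start()

    def _serve(self):
        # Each worker services requests from many clients, one after another.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown marker
                break
            handler, arg, reply = item
            reply.put(handler(arg))

    def submit(self, handler, arg):
        """Queue a request; the returned queue will receive the result."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((handler, arg, reply))
        return reply

    def shutdown(self):
        for _ in self._workers:
            self._requests.put(None)
        for worker in self._workers:
            worker.join()


if __name__ == "__main__":
    pool = WorkerPool(size=4)
    replies = [pool.submit(lambda n: n * n, i) for i in range(100)]
    print(sum(reply.get() for reply in replies))
    pool.shutdown()
```

However many requests arrive, the number of threads stays at the size chosen when the pool was created, so it can be tuned once for the server's operating system and hardware.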

The Database Throughput Bottleneck

In very high transaction rate environments, in which as many as thousands of database connections must be established and serviced, the server model employed by most RDBMS vendors is ultimately the throughput bottleneck. This is why virtually every vendor uses a transaction processing (TP) monitor such as TUXEDO when performing TPC benchmarks. It follows that a 3-tier architecture using a TP monitor as the middle tier will remove the same bottleneck in a real-world deployment. However, there are many other possible sources of bottlenecks, including network bandwidth and latency, server CPU power, client CPU power, and the client development tool. You cannot blithely assume that a performance problem you are experiencing will be addressed with a TP monitor.

Middleware Options

If your database is servicing large numbers of connections, however, it is quite possible that a 3-tier architecture will yield performance benefits. TP monitors are not the only choice of middleware, either. Given that the key to throughput is the pooled thread/process server model, any middleware which implements this model is a candidate to improve throughput. Distributed object technologies do or will support the necessary server model. Thus, you can leverage them to improve database throughput, assuming that you code your pooled objects to establish database connections when they are created, not in response to individual client requests. In fact, if you are using the RDBMS for more-or-less transparent object persistence, you'll need to code very carefully to allow the various pooled objects to share connections.
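The point about connecting at creation time can be sketched like this. The class names PooledDataObject and ObjectPool are hypothetical, sqlite3 is used only as a convenient stand-in for a real RDBMS client library, and the "query" is a trivial calculation.

```python
import queue
import sqlite3


class PooledDataObject:
    """Hypothetical middle-tier object that owns one database connection."""

    opened = 0  # counts connections opened, for illustration only

    def __init__(self, dsn):
        # Connect once, when the object is created, not per client request.
        self._conn = sqlite3.connect(dsn, check_same_thread=False)
        PooledDataObject.opened += 1

    def double(self, n):
        # Stand-in for a real query issued on behalf of a client.
        return self._conn.execute("SELECT ? * 2", (n,)).fetchone()[0]


class ObjectPool:
    """Creates all pooled objects up front and lends them out per request."""

    def __init__(self, size, dsn=":memory:"):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(PooledDataObject(dsn))

    def call(self, n):
        obj = self._idle.get()  # blocks if every pooled object is busy
        try:
            return obj.double(n)
        finally:
            self._idle.put(obj)  # return the object, connection still open


if __name__ == "__main__":
    pool = ObjectPool(size=3)
    print([pool.call(i) for i in range(10)])
    print(PooledDataObject.opened)
```

The database sees only as many connections as there are pooled objects, regardless of how many client requests pass through the middle tier.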

Microsoft's Remote Automation includes (as almost an afterthought, it seems) a pool manager that creates a number of server instances for an Automation class. If you wrap database access in an Automation object, that access can be pooled as well. DCOM provides support for multithreaded servers in such a way that threads can be pooled. Commercially available ORBs from Visigenic, Iona and Expersoft provide pooling natively, as CORBA server objects are by default explicitly created. Java servlets similarly require explicit instantiation (as far as I can tell from the Alpha documentation).

The distributed object vendors are also releasing distributed transaction products that resemble extensions to current TP monitors. Iona has paired with Transarc, vendor of the Encina TP monitor, to release Object Transaction Service. Microsoft has written an OLE-based transaction server formerly code-named Viper, and specifically lists connection management as one of its benefits. These distributed object transaction monitors provide capabilities far beyond what you would code by hand with Remote Automation, DCOM or a bare ORB. They provide load balancing, failover and transaction coordination.


Copyright © 1997 Scott Nichol.
01-Jan-97