Distributed Databases Require Hefty Planning

Distributed-database technology, while far from mature, continues to evolve in parallel with client/server computing.

The goal of distributed-database computing is to allow users to transparently read and update multiple databases running on different platforms at multiple distant sites. This ideal has not been achieved, according to observers, because as the number of different databases and hardware devices to be accommodated grows, it becomes harder to assemble a stable, reliable system.

“You can get pretty good distributed-database performance for [data] reads; but when you are attempting to modify or write new data, generally most database products will not give 100 percent transparency,” said Paul Winsberg, a principal partner with Database Associates, a consulting firm in Berkeley, Calif.

This has not prevented a number of organizations from setting up distributed-database systems. But most of the successful distributed systems running today are homogeneous, with a single type of database communicating through a common set of network protocols.

All the major relational-database developers, such as IBM, Oracle Corp., Informix Software Inc., Gupta Corp., Ingres Corp. and Sybase Inc., are working on ways to improve the distributed-database capabilities of their systems.

Ingres, for example, has built a number of features into its Ingres/Star database to support distributed-database applications, including a distributed query optimizer. By analyzing CPU costs and network-communication costs, the optimizer lets database administrators coordinate joins between tables in separate databases along the fastest, most cost-effective route, said Diana Parr, director of server marketing at the Alameda, Calif., firm.
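
To illustrate the idea, here is a minimal sketch of how such an optimizer might score candidate plans on combined CPU and network costs. This is not Ingres/Star's actual algorithm; the plan structure and cost weights are hypothetical.

```python
# Illustrative cost-based plan selection: each candidate plan is
# scored on estimated CPU work plus network transfer, and the
# cheapest route for the join wins. All fields are hypothetical.
def cheapest_plan(plans):
    def total_cost(plan):
        cpu = plan["rows_processed"] * plan["cpu_cost_per_row"]
        network = plan["bytes_shipped"] * plan["cost_per_byte"]
        return cpu + network
    return min(plans, key=total_cost)

# Shipping the small table to the site holding the big one is far
# cheaper than shipping the big table the other way.
plans = [
    {"site": "ship_small_table", "rows_processed": 1_000_000,
     "cpu_cost_per_row": 1.0, "bytes_shipped": 50_000,
     "cost_per_byte": 0.5},
    {"site": "ship_big_table", "rows_processed": 1_000_000,
     "cpu_cost_per_row": 1.0, "bytes_shipped": 400_000_000,
     "cost_per_byte": 0.5},
]
print(cheapest_plan(plans)["site"])  # -> "ship_small_table"
```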

The key to supporting distributed-database management and on-line transaction processing is the two-phase commit, according to David Knight, senior manager with Oracle Corp.’s Server Product Marketing group in Redwood Shores, Calif.

Under the two-phase-commit method, the database system rolls back any transaction it cannot fully execute, he said. This ensures that the database won't be corrupted if a server crashes in the middle of a transaction, and it allows database administrators to restore all on-line databases to the moment the crash occurred before applying any new transactions, he said.
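
In outline, the protocol works as in the following minimal sketch. The participant objects and their prepare/commit/rollback methods are assumptions for illustration, not any vendor's actual API.

```python
# A minimal two-phase-commit coordinator sketch.
def two_phase_commit(participants):
    # Phase 1 (prepare): every database involved in the transaction
    # durably records its work and votes on whether it can commit.
    votes = [p.prepare() for p in participants]

    if all(votes):
        # Phase 2 (commit): all sites voted yes, so the change is
        # made permanent everywhere.
        for p in participants:
            p.commit()
        return True

    # Any "no" vote (including a crashed server that never answers
    # the prepare) aborts the transaction at every site, so no
    # database is left holding a partial update.
    for p in participants:
        p.rollback()
    return False
```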

Most relational databases that support on-line transaction processing, including Oracle Server, Informix and Ingres, use the two-phase-commit process in one form or another.

To provide read and write access to multiple vendors’ databases, developers usually turn to gateways and other “middleware,” which generally is provided at extra charge by these vendors or by third parties such as Micro Decisionware Inc. and Information Builders Inc. (IBI).

Relational databases compatible with ANSI-standard Structured Query Language (SQL) share only a least common denominator of technology, according to Winsberg. That makes it difficult to update data across databases from multiple vendors, he said.
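
The least-common-denominator problem shows up in the SQL dialects themselves. As an illustration (the table and column names are invented), the same outer join is written one way in ANSI-standard SQL and another way in Oracle's older proprietary (+) notation, so a gateway confined to the common subset cannot simply forward one vendor's statement to another vendor's server.

```python
# The same query in two dialects; only the first is portable.
ANSI_SQL = """
SELECT c.name, o.total
FROM customers c LEFT OUTER JOIN orders o
  ON c.cust_id = o.cust_id
"""

ORACLE_DIALECT = """
SELECT c.name, o.total
FROM customers c, orders o
WHERE c.cust_id = o.cust_id (+)
"""
```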

IBI markets Enterprise Data Access/SQL, a server-based query-processing system that allows users to retrieve data from more than 50 databases. However, EDA/SQL currently works mainly for read-only access to relational and nonrelational databases, Winsberg said.

IBM’s Distributed Relational Database Architecture “is a more robust solution because it does provide some update capability, but it is designed to work within a limited IBM environment,” he said. It gives PC users the ability to access data stored in DB2 mainframe databases, OS/2 servers or AS/400 databases.

Micro Decisionware’s DB2 Gateway is also widely used by developers who want to link PC-based query applications to DB2 mainframes. But again, these applications are mainly query and report-generation tools for decision-support purposes.

Even with these gateways, distributed-database applications that link multiple sites can be slow and difficult to set up, according to Richard Finkelstein, president of Performance Computing Inc., a database consulting firm in Chicago.

“Joining multiple relational tables on a single machine can be a very slow process,” he said. “Joining multiple tables over a distributed network, even when you are working with fast lines, can be impossibly slow.”
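
A rough sketch shows why: a naive distributed join must drag one entire side of the join across the wide-area link before any matching can happen, so network transfer, not CPU, dominates. The connection object and its fetch_all call here are hypothetical.

```python
# Naive distributed hash join: pull the ENTIRE remote table over
# the WAN, index it locally, then match against the local rows.
def naive_distributed_join(local_rows, remote_conn, key="cust_id"):
    remote_rows = remote_conn.fetch_all("orders")  # whole table over the wire
    index = {}
    for row in remote_rows:
        index.setdefault(row[key], []).append(row)
    return [(lrow, rrow)
            for lrow in local_rows
            for rrow in index.get(lrow[key], [])]
```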

However, major corporations are building distributed-database applications by setting moderate goals for themselves and by planning carefully.

ITT Hartford Insurance Co.’s Employee Benefits Division uses Oracle’s Parallel Server on a DEC VAXcluster to give 25 field offices around the United States access to customer records for filing and reviewing benefits claims, said Jim Bosco, project manager at the Hartford, Conn., firm.

Each field office has its own VAX, which can be used to enter and process claims, Bosco said. All of the field offices are connected to the VAXcluster in the home office through a DECnet WAN, he said.

“Most of the data that we use is entered and maintained in the home office. However, most of the field offices have local databases that are part of our enterprise data model,” Bosco explained.

A field-office worker can enter a query and call up the data from the local database; if it is not there, the system will look for the case file in the home-office VAXcluster. “The performance is transparent. The user doesn’t know where the data is coming from,” Bosco said. The field offices transmit updates to the home-office databases each night, he said.
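
A sketch of that lookup pattern, with the database handles and their lookup method assumed for illustration: the application tries the field office's local database first and falls back to the home-office cluster on a miss, so the user never needs to know where the record lives.

```python
# Local-first lookup with transparent fallback to the home office.
def find_case(case_id, local_db, home_office_db):
    record = local_db.lookup(case_id)       # check the field office first
    if record is None:
        record = home_office_db.lookup(case_id)  # miss: query the VAXcluster
    return record
```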

This setup is successful because it’s a pure Oracle application running on a fairly fast server, according to Bosco. It would be more difficult to provide the same level of performance if the system were working with two different databases, he added.
