Database Engine - Cataldo Images

Situation

This database engine was first used on an Apple II computer with 143K floppy drives. The customer needed to store 5,000+ variable length pricing records per disk with instant access. This engine was also used in the MarketOrder Seamless Ordering Legacy system. The database engine had to handle computer crashes at any time. It was built ONLY for lookups and updates and was not built for enumerations.

Companies

Tool Specialty Company (Los Angeles, CA), Wetmore Cutting Tools (City of Industry, CA), MarketOrder.com (Seattle, WA)

Action

This is a very specialized database engine. It was designed for ultra fast lookups and updates. The database had to maintain data integrity and the computer could be powered off at any time without warning. The original Apple II computer had a simple power-off switch and the MarketOrder legacy system used a Telxon 701 handheld computer with batteries that could fall out when dropped (and no backup power). The Apple II version could hold well over 5,000 variable length records on a 143K floppy disk. The MarketOrder version had to store up to 100,000 item records with UPC number, description, price, and qty in less than 4M of RAM. Some of the requirements for the database included:

Data integrity when computer crashes
Ultra fast looks and updates
Data error detection correction
Ability to remotely construct all data in database
Very small code footprint
No indexing for lookups

Implementation: The original database engine wrote data directly to the sectors on the Apple II floppy disk and bypassed the normal file system. The MarketOrder version used a file but wrote to fixed blocks within that file. The engine used a double-hashing algorithm to store and retrieve the records, required no index or directory, included multiple data integrity checks, and could do limited data correction. The database used fixed data blocks (either disk sectors or file blocks). The number of data of blocks was a prime number. The lookup key was hashed using a double hashing function to generate primary and secondary hash values. Other details about the engine include:

The main hashing function used Σ (Prime_i) ^key(i), where P_iis a prime number sequence and ^key(i) is the ordinal value of the ASCII character code minus the ordinal value of ‘0’. This value is the base hash value.
The prime hash value was calculated using the base hash value reduced modulo the table size (a prime). The secondary has value was calculated by reducing the base hash value modulo a smaller prime number.
The initial block number was calculated by subtracting the secondary hash from the primary hash. That value is the first disk sector or the file block to use for the record. When storing a record the engine would try to store that record in the initial block number. If the data block is full then the secondary hash was subtracted from the initial block number and the new value was used as the data block or sector (with wraparound if the value was negative). The subtraction process continues five times before the database is considered as full.
Each data block contained the data length which pointed to the first free byte in the block. Each data record started with a length field. When the record length was added to the current block position the result would be the length field of the next record or the first free byte in the block. Traversing all records in the block will end at the byte pointed to by the block length.
All records in the block were in key sorted order. Thus a linear search was performed within the block and when the current key was greater than the search key the block search ended (record not found). This also acted as a data integrity check. If the keys were out of order then an error occurred and corrective measures could be taken.
Unused bytes in each data block were zeroed. If there were data bytes past the end of the records then something went wrong and corrective measures could be taken.

The original test version of this engine used a more standard hashing function similar to ones that you might find in a computer science textbook. I found that these functions did not evenly distribute the data in the data blocks. The records clumped into groups with other data blocks containing fewer records. A computer science professor at the time told me that this phenomenon was called “harmonic clustering” and the record collided based on the mathematical harmonics of the hashing function. The final hashing was based on the prime number sequence was suggested by a math professor. He had no idea if that would work but his intuition told him that it might (and it did).

This database engine would normally find a record with a single access and required no indexes. The Telxon version used a simple blocked file and the file was opened, the record was found, and the file was closed. Since the Telxon 701 used a RAM-based file system this file open/close required very little overhead and was exceedingly fast. Since the engine require no directories or indexes the computer could crash at any time and crashed would almost never affect the data. If the computer crashed in the middle of a write operation the integrity check would catch the error and reconstruction routines could recover or fix the data. The database ran very comfortable even when the storage medium was over 90% full.

The MarketOrder version used Boolean masking when performed the hashing calculations. This way computers with differ word lengths and different math engines would always calculate the same hash values. The Telxon 701 used an 8-bit CPU with 16-bit words and the PCs at the time used different word lengths. When calculating the hashes each intermediate result was ANDed with a bit mask to ensure that the results did not exceed the 16-bit word length. The PC would thus create a populated database and this file could be copied to the 701 and the lookups.

Outcome

This database engine was used for years in the machine shops for pricing records. The MarketOrder version allowed an outdated legacy handheld computer to operate on a huge database. The legacy system required lookups for all items in a large department store. A 100,000 item lookup and pricing table could be stored in a mere 4MB and the engine would comfortable operate on an 8-bit legacy CPU.

Before implementing this database engine we tried using a very well-known database system. After several weeks the company engineers stopped trying to make it work. Their database required too much processing power and could not compress the data sufficiently to meet the requirements. The off-the-shelf database did not even come close to working.

Benefits

This database engine was very fast, very robust, almost impervious to data errors, and each data block was checked with each record access. The data checking tool took almost no CPU time and added an additional level of reliability. The MarketOrder version allowed legacy terminals to function at a much high level than previously possible. The legacy application was a bridge measure. The customer had a huge investment in the equipment and the old equipment could be used until the hardware failed. When the hardware failed it would be replaced by a more modern device.

Product Details

The original database engine used the following:

Apple IIC/E with 143K floppy data drive
Forth programming language

The prototype application used the following:

Telxon PTC-701 handheld DOS terminals (legacy hardware)
Borland C++ compiler (legacy hardware)
Microsoft Visual Studio (new hardware)