exstreamspeed Features

The exstreamspeed library combines elements of a container class API, a general-purpose database and an OLAP query engine.

The interface aims to better integrate the programming paradigms of databases with those of traditional programming languages such as C/C++. It combines innovations in data modeling, querying and serialization for distributed applications with an easy-to-use, familiar container-like API.

Structure

Architecture
- Hierarchical Model
- Hybrid Column/Row Based
- String-based Data
- Indices
Interface
- Iterators
- Aggregators
Serialization
- User-extensible
- Memory-mapping

Example Case Studies

Architecture

Hierarchical Model

exstreamspeed is hierarchical and object-oriented. At first glance it looks relational: a database is constructed from a set of objects, known as nodes, and each node contains a set of tuples, similar to a relation. Individual fields within a tuple may, however, refer to other nodes, known as child nodes, thus forming a hierarchy.

Each node in the hierarchy has a type, known as a class, which defines its structure: the set of fields, indices and parent information. Classes can extend, or subclass, other classes, so a field declared to refer to nodes of some class can also refer to nodes of any valid subclass.
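
As a rough analogy only, the node/tuple/class model can be sketched with ordinary C++ types. The names below (Node, Tuple, TradeNode, countTuples) are hypothetical and are not the exstreamspeed API: a node holds tuples, a tuple field may point at a child node, and a subclass node can be stored wherever the base class is expected.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of the hierarchical model; not the exstreamspeed API.
struct Node;

// A tuple: ordinary fields plus an optional reference to a child node.
struct Tuple {
    std::string key;
    double value = 0.0;
    std::shared_ptr<Node> child;   // null when this tuple has no child node
};

// A node holds a set of tuples, much like a relation.
struct Node {
    virtual ~Node() = default;
    virtual std::string className() const { return "Node"; }
    std::vector<Tuple> tuples;
};

// A subclass adds an optional field; any field typed to refer to a Node
// can refer to a TradeNode as well.
struct TradeNode : Node {
    std::string className() const override { return "TradeNode"; }
    std::string currency;   // hypothetical subclass-only field
};

// Count all tuples reachable from 'root' by following child references.
std::size_t countTuples(const Node& root) {
    std::size_t n = root.tuples.size();
    for (const Tuple& t : root.tuples)
        if (t.child) n += countTuples(*t.child);
    return n;
}
```

Here the class hierarchy is expressed with C++ inheritance purely for illustration; exstreamspeed classes also carry index and parent information, which this sketch omits.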

Hierarchies reduce redundancy by removing the need for the key/foreign-key pairs that a purely relational approach requires. Inheritance helps to manage optional or contingent fields and lets the schema more closely parallel the application's object design.

exstreamspeed also allows a child node to be sharable, i.e. referred to by one or more tuples in one or more parent nodes. This is a convenient and efficient way to model many-to-many relationships, and it is used extensively in the risk database example program.
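
A sharable child node can be pictured as a single object referenced from several parent tuples. In this hypothetical sketch (Curve, Position, Portfolio and shiftCurve are illustrative names, not part of exstreamspeed), std::shared_ptr models the shared reference, so an update to the child is visible through every parent tuple that refers to it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of a sharable child node; not the exstreamspeed API.

// Child node: e.g. a market-data curve used by many positions.
struct Curve {
    std::string name;
    std::vector<double> points;
};

// Tuple in a parent node, referring to a (possibly shared) child node.
struct Position {
    std::string instrument;
    std::shared_ptr<Curve> curve;
};

// Parent node. Many portfolios can hold positions that all refer to the same
// Curve, giving a many-to-many relationship without copying the curve.
struct Portfolio {
    std::string name;
    std::vector<Position> positions;
};

// Apply a parallel shift to a shared curve; every referring tuple sees it.
void shiftCurve(Curve& curve, double bump) {
    for (double& p : curve.points) p += bump;
}
```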

Hybrid Column/Row Based

exstreamspeed is primarily a column-based storage model. Each column's data is stored together, which allows greater flexibility of design: querying one column of a 4-column node takes the same time as querying the same column of a 250-column node.
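
The following sketch shows why per-column storage makes a single-column scan independent of the number of other columns. ColumnNode is a hypothetical class, not the exstreamspeed storage layer, and it assumes every column holds doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-oriented node: each column is its own contiguous vector.
class ColumnNode {
public:
    explicit ColumnNode(std::size_t columnCount) : columns_(columnCount) {}

    // Append one tuple: exactly one value per column.
    void append(const std::vector<double>& tuple) {
        assert(tuple.size() == columns_.size());
        for (std::size_t c = 0; c < columns_.size(); ++c)
            columns_[c].push_back(tuple[c]);
    }

    // Scanning a column reads only that column's storage, so the cost depends
    // on the number of rows, not on how many other columns the node has.
    double sumColumn(std::size_t c) const {
        double total = 0.0;
        for (double v : columns_[c]) total += v;
        return total;
    }

    const std::vector<double>& column(std::size_t c) const { return columns_[c]; }

private:
    std::vector<std::vector<double>> columns_;   // columns_[column][row]
};
```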

exstreamspeed also supports composite or fixed-width user-defined columns. A column can be defined as a composition of one or more sub-columns, effectively a row. This makes it possible to turn the database into a row-based or hybrid column/row model as required. The tick data example program explores this technique.
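
A composite column can be sketched as a column whose element type is a fixed-width struct. In this hypothetical example (Quote and TickNode are illustrative names, not exstreamspeed types), the time column is stored on its own while the bid/ask/size sub-columns are stored together, row-style, giving a column/row hybrid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical composite column: three sub-columns grouped into one
// fixed-width element, so each element is effectively a row.
struct Quote {
    double bid;
    double ask;
    std::int32_t size;
};

struct TickNode {
    std::vector<std::int64_t> time;   // plain column, can be scanned on its own
    std::vector<Quote> quote;         // composite column, read a whole row at once

    void append(std::int64_t t, const Quote& q) {
        time.push_back(t);
        quote.push_back(q);
    }

    // Reading one tuple's quote touches a single contiguous element.
    double midPrice(std::size_t row) const {
        assert(row < quote.size());
        const Quote& q = quote[row];
        return (q.bid + q.ask) / 2.0;
    }
};
```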

String-based Data

exstreamspeed provides an efficient mechanism to factor out string-based data. An object known as a string dictionary maps strings to integer ids, known as string-ids, which can be stored in the database hierarchy. The string dictionary guarantees a tight range of string-ids, 1..N, where N is the number of strings in the dictionary. The index can exploit this property (see Indices below).
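
A minimal string dictionary with dense ids can be sketched as follows. It assumes std::unordered_map as the underlying hash table, which is only a stand-in: the exstreamspeed dictionary uses its own hashing algorithm. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string dictionary. Ids are dense: the first distinct string
// gets 1 and the N-th distinct string gets N.
class StringDictionary {
public:
    // Return the id of 's', assigning the next id if it is new.
    std::uint32_t intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        strings_.push_back(s);
        auto id = static_cast<std::uint32_t>(strings_.size());   // 1..N
        ids_.emplace(s, id);
        return id;
    }

    // Look up without inserting; 0 means "not in the dictionary".
    std::uint32_t find(const std::string& s) const {
        auto it = ids_.find(s);
        return it == ids_.end() ? 0 : it->second;
    }

    // Map an id back to its string.
    const std::string& str(std::uint32_t id) const {
        assert(id >= 1 && id <= strings_.size());
        return strings_[id - 1];
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> strings_;   // strings_[id - 1]
};
```

Because the ids are dense, an array of size N + 1 indexed directly by string-id can replace a hash lookup, which is the property the None index method relies on.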

The string dictionary implementation is based on a new hashing algorithm that is two and a half times faster than industry-standard hash map implementations, with half their memory footprint.

Indices

Each node is attached to an index for fast lookup. exstreamspeed provides a choice of three index algorithms:

None: This method has constant-time lookup characteristics, akin to simple array indexing, and is the fastest method. It is designed for use with string-ids and date-ids, which are guaranteed to fall within a tight range of values.

Private: This method supports multiple keys and is based on the string dictionary's fast hashing scheme. It is very fast and has a very small memory footprint.

Shared: This method is identical to the Private method except that it generates a single index shared among all node instances of a particular class.
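
The difference between the None and Private methods can be sketched with two toy index classes. Both are hypothetical illustrations, not exstreamspeed code: DirectIndex addresses slots directly by a dense id, while HashIndex packs two 32-bit keys into one 64-bit key and uses std::unordered_map in place of exstreamspeed's own hashing scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr std::size_t kNoRow = static_cast<std::size_t>(-1);

// "None"-style index: the dense id itself addresses a slot, like array
// indexing. Constant time, no hashing.
class DirectIndex {
public:
    void insert(std::uint32_t id, std::size_t row) {
        if (id >= slots_.size()) slots_.resize(std::size_t{id} + 1, kNoRow);
        slots_[id] = row;
    }
    std::size_t find(std::uint32_t id) const {
        return id < slots_.size() ? slots_[id] : kNoRow;
    }
private:
    std::vector<std::size_t> slots_;   // slot 0 unused because ids start at 1
};

// "Private"-style index: a hash table over a two-field key.
class HashIndex {
public:
    void insert(std::uint32_t k1, std::uint32_t k2, std::size_t row) {
        map_[pack(k1, k2)] = row;
    }
    std::size_t find(std::uint32_t k1, std::uint32_t k2) const {
        auto it = map_.find(pack(k1, k2));
        return it == map_.end() ? kNoRow : it->second;
    }
private:
    // Combine two 32-bit keys into one 64-bit key.
    static std::uint64_t pack(std::uint32_t a, std::uint32_t b) {
        return (static_cast<std::uint64_t>(a) << 32) | b;
    }
    std::unordered_map<std::uint64_t, std::size_t> map_;
};
```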

The shared index is designed for cases where data clusters in multidimensional space. For example, a sales database might show that certain products are always sold to certain customer demographics regardless of time, i.e. the same set of product/customer combinations repeats in each time-slice. A designer could make time the key of a parent class, and product and customer the keys of a child class with a shared index, so that the repeating combinations are indexed once rather than once per time-slice.
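
A minimal sketch of the sales example, assuming integer product and customer ids: one hypothetical SharedIndex object maps each (product, customer) combination to a slot, and every per-time-slice SalesNode shares that index while storing only its own amounts. None of these names come from exstreamspeed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t kNoSlot = static_cast<std::size_t>(-1);

// One index per class: maps each (product, customer) combination to a slot.
class SharedIndex {
public:
    // Return the slot for a combination, assigning a new one if needed.
    std::size_t slotFor(std::uint32_t product, std::uint32_t customer) {
        std::uint64_t key = pack(product, customer);
        auto it = slots_.find(key);
        if (it != slots_.end()) return it->second;
        std::size_t slot = slots_.size();
        slots_.emplace(key, slot);
        return slot;
    }
    // Lookup without inserting; kNoSlot if the combination was never seen.
    std::size_t find(std::uint32_t product, std::uint32_t customer) const {
        auto it = slots_.find(pack(product, customer));
        return it == slots_.end() ? kNoSlot : it->second;
    }
    std::size_t size() const { return slots_.size(); }
private:
    static std::uint64_t pack(std::uint32_t a, std::uint32_t b) {
        return (static_cast<std::uint64_t>(a) << 32) | b;
    }
    std::unordered_map<std::uint64_t, std::size_t> slots_;
};

// Child node for one time-slice: stores only amounts, aligned to the
// slots of the shared index.
class SalesNode {
public:
    explicit SalesNode(std::shared_ptr<SharedIndex> index) : index_(std::move(index)) {}

    void add(std::uint32_t product, std::uint32_t customer, double amount) {
        std::size_t slot = index_->slotFor(product, customer);
        if (slot >= amounts_.size()) amounts_.resize(slot + 1, 0.0);
        amounts_[slot] += amount;
    }

    double get(std::uint32_t product, std::uint32_t customer) const {
        std::size_t slot = index_->find(product, customer);
        return slot < amounts_.size() ? amounts_[slot] : 0.0;
    }

private:
    std::shared_ptr<SharedIndex> index_;
    std::vector<double> amounts_;
};
```

A combination that repeats across time-slices is hashed and stored once in the shared index; each node only pays for its own value column.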

Interface

Iterators

Iterators are used to look up, update and fetch data. The interface is based on the familiar begin, fetch/update, increment cycle found in many container class APIs.

Iterators can traverse all the tuples in a node or, given a filter, only a subset. They can iterate in sorted order, and can also be used to look up a single row and fetch, update or insert it.
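
The begin, fetch/update, increment cycle with a filter might look like the following hypothetical sketch. PriceNode and NodeIterator are illustrative names, not the exstreamspeed iterator API, and the filter is assumed to be a predicate on one row of the node.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical node with two columns.
struct PriceNode {
    std::vector<int> productId;
    std::vector<double> price;
};

// Hypothetical iterator: skips rows that fail an optional filter.
class NodeIterator {
public:
    using Filter = std::function<bool(const PriceNode&, std::size_t)>;

    explicit NodeIterator(Filter filter = nullptr) : filter_(std::move(filter)) {}

    // Position on the first row of 'node' that passes the filter.
    void begin(PriceNode& node) { node_ = &node; row_ = 0; skip(); }
    bool valid() const { return node_ && row_ < node_->price.size(); }
    void next() { ++row_; skip(); }

    // Fetch or update a field of the current row.
    double fetchPrice() const { return node_->price[row_]; }
    void updatePrice(double p) { node_->price[row_] = p; }

private:
    void skip() {
        while (valid() && filter_ && !filter_(*node_, row_)) ++row_;
    }
    Filter filter_;
    PriceNode* node_ = nullptr;
    std::size_t row_ = 0;
};
```

A typical loop is `for (it.begin(node); it.valid(); it.next()) { ... }`, with fetchPrice and updatePrice called on each visited row.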

exstreamspeed offers a choice of sorting algorithms optimized for different use cases, such as iterating over all or most of the tuples in a node versus only the top (or bottom) few.
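
To illustrate why the choice of sort matters, the sketch below uses two standard-library algorithms as stand-ins for exstreamspeed's own: std::sort orders every row for a full traversal, while std::partial_sort orders only the top k rows, which is cheaper when only a few are needed. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Both functions return row numbers ordered by descending column value,
// leaving the column itself untouched.

// Visiting all or most rows in order: sort every row index, O(n log n).
std::vector<std::size_t> sortedRows(const std::vector<double>& column) {
    std::vector<std::size_t> rows(column.size());
    std::iota(rows.begin(), rows.end(), std::size_t{0});
    std::sort(rows.begin(), rows.end(),
              [&](std::size_t a, std::size_t b) { return column[a] > column[b]; });
    return rows;
}

// Visiting only the top k rows: partial sort, roughly O(n log k).
std::vector<std::size_t> topRows(const std::vector<double>& column, std::size_t k) {
    std::vector<std::size_t> rows(column.size());
    std::iota(rows.begin(), rows.end(), std::size_t{0});
    k = std::min(k, rows.size());
    auto middle = rows.begin() + static_cast<std::ptrdiff_t>(k);
    std::partial_sort(rows.begin(), middle, rows.end(),
                      [&](std::size_t a, std::size_t b) { return column[a] > column[b]; });
    rows.resize(k);
    return rows;
}
```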

Iterators are bound by class: an iterator instance can iterate over any node instance of a particular class or of a valid subclass.

Aggregators

Iterators offer the most flexibility in how data is accessed and filtered. General-purpose and OLAP databases, however, usually offer more declarative ways of selecting subsets of data, through languages such as SQL or MDX.

exstreamspeed offers the aggregator API, an alternative high-level interface for aggregating, deriving and filtering data.

With the aggregator you select keys, aggregated result fields, filters and formulas. The aggregator automatically determines a query plan based on the database schema and executes it, using the iterator API under the covers. Its run-time performance is comparable to that of a hand-written program that uses the iterator interface directly.

The aggregator is an object interface with separate functions for defining keys, results, etc. This makes it easy to integrate with application code. The high-level, declarative nature of the interface also makes it easy to embed in client-server applications that emulate a database.
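
A minimal sketch of an object-style aggregator, assuming a single string key, a single summed result and an optional filter. The Trade and Aggregator types and the key/sum/filter/run methods are hypothetical, not the exstreamspeed aggregator API, and there is no query planner: run() simply performs one filtered group-by pass, much as a hand-written iterator loop would.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type.
struct Trade {
    std::string desk;
    std::string currency;
    double notional;
};

// Hypothetical aggregator: keys, results and filter are set by separate calls.
class Aggregator {
public:
    using KeyFn = std::function<std::string(const Trade&)>;
    using ValueFn = std::function<double(const Trade&)>;
    using FilterFn = std::function<bool(const Trade&)>;

    Aggregator& key(KeyFn k) { key_ = std::move(k); return *this; }
    Aggregator& sum(ValueFn v) { value_ = std::move(v); return *this; }
    Aggregator& filter(FilterFn f) { filter_ = std::move(f); return *this; }

    // Group by key and sum the value over rows that pass the filter.
    std::map<std::string, double> run(const std::vector<Trade>& rows) const {
        assert(key_ && value_);
        std::map<std::string, double> out;
        for (const Trade& t : rows) {
            if (filter_ && !filter_(t)) continue;
            out[key_(t)] += value_(t);
        }
        return out;
    }

private:
    KeyFn key_;
    ValueFn value_;
    FilterFn filter_;
};
```

A derived field (a formula) can be expressed by passing a different lambda to sum(), for example one that multiplies notional by a rate.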

The exstreamspeed query language is a wrapper over the aggregator API, designed for constructing and running ad hoc queries more quickly than coding directly against the aggregator interface.

The risk database example program makes extensive use of the aggregator.

Serialization

User-extensible

exstreamspeed schemas and databases are serializable. The serialization API is user-extensible, so custom storage implementations can be plugged in.

The API provides wrapper implementations specifically for serializing schemas and databases to disk and across networks.
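
User-extensible serialization can be sketched as a serializer written against abstract Sink and Source interfaces, with the storage supplied by the user. All names here are hypothetical, not the exstreamspeed API; MemoryBuffer stands in for a user-written storage class, and a disk or network wrapper would implement the same two interfaces. The sketch writes raw native-endian bytes, so it assumes reader and writer share the same byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Abstract storage interfaces the serializer depends on.
class Sink {
public:
    virtual ~Sink() = default;
    virtual void write(const void* data, std::size_t bytes) = 0;
};

class Source {
public:
    virtual ~Source() = default;
    virtual void read(void* data, std::size_t bytes) = 0;
};

// A user-supplied storage implementation: an in-memory byte buffer.
class MemoryBuffer : public Sink, public Source {
public:
    void write(const void* data, std::size_t bytes) override {
        if (bytes == 0) return;
        const auto* p = static_cast<const unsigned char*>(data);
        bytes_.insert(bytes_.end(), p, p + bytes);
    }
    void read(void* data, std::size_t bytes) override {
        if (bytes == 0) return;
        assert(pos_ + bytes <= bytes_.size());
        std::memcpy(data, bytes_.data() + pos_, bytes);
        pos_ += bytes;
    }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;   // read cursor
};

// Serialize a column as a length prefix followed by its raw values.
void saveColumn(Sink& out, const std::vector<double>& column) {
    std::uint64_t n = column.size();
    out.write(&n, sizeof n);
    out.write(column.data(), column.size() * sizeof(double));
}

std::vector<double> loadColumn(Source& in) {
    std::uint64_t n = 0;
    in.read(&n, sizeof n);
    std::vector<double> column(static_cast<std::size_t>(n));
    in.read(column.data(), column.size() * sizeof(double));
    return column;
}
```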

A network programming library is included to make it easier to write distributed applications that communicate over UDP/IP or TCP/IP.

The mesh database example uses the network programming interface to build an in-memory database distributed across a network.

Memory-mapping

exstreamspeed also offers a special serialization technique for memory-mapping databases stored on disk.

This is designed for applications that need to access a database stored on disk but only query a small portion of it. The memory-mapping technique, together with the columnar format of the database, ensures that only the portions queried are loaded from disk, which can dramatically improve the application's run-time performance.
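
The general memory-mapping technique can be sketched with POSIX mmap. This is not the exstreamspeed on-disk format or API: it assumes a POSIX system and a hypothetical file containing one column of raw doubles. Because the operating system loads mapped pages on demand, summing a small slice reads only the pages around that slice rather than the whole file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a file holding raw native-endian doubles.
class MappedColumn {
public:
    explicit MappedColumn(const char* path) {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");
        struct stat st {};
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed or file is empty");
        }
        bytes_ = static_cast<std::size_t>(st.st_size);
        // Map the file; no column data is read from disk yet.
        void* p = ::mmap(nullptr, bytes_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);   // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = static_cast<const double*>(p);
    }
    ~MappedColumn() { ::munmap(const_cast<double*>(data_), bytes_); }
    MappedColumn(const MappedColumn&) = delete;
    MappedColumn& operator=(const MappedColumn&) = delete;

    std::size_t size() const { return bytes_ / sizeof(double); }

    // Sum rows [first, last). The OS faults in the pages covering this slice
    // (plus any read-ahead), not the entire file.
    double sum(std::size_t first, std::size_t last) const {
        double total = 0.0;
        for (std::size_t i = first; i < last && i < size(); ++i) total += data_[i];
        return total;
    }

private:
    std::size_t bytes_ = 0;
    const double* data_ = nullptr;
};

// Helper for the example: write a column to disk as raw doubles.
void writeColumn(const char* path, const std::vector<double>& column) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) throw std::runtime_error("fopen failed");
    std::fwrite(column.data(), sizeof(double), column.size(), f);
    std::fclose(f);
}
```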

The tick data example program makes use of this technique.
