Src-datasite

From IARC 207 Wiki
Revision as of 10:39, 15 December 2008 by imported>Ken

DataSite is an approach to organizing information and data from and about research data sites. It is based on the ThinObject scheme, which implements object-oriented systems using ordinary files, directories, and executables.
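As a rough illustration of the idea of building objects from ordinary files and directories, the sketch below stores one object as a directory and each of its attributes as a small text file. This is not ThinObject's actual layout or API, which is not described here; the helper names `write_object` and `read_object`, the site name, and the attribute values are all hypothetical. Executables, which the scheme also uses, are left out to keep the sketch short.

```python
import tempfile
from pathlib import Path


def write_object(root: Path, name: str, attributes: dict[str, str]) -> Path:
    """Create a directory for the object and one file per attribute (hypothetical layout)."""
    obj_dir = root / name
    obj_dir.mkdir(parents=True, exist_ok=True)
    for key, value in attributes.items():
        (obj_dir / key).write_text(value + "\n")
    return obj_dir


def read_object(obj_dir: Path) -> dict[str, str]:
    """Read every regular file in the directory back as an attribute."""
    return {
        path.name: path.read_text().strip()
        for path in sorted(obj_dir.iterdir())
        if path.is_file()
    }


# Illustrative values only; they do not describe a real site.
with tempfile.TemporaryDirectory() as tmp:
    site_dir = write_object(
        Path(tmp),
        "example-site",
        {"latitude": "64.86", "logger_model": "hypothetical-logger"},
    )
    site = read_object(site_dir)
```

Because the object is just a directory, it can be inspected and edited with ordinary file tools, which is the appeal of this kind of scheme.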

Slides from a talk on DataSite given at AGU2008 are posted online.

While DataSite can be considered an implementation of an object-oriented database, it does not necessarily aim for high performance but instead tries to achieve a solid and "natural" modelling of the systems it represents. DataSite might function as a backend for a more conventional database, which could then be used to support web pages and other end uses.

The main problem DataSite addresses is the variation and complexity of the data sets retrieved from remote dataloggers. A datalogger site typically generates periodic records of multiple columns, and usually those data are normalized in the sense that each column represents a single data type with a fixed set of attributes. Those data could easily be mapped to a simple database model, where tables and columns are defined and populated with the data.

That simple model breaks down, however, when changes are made to the datalogger programs, with possible changes to the table structures or to column attributes. With the data no longer fitting a set table-column structure, the database model must adapt, perhaps by defining additional abstraction layers. Similarly, variations from site to site pose significant challenges in database modelling. No doubt a database schema could be devised to handle all the variations and complexity, but we have not succeeded in achieving this.
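The breakdown can be sketched with a small in-memory SQLite table. The table name `met`, the column names, and the sample values are hypothetical and chosen only for illustration. A record from the original datalogger program fits the fixed schema, while a record produced after a program change that adds a column does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed schema matching the original (hypothetical) datalogger program.
conn.execute(
    "CREATE TABLE met (timestamp TEXT, air_temp_c REAL, wind_speed_ms REAL)"
)


def insert_record(record: dict) -> bool:
    """Try to insert a record; return False if it does not fit the schema."""
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    try:
        conn.execute(
            f"INSERT INTO met ({columns}) VALUES ({placeholders})",
            list(record.values()),
        )
        return True
    except sqlite3.OperationalError:
        # Raised when the record names a column the table does not have.
        return False


# Record from the original program: fits the table.
old_ok = insert_record(
    {"timestamp": "2008-06-01T00:00", "air_temp_c": 4.2, "wind_speed_ms": 3.1}
)

# Record after a program change added a soil temperature column: rejected.
new_ok = insert_record(
    {
        "timestamp": "2008-07-01T00:00",
        "air_temp_c": 9.8,
        "wind_speed_ms": 2.4,
        "soil_temp_c": 1.5,
    }
)
```

Here `old_ok` is `True` and `new_ok` is `False`. Altering the table would fix this one case, but each further program change or site-specific variation needs another schema change or another abstraction layer.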

Where the relational database model seeks to impose its table-column structure on an information system, the object-oriented approach seeks to model the system itself as a set of objects. With no predetermined structure to adhere to, the goal is to represent the information system as it exists, with all variation and relational complexity handled directly. While this might seem to be a zero-gain solution, since the complexity is carried over rather than removed, the object model can be extended beyond the scope of the source systems by providing simple views that effectively hide the complexity. Such a view might be used to feed a relational database with a simpler system model, as a product of the object-oriented approach.
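A minimal sketch of such a view follows, assuming a hypothetical object model in which each site keeps every version of its datalogger table with that version's own column list. The class names `Site` and `TableVersion`, the method `simple_view`, and the sample data are illustrative and are not taken from DataSite itself.

```python
from dataclasses import dataclass, field


@dataclass
class TableVersion:
    """One version of a datalogger table: its own column list and records."""
    columns: list[str]
    records: list[list] = field(default_factory=list)


@dataclass
class Site:
    """A site holding every table version as it actually existed."""
    name: str
    versions: list[TableVersion] = field(default_factory=list)

    def simple_view(self, wanted: list[str]) -> list[dict]:
        """Flatten all versions into uniform rows; missing columns become None."""
        rows = []
        for version in self.versions:
            for record in version.records:
                by_name = dict(zip(version.columns, record))
                row = {"site": self.name}
                row.update({column: by_name.get(column) for column in wanted})
                rows.append(row)
        return rows


# Two versions of the same table, before and after a program change.
site = Site(
    "example-site",
    [
        TableVersion(["timestamp", "air_temp_c"], [["2008-06-01T00:00", 4.2]]),
        TableVersion(
            ["timestamp", "air_temp_c", "soil_temp_c"],
            [["2008-07-01T00:00", 9.8, 1.5]],
        ),
    ],
)

rows = site.simple_view(["timestamp", "air_temp_c", "soil_temp_c"])
```

Each entry in `rows` has the same keys regardless of which table version it came from, so the result could be loaded into a single relational table while the object model keeps the full history of structural changes.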