OPeNDAP Kepler Data Model Resolution

This is version 4. It is not the current version.


The OPeNDAP data model (aka the DAP2 data model) supports more complex data objects than the Kepler/Ptolemy data model. In particular, DAP2 supports deeper hierarchies and N-dimensional arrays. Although these data can find logical representations in Kepler/Ptolemy, the end result is:

N-Dimensional Array Issues

Poorly optimized in memory (unless the data is represented as a Ptolemy matrix type)

Comments from Dan Higgins - 9/28/2007
  1. The Ptolemy array of tokens is inefficient, while the Ptolemy Matrix is designed for 2-D info (images). I suggest that we add a new multidimensional array type to Kepler. Data would be stored linearly in memory (1-D Java array) with a second array indicating dimensionality (the approach used in R). Thus a 1-D array with 12 elements could be dimensioned as a 1x12, 2x6, 3x4, 2x3x2, etc. array, and the dimensionality can be changed without moving any data.

    (This is in effect the way that OPeNDAP stores its array data in memory. If care was taken designing this Token type we could allow the internal storage to be passed in - thus allowing us to read OPeNDAP data and seamlessly wrap it in a Token without being forced to do an element by element copy. ndp 11/14/07)

  2. Although a new multidimensional array type would be more space efficient than arrays of array tokens, any purely RAM-based implementation will rapidly run into memory limitations as we try to handle bigger data sets (e.g. multiple datasets over time, with time as the 3rd dimension). So why not consider a disk-based option now?

  3. Why not consider a new Kepler datatype that is file-based? i.e. store the OPeNDAP data in local files (perhaps CDF or HDF files? I think there are Java tools for reading such files(?)) and use 'file reference tokens' (these don't currently exist). (Currently we do use simple strings as file references in Kepler. We should add a 'ReferenceToken' that is immutable - i.e. for files this would involve file locking, etc.)

  4. Java NIO routines offer some methods for optimizing the speed of random access to large disk-based files, using OS disk caches and other methods. We might want to investigate these to optimize performance of disk-based data storage.
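
As a rough illustration of comments 1, 2 and 4 together, here is a minimal sketch of a multidimensional array that keeps its values in one linear buffer plus a separate dimension array. NdArraySketch and all of its method names are hypothetical; nothing like it exists in Kepler or Ptolemy as far as this page states. The sketch assumes column-major indexing (first index varies fastest, as in R) and double-valued data. The backing store is a java.nio.DoubleBuffer so that the same class can wrap a caller-supplied heap array without copying, or a memory-mapped file that the OS pages in on demand.

```java
import java.io.IOException;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch; not an existing Kepler or Ptolemy class. */
final class NdArraySketch {
    private final DoubleBuffer data; // linear storage, shared with the caller (never copied)
    private final int[] dims;        // extent of each dimension

    NdArraySketch(DoubleBuffer data, int... dims) {
        int n = 1;
        for (int d : dims) {
            if (d <= 0) throw new IllegalArgumentException("dimensions must be positive");
            n = Math.multiplyExact(n, d);
        }
        if (n != data.limit()) {
            throw new IllegalArgumentException("dimensions do not match element count");
        }
        this.data = data;
        this.dims = dims.clone();
    }

    /** Wraps an existing heap array; later changes to the array are visible here. */
    static NdArraySketch wrap(double[] values, int... dims) {
        return new NdArraySketch(DoubleBuffer.wrap(values), dims);
    }

    /** Maps an existing file of doubles; a single mapping is limited to Integer.MAX_VALUE bytes. */
    static NdArraySketch mapFile(Path file, int... dims) throws IOException {
        long bytes = Double.BYTES;
        for (int d : dims) bytes = Math.multiplyExact(bytes, (long) d);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            DoubleBuffer mapped =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes).asDoubleBuffer();
            return new NdArraySketch(mapped, dims);
        }
    }

    /** Returns a view with new dimensions over the same storage; no data is moved. */
    NdArraySketch reshape(int... newDims) {
        return new NdArraySketch(data, newDims);
    }

    double get(int... index) {
        return data.get(offset(index));
    }

    void set(double value, int... index) {
        data.put(offset(index), value);
    }

    /** Column-major offset: the first index varies fastest. */
    private int offset(int[] index) {
        if (index.length != dims.length) {
            throw new IllegalArgumentException("expected " + dims.length + " indices");
        }
        int offset = 0;
        int stride = 1;
        for (int k = 0; k < dims.length; k++) {
            if (index[k] < 0 || index[k] >= dims[k]) {
                throw new IndexOutOfBoundsException("index " + k + " out of range");
            }
            offset += index[k] * stride;
            stride *= dims[k];
        }
        return offset;
    }

    public static void main(String[] args) throws IOException {
        double[] raw = new double[12];
        for (int i = 0; i < raw.length; i++) raw[i] = i;
        NdArraySketch a = wrap(raw, 3, 4);
        NdArraySketch b = a.reshape(2, 3, 2);
        System.out.println(a.get(2, 1));    // linear offset 2 + 1*3 = 5
        System.out.println(b.get(1, 2, 1)); // linear offset 1 + 2*2 + 1*6 = 11

        Path file = Files.createTempFile("ndarray", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[12 * Double.BYTES]); // pre-size the file
        NdArraySketch onDisk = mapFile(file, 3, 4);
        onDisk.set(42.0, 2, 3);
        System.out.println(onDisk.reshape(12).get(11)); // same element through a 1-D view
    }
}
```

If the source buffer were laid out row-major instead, only the stride loop in offset would need to run from the last dimension to the first. A real token type would also have to settle element types other than double, immutability, and how to span files larger than one mapping; the sketch ignores all three.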

Data Structure Complexity

DAP data is not really available to the existing Kepler/Ptolemy actor suite due to its inherent complexity.

Because the DAP is rich in data stored in variations of the Structure data type, much of what is produced from DAP data sources will naturally map to a RecordToken. Currently there is a RecordDisassembler Actor that can be used to break apart these RecordTokens. Unfortunately, it is very cumbersome for the user to configure. Because of this, Ilkay attempted to write an AutomatedRecordDisassembler Actor and immediately ran into a wall, because it was not possible for the Actor to determine the structure of the incoming RecordToken at design time.
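
The design-time problem can be illustrated without any Ptolemy classes. In the sketch below a plain nested java.util.Map stands in for a record built from a DAP Structure; RecordFlattenSketch and flatten are hypothetical names, and this is not the Ptolemy RecordToken API. The point is that the set of leaf field names, which is what an automated disassembler would need in order to create its output ports, only becomes known by walking an actual value at run time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; a nested Map stands in for a record, not for Ptolemy's RecordToken. */
final class RecordFlattenSketch {

    /** Walks a nested record and collects its leaf fields under dotted path names. */
    static Map<String, Object> flatten(Map<String, ?> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", record, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> record,
                                    Map<String, Object> out) {
        for (Map.Entry<String, ?> e : record.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                // A nested structure: recurse, extending the path.
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) value;
                flattenInto(path, nested, out);
            } else {
                out.put(path, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("lat", 41.5);
        location.put("lon", -71.4);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("time", 0.0);
        record.put("location", location);
        // The field names below exist only once a concrete record is in hand.
        System.out.println(flatten(record).keySet());
    }
}
```

An actor that must declare its ports before any data arrives has no such value to inspect, so it would need the structure supplied some other way, for example from the data source's metadata.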



This particular version was published on 14-Nov-2007 12:14:48 PST by uid=potter,o=unaffiliated.