OPeNDAP Kepler Data Model Resolution

This is version 2. It is not the current version, and thus it cannot be edited.


The OPeNDAP data model (aka the DAP2 data model) supports more complex data objects than the Kepler/Ptolemy data model. In particular, DAP2 supports deeper hierarchies and N-dimensional arrays. Although these data can find logical representations in Kepler/Ptolemy, the end result is:

  1. Poorly optimized in memory (unless the data is represented as a Ptolemy matrix type)
  2. Not readily available to the existing Kepler/Ptolemy actor suite, due to its inherent complexity.

Ideas:

Comments from Dan Higgins - 9/28/2007

  1. The Ptolemy array of tokens is inefficient, while the Ptolemy Matrix is designed for 2-D information (images). I suggest that we add a new multidimensional array type to Kepler. Data would be stored linearly in memory (a 1-D Java array), with a second array indicating dimensionality (the approach used in R). Thus a 1-D array with 12 elements could be dimensioned as a 1x12, 2x6, 3x4, or 2x3x2 array, and the dimensionality can be changed without moving any data.
  2. Although a new multidimensional array type would be more space-efficient than arrays of array tokens, any purely RAM-based implementation will rapidly run into memory limitations as we try to handle bigger data sets (e.g. multiple datasets over time, with time as the 3rd dimension). So why not consider a disk-based option now?
  3. Why not consider a new Kepler datatype that is file-based? That is, store the OPeNDAP data in local files (perhaps CDF or HDF files; I think there are Java tools for reading such files) and use 'file reference tokens', which don't currently exist. (Currently we use simple strings as file references in Kepler. We should add a 'ReferenceToken' that is immutable; for files this would involve file locking, etc.)
  4. Java NIO offers routines for optimizing the speed of random access to large disk-based files, using OS disk caches and other techniques. We might want to investigate these to optimize the performance of disk-based data storage.
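Idea 1 above could be sketched roughly as follows. This is an illustrative Java sketch only, not an existing Kepler type; the class name NDArray and its methods are invented. It stores values in one flat 1-D array with a separate shape array, so re-dimensioning (e.g. 2x6 to 3x4) changes only the shape and moves no data:

```java
// Hypothetical flat-storage multidimensional array, along the lines of
// the R-style approach suggested above (all names are invented).
public class NDArray {
    private final double[] data;   // values stored linearly in memory
    private int[] shape;           // current dimensionality, e.g. {2, 6}

    public NDArray(double[] data, int... shape) {
        if (data.length != product(shape))
            throw new IllegalArgumentException("shape does not match data length");
        this.data = data;
        this.shape = shape.clone();
    }

    private static int product(int[] dims) {
        int p = 1;
        for (int d : dims) p *= d;
        return p;
    }

    // Row-major mapping from an N-dimensional index to the flat offset.
    public double get(int... index) {
        int offset = 0;
        for (int i = 0; i < shape.length; i++)
            offset = offset * shape[i] + index[i];
        return data[offset];
    }

    // Re-dimensioning moves no data; only the shape array is replaced.
    public void reshape(int... newShape) {
        if (product(newShape) != data.length)
            throw new IllegalArgumentException("element count must be preserved");
        this.shape = newShape.clone();
    }
}
```

With 12 elements shaped 2x6, element (1,2) sits at flat offset 1*6+2 = 8; after reshape(3, 4), element (2,1) sits at offset 2*4+1 = 9, with no copying in between.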



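The NIO suggestion in idea 4 might look something like the following sketch, which memory-maps a local binary file of doubles (a stand-in for a locally cached OPeNDAP variable) and reads a value at a random offset; the OS page cache then services repeated accesses. The class and method names (MappedGridDemo, readDoubleAt) are hypothetical:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;

public class MappedGridDemo {

    // Memory-map the whole file read-only and fetch one double by index.
    public static double readDoubleAt(File file, int index) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel ch = raf.getChannel()) {
            DoubleBuffer doubles =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asDoubleBuffer();
            return doubles.get(index);
        }
    }

    public static void main(String[] args) throws Exception {
        // Create a small scratch file of 12 doubles (values 0.0 .. 11.0).
        File f = File.createTempFile("grid", ".bin");
        f.deleteOnExit();
        ByteBuffer buf = ByteBuffer.allocate(12 * Double.BYTES);
        for (int i = 0; i < 12; i++) buf.putDouble(i);
        buf.flip();
        try (FileChannel ch = new RandomAccessFile(f, "rw").getChannel()) {
            ch.write(buf);
        }
        System.out.println(readDoubleAt(f, 8)); // prints 8.0
    }
}
```

Because the mapping is backed by the OS disk cache rather than the Java heap, random access to files much larger than available RAM stays feasible, which is the property that matters for the disk-based token idea above.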
This particular version was published on 28-Sep-2007 10:18:48 PDT by uid=higgins,o=NCEAS.