OPeNDAP Kepler Data Model Resolution

Difference between version 3 and version 2:

Lines 3-4 were replaced by lines 3-9
- # Poorly optimized in memory (unless the data is represented as a Ptolemy __matrix__ type)
- # Not really available to the existing Kepler/Ptolemy actor suite due to its inherent complexity.
+ !! N-Dimensional Array Issues
+ ! Poorly optimized in memory (unless the data is represented as a Ptolemy __matrix__ type)
+ Comments from Dan Higgins - 9/28/2007
+ # The Ptolemy array of tokens is inefficient, while the Ptolemy Matrix is designed for 2-D info (images). I suggest that we add a new multidimensional array type to Kepler. Data would be stored linearly in memory (1-D Java array) with a second array indicating dimensionality (approach used in R). Thus a 1-D array with 12 elements could be dimensioned as a 1x12, 2x6, 3x4, 2x3x2, etc. array, and the dimensionality can be changed without moving any data. \\ \\(This is in effect the way that OPeNDAP stores its array data in memory. If care were taken in designing this Token type, we could allow the internal storage to be passed in, thus allowing us to read OPeNDAP data and seamlessly wrap it in a Token without being forced to do an element-by-element copy. ndp 11/14/07)\\ \\
+ # Although a new multidimensional array type would be more space efficient than arrays of array tokens, any purely RAM-based implementation will rapidly run into memory limitations as we try to handle bigger data sets (e.g. multiple datasets over time, with time as the 3rd dimension). So why not consider a disk-based option now?\\ \\
+ # Why not consider a new Kepler datatype that is file-based? That is, store the OPeNDAP data in local files (perhaps CDF or HDF files? I think there are Java tools for reading such files(?)) and use 'file reference tokens' (these don't currently exist). (Currently we do use simple strings as file references in Kepler. We should add a 'ReferenceToken' that is immutable; for files this would involve file locking, etc.)\\ \\
+ # Java NIO routines offer some methods for optimizing speed of random access of large disk-based files using OS disk caches and other methods. We might want to investigate these to optimize performance of disk-based data storage.
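The comments above combine two ideas: linear storage with a separate dimension array, and disk-backed storage accessed through Java NIO. The following is a minimal sketch of how they might fit together, not an existing Kepler or Ptolemy type. The class name `NdArrayView` and its methods are hypothetical, row-major index order is an assumption (R itself is column-major), and a single mapping is limited to 2 GB by `FileChannel.map`.

```java
import java.io.IOException;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical N-dimensional view over linear storage; not a Kepler or Ptolemy class. */
final class NdArrayView {
    private final DoubleBuffer data; // linear storage, shared between views
    private final int[] dims;        // dimension sizes, e.g. {3, 4}

    NdArrayView(DoubleBuffer data, int... dims) {
        if (product(dims) != data.capacity()) {
            throw new IllegalArgumentException("dimensions do not match element count");
        }
        this.data = data;
        this.dims = dims.clone();
    }

    /** Backs the view with a memory-mapped file so the OS page cache handles random access. */
    static NdArrayView mapFile(Path file, int... dims) throws IOException {
        long bytes = Math.multiplyExact(product(dims), (long) Double.BYTES);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            return new NdArrayView(mapped.asDoubleBuffer(), dims);
        }
    }

    private static long product(int[] dims) {
        long n = 1;
        for (int d : dims) {
            if (d <= 0) {
                throw new IllegalArgumentException("dimension must be positive");
            }
            n = Math.multiplyExact(n, (long) d);
        }
        return n;
    }

    /** Converts an N-dimensional index to a linear offset (row-major, an assumption). */
    private int offset(int... index) {
        if (index.length != dims.length) {
            throw new IllegalArgumentException("wrong number of indices");
        }
        int off = 0;
        for (int i = 0; i < dims.length; i++) {
            if (index[i] < 0 || index[i] >= dims[i]) {
                throw new IndexOutOfBoundsException("index " + i + " out of range");
            }
            off = off * dims[i] + index[i];
        }
        return off;
    }

    double get(int... index) {
        return data.get(offset(index));
    }

    void set(double value, int... index) {
        data.put(offset(index), value);
    }

    /** Returns a view with new dimensions over the same storage; no data is moved. */
    NdArrayView reshape(int... newDims) {
        return new NdArrayView(data, newDims);
    }
}
```

Because the constructor accepts any `DoubleBuffer`, the same view could wrap an in-memory array via `DoubleBuffer.wrap(...)` or a mapped file via `mapFile(...)`, which is one way the "pass the internal storage in" idea might be realized.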
At line 5 added 2 lines.
+ !! Data Structure Complexity
+ ! DAP data is not really available to the existing Kepler/Ptolemy actor suite due to its inherent complexity.
Line 7 was replaced by line 14
- Ideas:
+ Because the DAP is rich in data stored in variations of the Structure data type, much of what is produced from DAP data sources will naturally map to a RecordToken. Currently there is a RecordDisassembler Actor that can be used to break apart these records. Unfortunately, it is very cumbersome for the user to configure. Because of this, Ilkay attempted to write an AutomatedRecordDisassembler Actor and immediately ran into a wall, because it was not possible for the Actor to determine the structure of the incoming RecordToken at design time.
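To illustrate why the structure is only discoverable at run time, here is a small sketch that walks a nested record once a concrete value is in hand and lists its leaf fields. It uses plain `java.util.Map` as a stand-in for a record; `RecordFlattener` and `leafPaths` are hypothetical names, and nothing here is Ptolemy or Kepler API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; a nested Map stands in for a record with nested structures. */
final class RecordFlattener {

    /** Collects dotted paths (e.g. "station.lat") for every leaf field of a nested record. */
    static List<String> leafPaths(Map<String, ?> record) {
        List<String> out = new ArrayList<>();
        collect("", record, out);
        return out;
    }

    private static void collect(String prefix, Map<?, ?> record, List<String> out) {
        for (Map.Entry<?, ?> e : record.entrySet()) {
            String path = prefix.isEmpty()
                    ? String.valueOf(e.getKey())
                    : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // Nested structure: recurse rather than emit a leaf.
                collect(path, (Map<?, ?>) e.getValue(), out);
            } else {
                out.add(path);
            }
        }
    }
}
```

The field list exists only after a record arrives, which is the same obstacle described above: an actor that needs one output per field cannot know those fields while the workflow is being designed.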
Removed lines 9-13
- Comments from Dan Higgins - 9/28/2007
- # The Ptolemy array of tokens is inefficient, while the Ptolemy Matrix is designed for 2-D info (images). I suggest that we add a new multidimensional array type to Kepler. Data would be stored linearly in memory (1-D Java array) with a second array indicating dimensionality (approach used in R). Thus a 1-D array with 12 elements could be dimensioned as a 1x12, 2x6, 3x4, 2x3x2, etc. array, and the dimensionality can be changed without moving any data.
- # Although a new multidimensional array type would be more space efficient than arrays of array tokens, any purely RAM-based implementation will rapidly run into memory limitations as we try to handle bigger data sets (e.g. multiple datasets over time, with time as the 3rd dimension). So why not consider a disk-based option now?
- # Why not consider a new Kepler datatype that is file-based? That is, store the OPeNDAP data in local files (perhaps CDF or HDF files? I think there are Java tools for reading such files(?)) and use 'file reference tokens' (these don't currently exist). (Currently we do use simple strings as file references in Kepler. We should add a 'ReferenceToken' that is immutable; for files this would involve file locking, etc.)
- # Java NIO routines offer some methods for optimizing speed of random access of large disk-based files using OS disk caches and other methods. We might want to investigate these to optimize performance of disk-based data storage.
