OPeNDAP Kepler Data Model Resolution

Difference between version 8 and version 5:

Line 1 was replaced by line 1
- The OPeNDAP data model (aka the DAP2 data model) supports more complex data objects than the Kepler/Ptolemy data model. In particular DAP2 supports deeper hierarchies and N-imensional arrays. Although these data can find logical representations in Kepler/Ptolemy the end result is:
+ The OPeNDAP data model (aka the DAP2 data model) supports more complex data objects than the Kepler/Ptolemy data model. In particular DAP2 supports deeper hierarchies and N-dimensional arrays. Although these data can find logical representations in Kepler/Ptolemy the end result is:
Lines 5-6 were replaced by lines 5-6
- Comments from Dan Higgins - 9/28/2007
- # The Ptolemy array of tokens is inefficient, while the Ptolemy Matrix is designed for 2-D info (images). I suggest that we add a new multidimensional array type to Kepler. Data would be stored linearly in memory (a 1-D Java array) with a second array indicating dimensionality (the approach used in R). Thus a 1-D array with 12 elements could be dimensioned as a 1x12, 2x6, 3x4, or 2x3x2 array, and the dimensionality can be changed without moving any data. \\ \\(This is in effect the way that OPeNDAP stores its array data in memory. If care was taken designing this Token type we could allow the internal storage to be passed in - thus allowing us to read OPeNDAP data and seamlessly wrap it in a Token without being forced to do an element-by-element copy. ndp 11/14/07)\\ \\
+ ;__Comments from Dan Higgins - 9/28/2007__:''
+ # The Ptolemy array of tokens is inefficient, while the Ptolemy Matrix is designed for 2-D info (images). I suggest that we add a new multidimensional array type to Kepler. Data would be stored linearly in memory (a 1-D Java array) with a second array indicating dimensionality (the approach used in R). Thus a 1-D array with 12 elements could be dimensioned as a 1x12, 2x6, 3x4, or 2x3x2 array, and the dimensionality can be changed without moving any data. \\ \\{{(This is in effect the way that OPeNDAP stores its array data in memory. If care was taken designing this Token type we could allow the internal storage to be passed in - thus allowing us to read OPeNDAP data and seamlessly wrap it in a Token without being forced to do an element-by-element copy. ndp 11/14/07)}}\\ \\
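The linear-storage scheme described above can be sketched in plain Java. This is only an illustration of the proposed Token type, not actual Kepler/Ptolemy code; the class and method names are invented for the example. Note that the constructor wraps the incoming array rather than copying it, which is the pass-the-storage-in idea mentioned in the parenthetical comment.

```java
import java.util.Arrays;

/**
 * Sketch of the proposed multidimensional array token: data stored
 * linearly (row-major) in one 1-D Java array, with a separate shape
 * array giving the dimensionality. Reshaping changes only the shape
 * metadata; the data never moves. Illustrative only, not Kepler API.
 */
public class NDArrayToken {
    private final double[] data;   // linear storage, wrapped, not copied
    private int[] shape;           // e.g. {2, 3, 2} for a 2x3x2 array

    public NDArrayToken(double[] data, int... shape) {
        if (Arrays.stream(shape).reduce(1, (a, b) -> a * b) != data.length) {
            throw new IllegalArgumentException("shape does not match data length");
        }
        this.data = data;
        this.shape = shape;
    }

    /** Row-major element access: flat = ((i0*d1)+i1)*d2 + i2 ... */
    public double get(int... index) {
        int flat = 0;
        for (int k = 0; k < shape.length; k++) {
            flat = flat * shape[k] + index[k];
        }
        return data[flat];
    }

    /** Re-dimension in place: only the shape array changes. */
    public void reshape(int... newShape) {
        if (Arrays.stream(newShape).reduce(1, (a, b) -> a * b) != data.length) {
            throw new IllegalArgumentException("element count must be preserved");
        }
        this.shape = newShape;
    }
}
```

With this layout the same 12 elements can be viewed as 3x4, then re-dimensioned to 2x6 without touching the data array, as the comment proposes.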
At line 9 added 2 lines.
+ ''
+ ;__Comments from Nathan Potter - 11/15/2007__:'' I have added an optimization step to the OPeNDAP actor in which it "squeezes" incoming arrays to remove dimensions whose size is equal to one. The result is that if the user subsets an N-dimensional array in such a way that the result is effectively a 1- or 2-dimensional array, then the Actor will map it to a matrix (1xN and MxN respectively). While this doesn't address all of the memory usage concerns, it will enable us to move forward with workflow development for the SST use case. That work should bring more focus on the other memory limitations that we may encounter.
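The "squeeze" step can be illustrated with a small helper: dimensions of size one carry no data, so dropping them from the shape array (the data itself never moves) reveals whether a subset is effectively 1-D or 2-D and can be mapped to a matrix. This is a hedged sketch of the idea, not the actual actor code.

```java
import java.util.Arrays;

/**
 * Illustration of the squeeze optimization: remove all dimensions of
 * size one from a shape array. If the squeezed shape has one or two
 * dimensions left, the array fits a 1xN or MxN matrix respectively.
 * Helper name is invented for this example.
 */
public class Squeeze {
    public static int[] squeeze(int[] shape) {
        return Arrays.stream(shape).filter(d -> d != 1).toArray();
    }

    /** True when the squeezed result can be represented as a matrix. */
    public static boolean fitsMatrix(int[] shape) {
        return squeeze(shape).length <= 2;
    }
}
```

For example, a subset with shape 1x180x1x360 squeezes to 180x360 and maps to an MxN matrix.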
At line 16 added 2 lines.
+ ;__ Comments From Dan Higgins (11/15/2007) __:''I got thinking about this issue after our last REAP call and thought I would try to summarize some thoughts here.\\ \\ At design time, actors are really independent. Most of the actors that dynamically generate output ports are data source actors. Except for trigger inputs, there are no inputs, and the output ports can be generated because the actor uses some parameter to get the data needed to generate ports when it is instantiated (i.e. dropped on the canvas) or its input parameters are changed. (Example: the EML2 Datasource uses the EML metadata, the OPeNDAP actor uses a URL; changing an actor parameter during design will trigger changes in the output ports due to AttributeChanged events.) \\ \\ But the actor connected to the output of a Data Source only gets port data from the preceding actor during the fire cycle, which doesn't occur during design. Say some data source puts its output in a complex form (e.g. a Kepler Record, XML, or an R dataframe). If that output is connected to the input of another actor, then without additional information the following actor cannot know any details of its input until it receives the data token! The existing RecordDisassembler works by requiring the user to know the 'names' of items in the Record token and creating output ports with those names. \\ \\ Now, a RecordDisassembler actor could be given a Parameter that is an array of strings naming the Record elements and then could automatically generate output ports based on that array (or it could be given a 'template' RecordToken and figure out the name array from the Token). A change in this parameter at design time would trigger changes in the outputs. The parameter could even be placed on the canvas and shared between multiple actors (i.e. Parameters are a way for actors to share information at design time). \\ \\ Note that the problem is related to complex data types like a Record or XML file. The data type of ports can be set at design time and checked. But with complex types, the type itself is incomplete (i.e. more information is needed). For a Record, one would really need a complete description of the Record with element names and element types (perhaps recursive), and with XML, one would need a complete schema. Requiring such complete type descriptions would make actors so specific that their usefulness would be limited to only a few cases. \\ \\ Theoretically, one could have an actor query every one of its predecessors when its input ports were connected to see if there is information about the details of the data that will be sent to it (e.g. a RecordDisassembler could ask its predecessor(s) for a prototype record) and use that data (if available) to configure outputs. But that greatly increases the complexity of a workflow, because every actor is then (possibly) strongly linked to all of its predecessors! \\ \\ So all this brings me to one possible solution. Assume that any data source that actually gets information at design time (like the EML of the EML actor or the OPeNDAP actor) ALSO created a Kepler parameter when it was dropped on the canvas (or its parameters were changed), and this new parameter were automatically made visible on the workspace canvas. The parameter would basically be the schema of the Record or EML that the actor might output. Any other actor added to the model could then use that parameter, e.g. if the parameter were a Record, a RecordDisassembler could use it as a template for creating outputs.
+ ''
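The template idea in the comment above can be sketched without the Ptolemy API by letting a plain Map stand in for a RecordToken: the disassembler derives its output ports from the template's field names at design time, then routes fields to those ports during the fire cycle. All class and method names here are illustrative, and the real RecordDisassembler would of course work with Ptolemy tokens and TypedIOPorts rather than Maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Simplified stand-in for a template-driven RecordDisassembler.
 * A "template record" (a Map here, in place of a RecordToken) is
 * supplied as a parameter at design time; output ports are created
 * from its field names before any data token ever arrives.
 */
public class TemplateDisassembler {
    private final Map<String, Object> ports = new LinkedHashMap<>();

    /** Design time: the template parameter changed, rebuild ports. */
    public void configureFromTemplate(Map<String, Object> template) {
        ports.clear();
        for (String name : template.keySet()) {
            ports.put(name, null);     // one output port per record field
        }
    }

    /** Fire cycle: route each field of the incoming record to its port. */
    public Map<String, Object> fire(Map<String, Object> recordToken) {
        for (String name : ports.keySet()) {
            ports.put(name, recordToken.get(name));
        }
        return ports;
    }

    public Set<String> portNames() {
        return ports.keySet();
    }
}
```

The point of the sketch is the split: `configureFromTemplate` runs at design time from the shared schema parameter, while `fire` runs only when a token actually arrives, which is exactly the gap the comment describes.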

Back to OPeNDAP Kepler Data Model Resolution, or to the Page History.