Ocean_use_case_searching

Difference between version 13 and version 7:

At line 2 added 22 lines.
+ !!Update - 06/2010
+
+ The work so far has focused on two parallel goals:
+ * Build, distribute and support the NcML handler in Hyrax so that we can establish more metadata uniformity across servers without modifying datasets; and
+ * Build Java/XSLT software to transform the DAP 3.x DDX response into EML.
+
+ !NCML Handler
+
+ Hyrax 1.5.x contains a version of the NcML handler that supports adding attributes, and Hyrax 1.6.x contains a version that supports aggregations, too. Work will continue to add new capabilities to the handler.
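+
+ For illustration, an NcML file like the following could add missing CF metadata to a dataset the server already holds, without touching the file itself (a sketch; the dataset location, variable name and attribute values are hypothetical):
+
+ {{{
+ <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
+         location="data/nc/sst_example.nc">
+   <!-- Advertise CF-1.0 compliance via a global attribute -->
+   <attribute name="Conventions" value="CF-1.0"/>
+   <!-- Add a standard_name to an existing variable -->
+   <variable name="sst">
+     <attribute name="standard_name" value="sea_surface_temperature"/>
+   </variable>
+ </netcdf>
+ }}}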
+
+ !EML generation
+
+ In general, we cannot build valid EML using only a 'raw' DDX and XSLT. In most cases we will need some Java code to read values from the dataset and/or additional metadata inserted into the DDX using NcML. However, for datasets that conform to the Climate and Forecast 1.0 metadata convention (CF-1.0), we can build EML directly from the 'raw' DDX response (without using Java to read data values or NcML to insert additional metadata).
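+
+ As a sketch of the kind of transform involved (not the actual stylesheet; the DDX element names assume DAP 3.2 with global attributes in an NC_GLOBAL container, as Hyrax typically presents them), an XSLT fragment might copy a CF global attribute into the EML like this:
+
+ {{{
+ <xsl:stylesheet version="1.0"
+     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+     xmlns:dap="http://xml.opendap.org/ns/DAP/3.2#"
+     xmlns:eml="eml://ecoinformatics.org/eml-2.1.0">
+   <xsl:output method="xml" indent="yes"/>
+   <!-- Build a minimal EML skeleton from the root Dataset element of a DDX -->
+   <xsl:template match="/dap:Dataset">
+     <eml:eml packageId="placeholder.1.1" system="placeholder">
+       <dataset>
+         <!-- Copy the CF 'title' global attribute into the EML title -->
+         <title>
+           <xsl:value-of select="dap:Attribute[@name='NC_GLOBAL']/dap:Attribute[@name='title']/dap:value"/>
+         </title>
+       </dataset>
+     </eml:eml>
+   </xsl:template>
+ </xsl:stylesheet>
+ }}}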
+
+ Here is an [EML document | http://reap.ecoinformatics.org/attach?page=Ocean_use_case_searching%2F20040103-NCDC-L4LRblend-GLOB-v01-fv01_0-AVHRR_AMSR_OI.nc.bz2.eml] built using XSLT from a DDX from a dataset with CF-1.0 metadata.
+
+ About the EML: the generated EML assumes (see the coverage sketch after this list):
+ * Only Grids are interesting variables in a given dataset/granule
+ * The dataset complies with CF-1.0
+ * The dataset provides dataset-scoped geo-temporal metadata
+ * Every Grid shares that dataset-scoped geo-temporal metadata
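+
+ For example, the dataset-scoped geo-temporal metadata would appear in the generated EML roughly as follows (a sketch; the description, coordinates and dates are made-up values for a global granule):
+
+ {{{
+ <coverage>
+   <geographicCoverage>
+     <geographicDescription>Global grid</geographicDescription>
+     <boundingCoordinates>
+       <westBoundingCoordinate>-180.0</westBoundingCoordinate>
+       <eastBoundingCoordinate>180.0</eastBoundingCoordinate>
+       <northBoundingCoordinate>90.0</northBoundingCoordinate>
+       <southBoundingCoordinate>-90.0</southBoundingCoordinate>
+     </boundingCoordinates>
+   </geographicCoverage>
+   <temporalCoverage>
+     <rangeOfDates>
+       <beginDate><calendarDate>2004-01-03</calendarDate></beginDate>
+       <endDate><calendarDate>2004-01-03</calendarDate></endDate>
+     </rangeOfDates>
+   </temporalCoverage>
+ </coverage>
+ }}}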
+
Lines 7-9 were replaced by line 29
- !!Summary
-
- Finding data scattered among distributed servers that are run independently is a long-standing problem. Various solutions, like crawling the servers' contents or requiring standardized information about the contents so they can be indexed, have been tried with varying success. The design here addresses the twin problems of motivating providers to write the extra information that will make the data locatable and of increasing the time those documents remain valid. It is not a complete solution, in the sense that not all providers will write the needed documents and, over time, some documents will go stale. However, it should have fewer problems than designs that fail to consider these realities.
+ Note that this does not provide any hints on how to build a user interface that would actually use the information, only how to get that information into the database.
Line 13 was replaced by lines 33-35
- DAP servers, which provide all of the data used by the Ocean Use Case, are completely autonomous entities with their own network interfaces. Each supports a uniform base set of operations, but there is no requirement to organize or catalog those servers in a coordinated and centralized way. For the purposes of this use case, we assume that implementing a distributed query is outside the scope of solutions we want to consider. Thus, we must create a centralized store of information about the collection of known data sets that can be searched in response to various types of queries.
+ Finding data scattered among distributed servers that are run independently is a long-standing problem. Various solutions, like crawling the servers' contents or requiring standardized information about the contents so they can be indexed, have been tried with varying success. The design here addresses the twin problems of motivating providers to write the extra information that will make the data locatable and of increasing the time those documents remain valid. It is not a complete solution, in the sense that not all providers will write the needed documents and, over time, some documents will go stale. However, it should have fewer problems than designs that fail to consider these realities.
+
+ DAP servers, which provide all of the data used by the Ocean Use Case, are completely autonomous entities with their own network interfaces. Each supports a uniform base set of operations, but there is no requirement to organize or catalog those servers in a coordinated, unified or centralized way. For the purposes of this use case, we assume that implementing a distributed query is outside the scope of solutions we want to consider. Thus, we must create a centralized store of information about the collection of known data sets that can be searched in response to various types of queries.
Line 17 was replaced by line 39
- The Kepler client supports searching the Metacat database system to locate data sets, and it seems that populating an instance of this database with information about the data sets in the use case will provide the needed information on which to build a search user interface. However, the metacat/kepler combination is not completely general; while metacat is capable of indexing arbitrary XML documents (so it could index the DDX returned by a DAP server for a given data set), kepler expects the records it searches to be (a subset of) EML documents.
+ The Kepler client supports searching the Metacat database system to locate data sets, and it seems that populating an instance of this database with information about the data sets in the use case will provide the needed information on which to build a search user interface. However, the Metacat/Kepler combination is not completely general; while Metacat is capable of indexing arbitrary XML documents (so it could index the DDX returned by a DAP server for a given data set), Kepler expects the records it searches to be (a subset of) EML documents.
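+
+ For reference, the DDX is an XML description of a dataset's structure and attributes, roughly of this shape (a sketch loosely based on the DAP 3.2 DDX and the granule linked above; the variable name, types and dimension sizes are illustrative):
+
+ {{{
+ <Dataset xmlns="http://xml.opendap.org/ns/DAP/3.2#"
+          name="20040103-NCDC-L4LRblend-GLOB-v01-fv01_0-AVHRR_AMSR_OI.nc">
+   <Attribute name="NC_GLOBAL" type="Container">
+     <Attribute name="Conventions" type="String">
+       <value>CF-1.0</value>
+     </Attribute>
+   </Attribute>
+   <Grid name="analysed_sst">
+     <Array name="analysed_sst">
+       <Float32/>
+       <dimension name="time" size="1"/>
+       <dimension name="lat" size="720"/>
+       <dimension name="lon" size="1440"/>
+     </Array>
+     <Map name="lat">
+       <Float32/>
+       <dimension name="lat" size="720"/>
+     </Map>
+   </Grid>
+ </Dataset>
+ }}}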
At line 25 added 15 lines.
+
+ !!Risks
+
+ * There's no discussion of how to modify Kepler so users can search for the data sets. This isn't really a risk per se; it just indicates that there's a whole other issue to be addressed.
+
+ !!Search Parameters
+
+ This search sub-system is driven by the Ocean Use Case, in its second form, and so the parameters are derived from that. However, these are general search parameters, applicable to just about any search involving satellite data. The parameters are (see the query sketch after this list for how they might map onto Metacat):
+
+ * Space (Latitude and Longitude)
+ * Time (really date & time)
+ ** Question: should we also include qualifiers like 'day' or 'night' to accommodate finding only daytime or nighttime images?
+ * Resolution (i.e., the area in km^2 covered by a pixel)
+ * Parameter (Sea Surface Temperature, Wind vectors, etc.)
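+
+ If Metacat serves as the centralized store, these parameters might map onto its pathquery interface along the following lines (a sketch; the EML paths and search modes shown would have to be checked against the actual Metacat and EML schemas):
+
+ {{{
+ <?xml version="1.0"?>
+ <pathquery version="1.2">
+   <returndoctype>eml://ecoinformatics.org/eml-2.1.0</returndoctype>
+   <returnfield>dataset/title</returnfield>
+   <querygroup operator="INTERSECT">
+     <!-- Parameter: look for SST in the dataset title -->
+     <queryterm searchmode="contains" casesensitive="false">
+       <value>sea surface temperature</value>
+       <pathexpr>dataset/title</pathexpr>
+     </queryterm>
+     <!-- Space: coverage must extend north of the equator -->
+     <queryterm searchmode="greater-than" casesensitive="false">
+       <value>0.0</value>
+       <pathexpr>northBoundingCoordinate</pathexpr>
+     </queryterm>
+   </querygroup>
+ </pathquery>
+ }}}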
+
