Distributed Archives Interoperability
Cynthia Y. Cheung
NASA Goddard Space Flight Center
IAU 2000, Commission 5
Manchester, UK, August 12, 2000
This page contains a text-only overview of Cynthia's talk and excludes graphics; the original slides include them.
Current Status
- Global Astrophysics Data Resources Loosely Connected By the Internet
  - Observational data archives or repositories
  - Derived data products (astronomical catalogs, browse images, video)
  - Data analysis packages
  - Visualization/presentation packages
  - Special services (bibliography, discipline-specific knowledge bases, directories)
- Distributed Storage, Processing, and Management
- Multispectral surveys (Data volume ~ terabytes)
- Islands of Information?
- Requires both Vertical and Horizontal Integration
Path to the Future
- Current (connections via hyperlinks): one to one
- Near Future (connections to multiple DBs all at once, via middleware): one to many
- Long Term (multiple inter-connectivity, federated databases): many to many
  - Distributed Autonomous data centers
  - Intelligent Agents
  - User-defined Profiles and Preferences
  - Access via Multiple Interfaces
Components of Interoperability
- Integrated search and discovery
  - URL Registry (e.g., Yellow Pages, GLU, AstroBrowse)
  - Query processor (e.g., AMASE, ISAIA)
  - Browsing/visualization to support selection (ADC Data Viewer, AEQ)
  - Batch queries (Feed output stream of one data service to another)
  - Tools to support integration of results
- Data and software exchange
  - FTP of data and software updates (pull)
  - Download of Browser Plug-in (pull)
  - Automated Updates (HST DB replication) (push)
  - Hybrid Techniques (with data cache or aircache) (push & pull)
  - Packaging of software with data (XDF)
Technical Issues and Challenges
Example: Positional correlation of objects in a region of the sky across multiple wavelengths (Radio, IR, Optical, UV, X-rays, Gamma Rays)
- Data volume and network bandwidth
- Cache of pre-computed results (e.g., astronomical catalogs)
- Data filtering at data site, ship results only
- Deployment of user code (platform independent S/W)
- Data visualization for exploration and selection
- Registration, Sensitivity, Positional Accuracy
- Coordinate transformation on a large scale
- Calibration and normalization
- Query Optimization across Multiple Sites
- Query execution plan for efficient cross-correlation
- Indexing for fast access
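The positional correlation in the example above can be sketched in a few lines. This is a brute-force illustration under stated assumptions, not a method taken from the talk: the catalog contents, the function names `angular_separation_deg` and `cross_match`, and the 2-arcsecond match radius are all hypothetical. It compares every pair of sources, which is exactly the cost that the indexing and site-side filtering listed above are meant to avoid.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for the small separations used in matching.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(catalog_a, catalog_b, radius_arcsec=2.0):
    """Pair each source in catalog_a with its nearest catalog_b source within the radius.

    Catalogs are lists of (name, ra_deg, dec_deg) tuples. Brute force: O(N * M).
    Returns a list of (name_a, name_b, separation_arcsec) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in catalog_a:
        best = None
        for name_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1] * 3600.0))
    return matches

# Hypothetical catalogs with made-up positions, for illustration only.
radio = [("R1", 10.0000, -5.0000), ("R2", 150.2500, 2.2000)]
xray = [("X1", 10.0003, -5.0002), ("X2", 200.0000, 45.0000)]
matches = cross_match(radio, xray)
```

The sketch assumes both catalogs are already in the same coordinate system and epoch; the registration and calibration items in the list above would have to be handled before a match like this is meaningful.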
Semantic Interoperability
- Content-based Searches
  - Science goal driven queries instead of SQL
- Data Understanding (Domain Context)
  - Human Interface -> S/W Mapping -> Object-oriented Mapping
- Data Annotation for Correct Interpretation
  - Measured parameters, units, quality, range of validity
  - Algorithm and calibration used, pedigree
  - Theoretical models applied
- Data Organization
  - File directory structure
  - Database schema
- Need Information in both Machine-understandable and Human-understandable form
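As one possible reading of the data annotation items above, a measured value can carry its units, quality, range of validity, calibration, and pedigree along with it, and be rendered in both machine-understandable and human-understandable form. The sketch below is only an illustration: the class name `AnnotatedMeasurement`, its field names, and the sample values are assumptions, not a format proposed in the talk.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AnnotatedMeasurement:
    """A measured value together with the annotation needed to interpret it."""
    parameter: str      # what was measured
    value: float
    unit: str
    quality: str        # quality flag assigned by the data provider
    valid_min: float    # range of validity, in the same unit as value
    valid_max: float
    calibration: str    # algorithm or calibration applied
    pedigree: str       # where the value came from

    def is_valid(self) -> bool:
        """True when the value lies inside its stated range of validity."""
        return self.valid_min <= self.value <= self.valid_max

    def to_machine(self) -> str:
        """Machine-understandable form: a JSON object with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)

    def to_human(self) -> str:
        """Human-understandable form: a single descriptive sentence."""
        status = "within" if self.is_valid() else "outside"
        return (f"{self.parameter} = {self.value} {self.unit} "
                f"(quality {self.quality}; {status} valid range "
                f"{self.valid_min} to {self.valid_max} {self.unit}; "
                f"calibration: {self.calibration}; source: {self.pedigree})")

# Hypothetical record; every value here is made up for illustration.
flux = AnnotatedMeasurement(
    parameter="flux_density", value=12.5, unit="mJy", quality="A",
    valid_min=0.1, valid_max=1000.0,
    calibration="hypothetical pipeline v1", pedigree="hypothetical survey",
)
```

Keeping both renderings on one record means the text a person reads and the structure a program parses cannot drift apart.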
Metadata Standards
- Syntax
  - Directory Structure
  - Size, Format, Location, URL
- Semantics
  - Usage Convention (e.g., FITS)
  - Extensible Standards to Encompass Different Disciplines (DTD, XML)
  - Astronomical Nomenclature and Designation
  - Conceptual Data Model
- Metadata Language or Representation
  - FITS, ASCII, IEEE Binary
  - Astronomical XML
Aspects of Metadata Usage [Ref: Bretherton & Singley 1994, Proc. of 7th SSDBM, p. 166]
- Search, browse, retrieval (Human)
  - Data extraction and interpretation
  - Navigate among services
- Ingest, quality assurance, (re-)processing
  - Science product generation pipeline
  - Content analysis
- Storage, archive (Data Management)
  - Information relevant for effective system design and operation
- Application to application transfer (Machine)
  - Enable "context" interchange (distributed queries and transformations)
  - Need transfer language with mappings from conceptual level to different logical representation
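A mapping from the conceptual level to different logical representations could look roughly like the following. This is a minimal sketch under stated assumptions: the two archives, their local column names, and the common terms (`ra_deg`, `dec_deg`, `flux_mjy`) are invented for illustration and do not describe any real archive schema or transfer language.

```python
# Each hypothetical archive maps a local column name to (common term, converter).
SCHEMA_MAPS = {
    "archive_a": {
        "RAJ2000": ("ra_deg", float),
        "DEJ2000": ("dec_deg", float),
        "S_mJy": ("flux_mjy", float),
    },
    "archive_b": {
        # Right ascension stored in hours: 24 h corresponds to 360 degrees.
        "ra_hours": ("ra_deg", lambda hours: float(hours) * 15.0),
        "dec": ("dec_deg", float),
        # Flux stored in Jy: 1 Jy = 1000 mJy.
        "flux_jy": ("flux_mjy", lambda jy: float(jy) * 1000.0),
    },
}

def to_common(archive, record):
    """Translate one local record into the common terminology.

    Columns without a mapping are dropped rather than guessed at.
    """
    mapping = SCHEMA_MAPS[archive]
    common = {}
    for local_name, raw_value in record.items():
        if local_name in mapping:
            term, convert = mapping[local_name]
            common[term] = convert(raw_value)
    return common

# The same hypothetical source as it might be stored at two different sites.
row_a = to_common("archive_a", {"RAJ2000": "150.0", "DEJ2000": "2.2", "S_mJy": "12.5"})
row_b = to_common("archive_b", {"ra_hours": "10.0", "dec": "2.2", "flux_jy": "0.0125"})
```

Once both rows use the same terms and units, a distributed query can compare them directly, which is the kind of "context" interchange the list above calls for.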
Other Supporting Tools
- Interface Standards for Software Tools
- Tools for Schema Mapping
  - Document logical structure of database (key elements and relationships)
  - Mapping of local definitions into common terminology
  - Track changes and updates at other sites
- Tools for Data Integration and Fusion
- Dynamic Interface with user preferences
- Intelligent Software Agents to mediate interaction
Goal:
Global query to many distributed autonomous evolving data resources
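The one-to-many pattern behind this goal can be sketched as a single region query sent to several sites at once, with each site filtering locally and shipping back only its results. Everything below is an assumption made for illustration: the archives are in-memory stand-ins with made-up contents, and `query_archive` and `global_query` are hypothetical names that do not represent the interface of AMASE, ISAIA, or any other real service.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for autonomous data centers (hypothetical names and contents).
ARCHIVES = {
    "radio_archive": [{"name": "R1", "ra_deg": 10.0, "dec_deg": -5.0}],
    "xray_archive": [{"name": "X1", "ra_deg": 10.2, "dec_deg": -5.1},
                     {"name": "X2", "ra_deg": 200.0, "dec_deg": 45.0}],
}

def query_archive(archive_name, region):
    """Filter at the data site and return only the rows inside the region.

    region is (ra_min, ra_max, dec_min, dec_max) in degrees.
    Wrap-around at RA = 0/360 is ignored in this sketch.
    """
    ra_min, ra_max, dec_min, dec_max = region
    return [row for row in ARCHIVES[archive_name]
            if ra_min <= row["ra_deg"] <= ra_max
            and dec_min <= row["dec_deg"] <= dec_max]

def global_query(region, timeout_s=5.0):
    """Send the same query to every archive concurrently and collect the answers.

    Returns (results, errors); a failing site is recorded in errors so that it
    does not sink the whole query.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=len(ARCHIVES)) as pool:
        futures = {name: pool.submit(query_archive, name, region)
                   for name in ARCHIVES}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout_s)
            except Exception as exc:
                errors[name] = str(exc)
    return results, errors

results, errors = global_query((9.0, 11.0, -6.0, -4.0))
```

Reporting per-site errors separately reflects the autonomy of the data centers: any one of them may be unavailable or may have changed while the others still answer.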