NSF EarthCube DAsHER CA:

Developing a Data-Oriented Human-Centric Enterprise Architecture for EarthCube

Vast geoscience resources (e.g., data, tools, facilities, experts) have been accumulated over the past decades through support from NSF and other agencies. However, it is still a challenge for geoscientists to discover, select, and access resources to accelerate their research. EarthCube is the NSF's strategic effort (NSF, 2011) to address this challenge by building a cyberinfrastructure (CI) that enables integration, communication, and collaboration among geoscientists from different domains, including the solid Earth, atmosphere, oceans, poles, and computational sciences. Three Conceptual Design (CD) projects were selected to investigate frameworks for how an EarthCube system might function and be integrated; DAsHER (dasher.cloud.gmu.edu) is one of them. Over three years of development, the EarthCube DAsHER Conceptual Architecture (CA) was designed to help EarthCube facilitate communication and collaboration across the geosciences.

This EarthCube CA was developed by leveraging several popular frameworks (i.e., Zachman, Gartner, TOGAF, FEAF, and DoDAF) that have proven effective in practice for several geoscience projects within the NSF's Spatiotemporal Innovation Center. The EarthCube CA comprises three major components: 1) an overview of current geoscience resources and architectures; 2) the capabilities of the envisioned EarthCube; and 3) the alignment of the proposed CA with the geoscience community.

The DAsHER CA design followed a spiral development approach (Figure 1): 1) spiral one analyzed geoscientist workshop reports and roadmaps to derive EarthCube architecture requirements; 2) spiral two designed the overall architecture and overview; 3) spiral three designed the detailed conceptual architecture and solicited community feedback; and 4) spiral four completed a draft conceptual architecture with community feedback addressed and integrated.

Figure 1. Spiral development approach

The final design is a four-volume report that presents the EarthCube conceptual architecture from multiple perspectives (Figure 2). Volume one is an overview of EarthCube and the architecture requirements for initial development. Volume two presents the architectural design components and viewpoints. Volume three is the dictionary of the taxonomies used in this EarthCube CA. Finally, Volume four describes an example of how this conceptual architecture was used to develop a Polar cyberinfrastructure portal. The design documents are available for editing on our wiki (http://dasher.cloud.gmu.edu/wiki/).

Figure 2. The four volumes of the final design

Next Steps: The report could be used as a conceptual framework for building EarthCube and other geoscience cyberinfrastructure. The design can serve as a comprehensive guide for developing geoscience cyberinfrastructure for multiple stakeholders in the EarthCube community, and as a source for identifying topics of interest and priorities in the geosciences.

Each of the four volumes of the DAsHER design is described in more detail below.

Volume I is an overview of EarthCube and the architecture requirements for initial development. It summarizes the functionalities of the EarthCube Enterprise as a geoscience research engine, a geoscience resource management platform, a geoscience computing service provider, and an interoperability platform for cross-domain studies. The key components of the envisioned EarthCube Enterprise include data sharing, data curation, communication and collaboration infrastructure, and governance. Its design principles include interoperability, long-term sustainability, leveraging existing cyberinfrastructure assets, maximizing linkage, and a system-of-systems approach. Volume I also describes the referenced Enterprise Architecture models (i.e., Zachman, Gartner, TOGAF, FEAF, and DoDAF) and DAsHER's strategy of combining the strengths of FEAF and DoDAF while referencing elements from other relevant architectures.

Volume II is the presentation of the architectural design components and viewpoints, including Capability Viewpoints (CV), Data and Information Viewpoints (DIV), Operational Viewpoints (OV), Project Viewpoints (PV), Service Viewpoints (SvcV), and Standard Viewpoints (StdV).

  • In CV, capability requirements are analyzed based on the EarthCube group roadmaps and workshop reports. At a high level, the capabilities are divided into three categories: end-user capabilities, enabling capabilities, and resource capabilities. Based on differences in practical operation, EarthCube capabilities are further divided into multiple sub-capabilities, each with a distinct operational process (Capabilities to Operational Activities Mapping). Moreover, a service is a mechanism or performer that implements or enables a set of one or more capabilities. Therefore, the relationships between capabilities and their corresponding services are also described (Capability to Services Mapping).
  • In DIV, the current data- and information-related challenges in Earth science studies are described and categorized into three levels according to where they arise: the data source level, the data processing level, and the system level, corresponding respectively to problems in data capture, data handling, and the structures for data access and management. To address these challenges, the requirements and tasks of the envisioned EarthCube Enterprise are summarized, including facilitating the development and integration of different tools, repositories, and databases. As a system of systems, EarthCube will not only serve as a repository or a fixed set of standards, but will also allow each community to decide how its data are exposed, provide an inventory of different types of resources, and let communities benefit from EarthCube's different supporting roles. In addition, current efforts and projects that support the data and information capabilities are described.
  • In OV, a series of operational activities is presented, including data archiving, resource discovery, resource access, data analysis, publishing, provenance, model development, model assessment, model coupling, quality control, security control, governance, communication, and education. Operational processes are illustrated in workflow diagrams, and the rules and requirements for these operational activities are specified. The different supporting roles and their responsibilities are considered individually to enable an interconnected but clearly defined workflow (Figure 3).

    Figure 3. High-level operational concept

  • PV describes how EarthCube-funded projects deliver capabilities, the organizations contributing to them, and the dependencies among them. Two viewpoints are included in this section: project portfolio relationships and project-to-capability-and-service mapping. These sections can be used to answer questions such as: 1) what capabilities are delivered as part of a given project; 2) whether other projects affect or are affected by that project; and 3) what interdependencies and relationships exist between projects.
  • SvcV describes the envisioned EarthCube services and the interconnections through which they provide or support EarthCube functions. Consistent with the capability classification, services are grouped into three classes: end-user capability, enabling technology, and cross-cutting. The end-user capability group includes services that benefit end-users in their research or development, such as data discovery, access, and modeling services. The enabling technology group includes services that facilitate the use of computation, semantics, brokering, and other techniques; these services not only support end-user services but also benefit the operation of the EarthCube system itself. The cross-cutting group includes services that impose constraints on other services, such as quality-related and provenance-related services, or that provide overall control of the operation, such as security, communication, and governance services. Some services, such as workflow, visualization, and processing services, belong to both the end-user capability group and the enabling technology group.
  • StdV is the set of rules governing the arrangement, interaction, and interdependence of the parts or elements of the Architectural Description. The purpose of these rules is to ensure that a system satisfies a specified set of operational requirements. This standards profile collects the system standards rules that implement, and sometimes constrain, the choices that can be made in the design and implementation of the architecture.
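Because a service can implement several capabilities and a capability can be served by several services (and some services, such as workflow, sit in more than one SvcV group), the Capability to Services Mapping is naturally many-to-many. The following is a minimal sketch of such a mapping, assuming a simple in-memory index; the class and method names (`CapabilityServiceMap`, `link`, `services_for`, `capabilities_for`) and the sample entries are hypothetical and are not taken from the DAsHER tables. Only the category and group names come from Volume II.

```python
from collections import defaultdict
from dataclasses import dataclass

# Category and group names follow Volume II; everything else is illustrative.
CAPABILITY_CATEGORIES = {"end-user", "enabling", "resource"}  # CV categories
SERVICE_GROUPS = {"end-user", "enabling", "cross-cutting"}    # SvcV groups


@dataclass(frozen=True)
class Capability:
    name: str
    category: str  # one of CAPABILITY_CATEGORIES


@dataclass(frozen=True)
class Service:
    name: str
    groups: frozenset  # a service may belong to more than one SvcV group


class CapabilityServiceMap:
    """Hypothetical many-to-many Capability to Services Mapping."""

    def __init__(self):
        self._services = defaultdict(set)      # capability -> services
        self._capabilities = defaultdict(set)  # service -> capabilities

    def link(self, capability, service):
        # Reject entries outside the categories/groups named in Volume II.
        if capability.category not in CAPABILITY_CATEGORIES:
            raise ValueError(f"unknown capability category: {capability.category}")
        if not service.groups or not service.groups <= SERVICE_GROUPS:
            raise ValueError(f"invalid service groups for {service.name}")
        self._services[capability].add(service)
        self._capabilities[service].add(capability)

    def services_for(self, capability):
        return sorted(s.name for s in self._services[capability])

    def capabilities_for(self, service):
        return sorted(c.name for c in self._capabilities[service])


# Example entries (assumed for illustration only).
discovery = Capability("data discovery", "end-user")
analysis = Capability("data analysis", "end-user")
search = Service("distributed search", frozenset({"end-user"}))
workflow = Service("workflow", frozenset({"end-user", "enabling"}))
provenance = Service("provenance", frozenset({"cross-cutting"}))

m = CapabilityServiceMap()
m.link(discovery, search)
m.link(analysis, workflow)
m.link(analysis, provenance)
print(m.services_for(analysis))      # ['provenance', 'workflow']
print(m.capabilities_for(workflow))  # ['data analysis']
```

Keeping both directions of the index makes it cheap to answer the questions the CV and PV sections pose from either side: which services realize a capability, and which capabilities depend on a given service.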

Volume III is the dictionary that presents the taxonomies used in this EarthCube CA. It defines 63 terms used in the DAsHER conceptual design whose specific meanings may differ from their usage elsewhere in the community. Besides the definitions, this integrated dictionary provides examples of how each term is used in DAsHER, giving readers context for a better understanding of the definitions. The definitions of terms are determined arbitrarily. Multiple sources are referred to, including DoDAF, TOGAF, Webster's, Zachman, the EC glossary, and the CINERGI vocabulary. The dictionary graph illustrates the interrelationships between terms and is drawn according to the entity structure described in Volume I and the capability graph illustrated in Volume II.
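A dictionary graph like the one described above can be treated as a labeled graph of terms, which makes it easy to ask which terms are closely related to a given one. The sketch below assumes a tiny, hypothetical excerpt of (term, relation, term) triples; the relation labels and the helper names `build_graph` and `related_terms` are illustrative and are not taken from Volume III. It stores each edge in both directions and uses breadth-first search to list terms within a given number of hops.

```python
from collections import deque

# Hypothetical excerpt of a term graph: (term, relation, term) triples.
# The relation labels are illustrative, not copied from Volume III.
EDGES = [
    ("service", "implements", "capability"),
    ("capability", "is realized through", "operational activity"),
    ("project", "delivers", "capability"),
    ("service", "provides access to", "resource"),
    ("resource", "includes", "data"),
]


def build_graph(edges):
    """Adjacency list; each edge is stored in both directions."""
    graph = {}
    for src, relation, dst in edges:
        graph.setdefault(src, []).append((relation, dst))
        graph.setdefault(dst, []).append((f"inverse of {relation}", src))
    return graph


def related_terms(graph, term, max_hops):
    """Breadth-first search: terms within max_hops of term, with their distance."""
    distance = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(current, []):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[term]
    return sorted(distance.items(), key=lambda item: (item[1], item[0]))


graph = build_graph(EDGES)
print(related_terms(graph, "capability", 2))
# [('operational activity', 1), ('project', 1), ('service', 1), ('resource', 2)]
```

Storing the inverse edges keeps navigation symmetric, so a reader starting from "capability" reaches both the services that implement it and the projects that deliver it.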

Finally, Volume IV describes an example of how this conceptual architecture can be used to develop a Polar cyberinfrastructure portal that provides data discovery and access capabilities, and describes the design and development process of the Polar CI. The requirements of the Polar CI are drawn from the capabilities and services in Volume II. Based on these requirements, a set of capabilities is developed, including resource discovery, data archiving and publishing, data analysis, quality, and semantics. These capabilities are achieved by the corresponding components: Search broker and Data harvesting middleware, Data warehouse, Visualization tool, QoS (Quality of Service) engine, and Semantic engine. These components in turn provide the following services: Distributed search service, Visualization service, Catalog service, Data Discovery Service, Publishing Service, Ontology Service, Quality Assessment/Validation service, and Knowledge Reasoning service. This use case serves both to apply and to evaluate the conceptual design.
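A search broker offering a distributed search service typically fans a query out to several catalogs and merges what comes back. The sketch below illustrates that pattern under stated assumptions: the catalogs are hypothetical in-memory lists standing in for remote polar data catalogs, `search_catalog` stands in for a real remote query, and records are deduplicated by a hypothetical `id` field. It is not the Polar CI implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalogs standing in for remote polar data catalogs.
CATALOGS = {
    "catalog_a": [
        {"id": "sea-ice-extent", "title": "Sea ice extent, daily"},
        {"id": "snow-depth", "title": "Snow depth on sea ice"},
    ],
    "catalog_b": [
        {"id": "sea-ice-extent", "title": "Sea ice extent, daily"},
        {"id": "ice-sheet-dem", "title": "Ice sheet elevation model"},
    ],
}


def search_catalog(name, records, keyword):
    """Stand-in for one remote catalog query: case-insensitive title match."""
    kw = keyword.lower()
    return [dict(rec, source=name) for rec in records if kw in rec["title"].lower()]


def broker_search(keyword, catalogs):
    """Query all catalogs in parallel, then merge duplicate records by id."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(catalogs))) as pool:
        futures = [
            pool.submit(search_catalog, name, records, keyword)
            for name, records in catalogs.items()
        ]
        for future in futures:  # iterate in submission order for stable output
            for rec in future.result():
                entry = merged.setdefault(
                    rec["id"], {"id": rec["id"], "title": rec["title"], "sources": []}
                )
                entry["sources"].append(rec["source"])
    return sorted(merged.values(), key=lambda r: r["id"])


for hit in broker_search("sea ice", CATALOGS):
    print(hit["id"], hit["sources"])
# sea-ice-extent ['catalog_a', 'catalog_b']
# snow-depth ['catalog_a']
```

Recording every source for a merged record keeps provenance visible to the user even after duplicates are collapsed; a production broker would also need timeouts and error handling for catalogs that fail to respond.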

For details, and to help improve the design, please see the relevant sections of the DAsHER Wiki:
  • PI: Chaowei Yang
  • Co-Is: Erin Robinson, Manzhu Yu, Carol Meyer, Chen Xu
  • Team members: Min Sun, Zhenlong Li, Mengchao Xu, Kai Liu, Yunfeng Jiang, Han Qin & Yongyao Jiang
  • Workshop I: 2014 ESIP summer meeting, Frisco, CO
    1. Erin Robinson, Foundation for Earth Science
    2. Eric Kihn, NOAA/NGDC
    3. Douglas Fils, Consortium for Ocean Leadership
    4. Glenn Rutledge, National Climatic Data Center
    5. Min Sun, George Mason University
    6. Brian Tisdale, Booz Allen Hamilton
    7. Ismael smiley Calderon, Chapman University
    8. Chaowei Yang, GMU/CISC
    9. S Doman Bennett, USGS EROS
    10. Melinda Marquis, NOAA
    11. Antonia Rosati, National Snow and Ice Data Center
    12. Emily Law, JPL
    13. Matthew Austin, NOAA/NESDIS
    14. M. Lee Allison, Arizona Geological Survey
    15. Stephen Richard, Arizona Geological Survey
    16. Christy Caudill Daugherty, Arizona Geological Survey
    17. Anna Katz, Arizona Geological Survey
    18. Aubrey Beach, Booz Allen Hamilton
    19. Vivian Hutchison, US Geological Survey
    20. Sky Bristol, US Geological Survey
    21. Danielle Harper, NSIDC
    22. Matthew Ferritto, RPI
    23. Heather McCullough, NOAA National Geophysical Data Center
    24. Daniel Oostra, MY NASA DATA/NASA
    25. Zhenlong Li, George Mason University
    26. Janet Fredericks, WHOI
    27. David Meyer, US Geological Survey/EROS
    28. Ruth Duerr, NSIDC
    29. James Conners, UCSD
    30. Brian Mapes, RSMAS University of Miami
    31. Robert Arko, Lamont-Doherty Earth Observatory
    32. Nancy Wiegand, UW-Madison

  • Workshop II: 2015 ESIP summer meeting, Monterey Bay, CA
    1. Evan Burgess, University of Utah
    2. Ethan Davis, UCAR / Unidata
    3. Douglas Fils, Consortium for Ocean Leadership
    4. Thomas Huang, NASA/JPL
    5. Chris Jenkins, University of Colorado Boulder
    6. Emily Law, NASA/JPL
    7. Chaowei Yang, GMU/CISC
    8. Manzhu Yu, GMU/CISC
    9. David LeBauer, University of Illinois at Urbana-Champaign
    10. Wenwen Li, Arizona State University
    11. Chunwei Liu, National Space Science Center, Chinese Academy of Sciences
    12. Yan Liu, University of Illinois at Urbana-Champaign
    13. Zhong Liu, NASA/George Mason University
    14. Stephen Richard, Arizona Geological Survey
    15. Brandon Whitehead, University of Auckland
    16. Jianting Zhang, The City College of New York
  • Presentations Made: AGU, AAG, ESIP, etc.