A Framework for Model-Driven Scientific Workflow Engineering
Also a doctoral dissertation, Kiel University, 2013
Pages: xxii, 298
So-called scientific workflows are an important means in data-intensive science for processing scientific data reliably and efficiently in distributed computing infrastructures such as Grids. Scientific Workflow Management Systems (SWfMS) help scientists model and run scientific workflows. Two layers can be distinguished in them: a domain-specific layer, at which a scientist models the workflow, and a technical layer, at which the workflow is executed automatically.
Initially, many SWfMS were developed from scratch with custom workflow technologies and languages instead of applying existing, established business workflow technologies. Reasons included the different life cycles of scientific and business workflows as well as incompatible interfaces and communication protocols of the respective execution infrastructures.
Meanwhile, several business IT infrastructures have evolved into service-oriented architectures (SOAs), for which many Web service standards and technologies have been developed. The Web Services Business Process Execution Language (BPEL), for example, is a well-accepted standard for implementing and executing business workflows in SOAs. Scientific IT infrastructures have adopted the SOA pattern in so-called Service Grids, which build on existing standards and technologies. As a result, BPEL is also suitable for executing scientific workflows at the technical layer, as has been examined in many publications and projects. However, BPEL is a workflow language for IT experts and was not originally designed for scientists to model workflows at the domain-specific layer. What is required, therefore, is a domain-specific abstraction of BPEL that is specifically tailored to scientific workflow modeling, together with a corresponding mapping to the technical layer.
This thesis addresses these two challenges, the domain-specific abstraction and the mapping, with the help of the Business Process Model and Notation (BPMN) standard and technologies from Model-Driven Software Development (MDSD). To this end, it presents the MoDFlow approach for Model-Driven Scientific WorkFlow Engineering, which maps domain-specific scientific workflow models via a BPMN-based intermediate layer to an executable workflow model. The intermediate layer is specified by MoDFlow.BPMN, a subset of the BPMN metamodel with custom extensions for the scientific domain. MoDFlow.BPMN2BPEL defines three consecutive transformation steps that map MoDFlow.BPMN to BPEL for workflow execution. The MoDFlow approach furthermore describes different methods for using and extending MoDFlow.BPMN and MoDFlow.BPMN2BPEL, with a focus on defining so-called domain-specific languages (DSLs) for modeling scientific workflows at the domain-specific layer.
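The abstract does not name the three steps of MoDFlow.BPMN2BPEL, so the following is only a minimal sketch of the general idea of chaining consecutive model transformations, not the thesis's actual transformation. It assumes a heavily simplified, linear BPMN-like process of service tasks; the classes `Task` and `Process` and the three steps (`order_tasks`, `to_block_structure`, `serialize_bpel`) are hypothetical. The generated BPEL is deliberately incomplete: partner links, variables, and other required elements are omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"


@dataclass
class Task:
    """Hypothetical, heavily simplified BPMN-like service task."""
    name: str
    operation: str  # Web service operation the task should call


@dataclass
class Process:
    """Hypothetical BPMN-like process: tasks keyed by id plus sequence flows."""
    name: str
    tasks: dict[str, Task]
    flows: list[tuple[str, str]]  # (source task id, target task id)


def order_tasks(p: Process) -> tuple[str, list[Task]]:
    """Step 1 (hypothetical): resolve sequence flows into an ordered task list."""
    successors = dict(p.flows)
    if len(successors) != len(p.flows):
        raise ValueError("branching is not supported in this sketch")
    targets = set(successors.values())
    starts = [tid for tid in p.tasks if tid not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one start task")
    ordered, seen, current = [], set(), starts[0]
    while current is not None:
        if current in seen:
            raise ValueError("cycles are not supported in this sketch")
        seen.add(current)
        ordered.append(p.tasks[current])
        current = successors.get(current)
    if len(ordered) != len(p.tasks):
        raise ValueError("disconnected tasks are not supported in this sketch")
    return p.name, ordered


def to_block_structure(ordered: tuple[str, list[Task]]) -> dict:
    """Step 2 (hypothetical): map the ordered tasks to a BPEL-like block structure."""
    name, tasks = ordered
    return {"process": name,
            "sequence": [{"invoke": t.name, "operation": t.operation} for t in tasks]}


def serialize_bpel(block: dict) -> str:
    """Step 3 (hypothetical): serialize the blocks as incomplete BPEL XML."""
    ET.register_namespace("", BPEL_NS)  # write BPEL as the default namespace
    process = ET.Element(f"{{{BPEL_NS}}}process", name=block["process"])
    sequence = ET.SubElement(process, f"{{{BPEL_NS}}}sequence")
    for act in block["sequence"]:
        # Partner links and input/output variables are omitted in this sketch.
        ET.SubElement(sequence, f"{{{BPEL_NS}}}invoke",
                      name=act["invoke"], operation=act["operation"])
    return ET.tostring(process, encoding="unicode")


def transform(p: Process) -> str:
    """Apply the three steps one after another, each consuming the previous result."""
    result = p
    for step in (order_tasks, to_block_structure, serialize_bpel):
        result = step(result)
    return result
```

For example, `transform(Process("Demo", {"a": Task("Prepare", "prepare"), "b": Task("Simulate", "run")}, [("a", "b")]))` yields a BPEL `process` element containing a `sequence` with two `invoke` activities. Keeping each step a separate function reflects the idea of consecutive transformations: each step can be tested, replaced, or extended on its own.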
The MoDFlow framework is an implementation of the MoDFlow approach and is based on the Eclipse Modeling Framework (EMF). It is evaluated in three application scenarios, which examine different mechanisms for using and extending it. The first two scenarios investigate the technical feasibility of the approach and support scientific workflows with parameter sweeps that are executed on a Grid infrastructure. The third scenario was conducted in collaboration with the PubFlow project, which aims to create an infrastructure for modeling and executing data publication workflows. For this purpose, a textual DSL and a corresponding language infrastructure are defined based on the Xtext framework; they support developers in creating data publication workflows. This scenario is intended to illustrate the practicability of the MoDFlow framework. PubFlow currently plans to implement an additional graphical DSL based on the BPMN notation, together with a corresponding workflow editor for scientists.
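The first two scenarios involve parameter sweeps. A common form of parameter sweep runs the same workflow step once for every combination of parameter values, producing independent jobs that a Grid can execute in parallel. The following minimal sketch assumes such a full-factorial sweep; the function `expand_sweep` and the parameter names in the usage example are hypothetical, and submitting the resulting jobs to a Grid is not shown.

```python
from itertools import product


def expand_sweep(sweep: dict[str, list]) -> list[dict]:
    """Expand a full-factorial parameter sweep into one job configuration per combination.

    Each key is a parameter name and each value is the list of values to try;
    the result is the Cartesian product of all value lists, one dict per job.
    """
    names = list(sweep)
    return [dict(zip(names, values))
            for values in product(*(sweep[n] for n in names))]
```

For example, `expand_sweep({"temperature": [280, 300], "salinity": [34, 35, 36]})` returns six configurations, one per combination; each could then be submitted as a separate job. A sweep over fewer combinations (for example, varying one parameter at a time) would need a different expansion rule.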