OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Middleware Case Study: MeDICi

Abstract

In many application domains in science and engineering, data produced by sensors, instruments and networks is naturally processed by software applications structured as a pipeline. Pipelines comprise a sequence of software components that progressively process discrete units of data to produce a desired outcome. For example, in a Web crawler that is extracting semantics from text on Web sites, the first stage in the pipeline might remove all HTML tags to leave only the raw text of the document. The second step may parse the raw text to break it down into its constituent grammatical parts, such as nouns and verbs. Subsequent steps may look for names of people or places, or for interesting events or times, so documents can be sequenced on a timeline. Each of these steps can be written as a specialized program that works in isolation from the other steps in the pipeline. In many applications, simple linear software pipelines are sufficient. However, more complex applications require topologies that contain forks and joins, creating pipelines comprising branches where parallel execution is desirable. It is also increasingly common for pipelines to process very large files or high-volume data streams which impose end-to-end performance constraints. Additionally, processes in a pipeline may have specific execution requirements and hence need to be distributed as services across a heterogeneous computing and data management infrastructure. From a software engineering perspective, these more complex pipelines become problematic to implement. While simple linear pipelines can be built using minimal infrastructure such as scripting languages, complex topologies and large, high-volume data processing require suitable abstractions, run-time infrastructures and development tools to construct pipelines with the desired qualities of service and the flexibility to evolve to handle new requirements. These are the reasons we created the MeDICi Integration Framework (MIF), which is designed for creating high-performance, scalable and modifiable software pipelines. MIF exploits a low-friction, robust, open source middleware platform and extends it with component- and service-based programmatic interfaces that make implementing complex pipelines simple. The MIF run-time automatically manages queues between pipeline elements in order to handle request bursts, and automatically executes multiple instances of pipeline elements to increase pipeline throughput. Distributed pipeline elements are supported using a range of configurable communications protocols, and the MIF interfaces provide efficient mechanisms for moving data directly between two distributed pipeline elements.
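The queue-decoupled pipeline the abstract describes can be illustrated with a short, self-contained sketch. The code below does not use the actual MIF API; it is a hypothetical plain-Java analogue, assuming a two-stage text pipeline (strip HTML tags, then scan for capitalized tokens as crude name candidates) in which stages communicate through bounded queues and a stage is scaled by running multiple worker instances, in the spirit of what the abstract says the MIF run-time does automatically.

    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    /**
     * Hypothetical sketch of a MIF-style pipeline (not the real MIF API):
     * stages are decoupled by bounded queues so request bursts are buffered,
     * and a slow stage can be scaled out with several worker instances.
     */
    public class PipelineSketch {

        // Sentinel marking end-of-stream so workers know when to stop.
        private static final String EOS = "\u0000EOS";

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> rawPages = new ArrayBlockingQueue<>(100);
            BlockingQueue<String> plainText = new ArrayBlockingQueue<>(100);

            // Stage 1: strip HTML tags, leaving raw text (single instance).
            Thread stripper = worker(rawPages, plainText,
                    html -> html.replaceAll("<[^>]*>", " "));
            stripper.start();

            // Stage 2: scan for name candidates; run two instances of this
            // element to raise pipeline throughput.
            Thread scanner1 = scanner(plainText, "scanner-1");
            Thread scanner2 = scanner(plainText, "scanner-2");
            scanner1.start();
            scanner2.start();

            // Feed the pipeline, then signal end-of-stream.
            for (String page : List.of(
                    "<html><body>Alice visited <b>Paris</b>.</body></html>",
                    "<p>Bob met Carol in Richland.</p>")) {
                rawPages.put(page);
            }
            rawPages.put(EOS);

            stripper.join();
            // One EOS per downstream worker instance.
            plainText.put(EOS);
            plainText.put(EOS);
            scanner1.join();
            scanner2.join();
        }

        /** A pipeline element: take from in, transform, put on out. */
        static Thread worker(BlockingQueue<String> in, BlockingQueue<String> out,
                             java.util.function.UnaryOperator<String> transform) {
            return new Thread(() -> {
                try {
                    for (String item = in.take(); !item.equals(EOS); item = in.take()) {
                        out.put(transform.apply(item));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        /** Terminal element: print capitalized tokens as name candidates. */
        static Thread scanner(BlockingQueue<String> in, String id) {
            return new Thread(() -> {
                try {
                    for (String text = in.take(); !text.equals(EOS); text = in.take()) {
                        for (String token : text.split("\\W+")) {
                            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                                System.out.println(id + " found: " + token);
                            }
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

In the real framework, per the abstract, queue management and the multiple element instances are handled by the MIF run-time itself rather than by hand-built threads as above; the sketch only makes the decoupling visible.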

Authors:
Wynne, Adam S.
Publication Date:
2011-05-05
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1040989
Report Number(s):
PNNL-SA-82585
TRN: US201211%%319
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Book
Resource Relation:
Related Information: Essential Software Architecture, 2nd Edition, 147-164
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; COMMUNICATIONS; COMPUTERS; COMPUTER CODES; DATA PROCESSING; FLEXIBILITY; FRICTION; MANAGEMENT; PERFORMANCE; PIPELINES; QUEUES; SENSORS; Data Intensive Computing

Citation Formats

Wynne, Adam S. Middleware Case Study: MeDICi. United States: N. p., 2011. Web.
Wynne, Adam S. Middleware Case Study: MeDICi. United States.
Wynne, Adam S. 2011. "Middleware Case Study: MeDICi." United States.
@article{osti_1040989,
title = {Middleware Case Study: MeDICi},
author = {Wynne, Adam S.},
url = {https://www.osti.gov/biblio/1040989},
place = {United States},
year = {2011},
month = {May}
}

Book:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this book.
