VHR ·  Prof. Dr. Peter Luksch ·  Distributed High Performance Computing · Computer Science · Faculty IEF · University of Rostock
IEEE Technical Committee on Scalable Computing

Technical Area Software Engineering for Scalable Systems

Introduction

Clusters of workstations and/or PCs have gained increasing attention as low-cost parallel computing platforms over the past years. High-performance CPUs and interconnect technology make clusters an attractive hardware platform for many industrially relevant applications that have long been the domain of supercomputing. However, to make use of these cost-effective platforms, application software has to be parallelized.

Simulation-based design is one domain of particular interest to industry, especially to small and medium-sized companies that cannot afford access to traditional supercomputers or MPPs. Since a number of mature software packages exist in fluid dynamics, finite element analysis, and digital mockup, software projects that aim to exploit cluster power will usually be parallelizations of existing software. Parallelizing large-scale industrial software packages, however, requires appropriate software engineering methods.

In the past, quite a few projects have parallelized industrial simulation codes. One example is the EUROPORT initiative, in which a number of industrial codes were ported to parallel computers. Only very few software vendors, however, actually offer parallel execution as an option in their products. The goal of TFCC's Technical Area Software Engineering is to overcome the obstacles that prevent parallel software from coming into widespread use on clusters. In particular, TFCC-SWE seeks to promote research and facilitate the exchange of experience related to the issues listed below.

Research Issues

Programming Models and Environments

PVM and MPI are currently the standards in distributed-memory computing. However, symmetric multiprocessors (SMPs) have become increasingly popular: most high-end workstations and PCs are SMPs with two to eight processors. Future clusters will therefore exhibit two levels of parallelism, shared memory within a node and distributed memory between nodes. OpenMP and POSIX threads are two standards for programming SMPs. Future research will have to address hierarchical parallelism. In distributed computing, CORBA has become established as a standard for interfacing software packages in a heterogeneous environment. In particular, legacy applications can be integrated as components into new distributed environments. In order to promote exchange of experience, we provide a
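The two levels of parallelism described above can be illustrated with a minimal sketch. It is not MPI or OpenMP code: the outer loop only stands in for message passing between nodes, and a thread pool stands in for the processors of one SMP node. The function names `split`, `node_sum`, and `hierarchical_sum`, and the choice of a global sum as the workload, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def node_sum(block, threads_per_node):
    """Inner level: the threads of one (hypothetical) SMP node share the node's block."""
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        return sum(pool.map(sum, split(block, threads_per_node)))


def hierarchical_sum(data, nodes, threads_per_node):
    """Outer level: one block per node; this loop is a stand-in for message passing."""
    partials = [node_sum(block, threads_per_node) for block in split(data, nodes)]
    # Combining the per-node results corresponds to a reduction across nodes.
    return sum(partials)


if __name__ == "__main__":
    print(hierarchical_sum(list(range(1000)), nodes=4, threads_per_node=2))
```

In a real cluster code, the outer decomposition would typically be expressed with a message-passing library and the inner one with threads or compiler directives; the sketch only shows how the data is partitioned twice.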

Software Engineering Methods for Porting Existing Software to Clusters

In scientific computing, most parallel software projects are parallelizations of existing software. Due to the limitations of automatic parallelization and data-parallel languages, many software packages have to be parallelized manually. This process usually requires interdisciplinary cooperation. To achieve high efficiency and scalability, a well-defined software engineering process is required that is applied consistently throughout the project. Because standardized programming models are used, parallel software is developed for a whole range of platforms. However, achieving good efficiency on clusters is a particular challenge for a number of reasons. Their interconnection networks (WANs, Ethernet-based LANs) exhibit much higher latency than MPP interconnects; SCI-based clusters are a notable exception. Clusters are usually heterogeneous, i.e., nodes have different types (and numbers) of CPUs, different CPU power, and different memory capacity. In addition, multiple users compete for resources. This makes resource management and dynamic load balancing particularly important issues. Distributed object-oriented computing based on standards like CORBA allows existing applications (including legacy systems) to interoperate in a cluster environment.
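Why dynamic load balancing matters on heterogeneous nodes can be seen in a small simulation. The sketch below is an illustrative model, not a scheduler from any of the projects mentioned here: the node speeds, the task costs, and the function names are all hypothetical, and competition from other users is not modeled. It compares a static round-robin distribution with simple self-scheduling, in which the node that becomes idle first takes the next task.

```python
import heapq


def static_makespan(task_costs, node_speeds):
    """Static partitioning: tasks are dealt out round-robin, ignoring node speed."""
    finish = [0.0] * len(node_speeds)
    for i, cost in enumerate(task_costs):
        node = i % len(node_speeds)
        finish[node] += cost / node_speeds[node]
    return max(finish)


def dynamic_makespan(task_costs, node_speeds):
    """Self-scheduling: the node that becomes idle first takes the next task."""
    idle_at = [(0.0, node) for node in range(len(node_speeds))]
    heapq.heapify(idle_at)
    makespan = 0.0
    for cost in task_costs:
        t, node = heapq.heappop(idle_at)   # earliest idle node
        t += cost / node_speeds[node]      # time at which it finishes this task
        makespan = max(makespan, t)
        heapq.heappush(idle_at, (t, node))
    return makespan


if __name__ == "__main__":
    tasks = [1.0] * 12          # twelve equal tasks (assumed workload)
    speeds = [1.0, 2.0, 4.0]    # relative CPU power of three nodes (assumed)
    print(static_makespan(tasks, speeds))   # 4.0
    print(dynamic_makespan(tasks, speeds))  # 2.0
```

With these assumed values the static distribution is bound by the slowest node, while self-scheduling lets the faster nodes take more tasks and halves the completion time.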

Program Development Tools

Appropriate tools for program development and analysis are a key prerequisite for productivity in engineering parallel and distributed software. EuroTools and PTools are two consortia that coordinate and promote research on tools for parallel programming. They provide information on available tools and ongoing research projects. At LRR-TUM, a standard for on-line monitoring, OMIS, has been defined that provides a basis for an integrated tool environment. A reference implementation, OCM, is currently being developed. On top of OCM, the parallel debugger DETOP will be made available on a series of common parallel platforms as the first part of the integrated tool environment THE TOOL-SET.
Program analysis tools are a Technical Area of their own in the TFCC.

Contact: Peter Luksch


$Id: index.html,v 1.2 2005/11/14 11:54:49 pl020 Exp $