High-Performance Concurrent Multi-Path Communication for MPI

Introduction:

Today, High Performance Computing (HPC) means parallel computing. Massively Parallel Processors (MPPs) have a large number of processors that are tightly coupled via a high-bandwidth, low-latency network. When such systems are distributed across sites, efficient operation requires that the characteristics of communication and synchronization across WAN (Wide Area Network) connections be addressed appropriately. The Message Passing Interface (MPI) standard is the most widely accepted API for such heterogeneous systems. Open MPI is an open-source implementation of the MPI standard that provides a suitable platform for addressing particular problems in message passing. MPI over WANs is important because large cross-site resource pools are often used in cloud and grid computing. The main advantage of using an IP-based protocol (e.g., TCP or SCTP) for MPI is portability and the ease with which MPI programs can be executed in heterogeneous environments. However, the current use of MPI in WAN environments raises problems such as large latencies and the difficulty of utilizing the full available bandwidth. Scalability is another significant problem.
The contribution of our work is to improve the performance and scalability of communication in wide-area MPI for HPC applications.
We aim to achieve this by providing a wide-area MPI built on an optimized Rendezvous communication protocol running over a suitable protocol stack.

Our recent achievement and future work:

In our recent work [1], we refined the Open MPI communication scheme by using an optimized Rendezvous protocol and deployed it into the IP-based Open MPI middleware to enhance scalability and to avoid long latencies, especially for HPC applications with large messages. The proposed scheme selects a suitable communication protocol, either Eager or Rendezvous, according to the message size. Our technique reduces unnecessary communication and synchronization between processors and improves communication-computation overlap. Our results demonstrate the validity and effectiveness of the technique, which achieves a significant improvement over standard Open MPI.
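To illustrate the size-based choice between Eager and Rendezvous, the Python sketch below contrasts the two protocols under a deliberately simple textbook cost model. It is not the implementation from [1] and does not use Open MPI's API: the threshold `EAGER_LIMIT`, the `Link` parameters, and the functions `select_protocol`, `control_messages`, `transfer_time`, and `crossover_size` are all hypothetical. The model assumes the receive is posted late, so an eager message pays one extra copy out of a temporary buffer, while Rendezvous pays one extra round trip for its request-to-send/clear-to-send (RTS/CTS) handshake.

```python
from dataclasses import dataclass

# Hypothetical eager/rendezvous threshold in bytes (an assumption, not the value used in [1]).
EAGER_LIMIT = 64 * 1024


@dataclass(frozen=True)
class Link:
    latency_s: float      # one-way network latency in seconds
    bandwidth_Bps: float  # network bandwidth in bytes per second
    copy_Bps: float       # receiver memory-copy rate in bytes per second


def select_protocol(size: int, eager_limit: int = EAGER_LIMIT) -> str:
    """Choose the point-to-point protocol from the message size alone."""
    return "eager" if size <= eager_limit else "rendezvous"


def control_messages(protocol: str) -> list[str]:
    """Wire messages each protocol sends for a single message."""
    if protocol == "eager":
        # Envelope and payload travel together; there is no handshake.
        return ["DATA"]
    # Request-to-send, clear-to-send once the receive is posted, then the payload.
    return ["RTS", "CTS", "DATA"]


def transfer_time(size: int, link: Link, protocol: str) -> float:
    """Toy cost model for one message whose receive is posted late."""
    wire = link.latency_s + size / link.bandwidth_Bps
    if protocol == "eager":
        # Payload lands in a temporary buffer and is copied out later.
        return wire + size / link.copy_Bps
    # RTS + CTS add one round trip before the payload is sent.
    return 2 * link.latency_s + wire


def crossover_size(link: Link) -> float:
    """Message size at which both protocols cost the same under this model."""
    return 2 * link.latency_s * link.copy_Bps
```

Under this model the break-even size grows linearly with latency: with illustrative one-way latencies of 50 µs and 20 ms and a 5 GB/s copy rate, it is about 0.5 MB and 200 MB respectively. The extra handshake round trip therefore weighs far more on a WAN link than inside a cluster. The model ignores receiver buffer memory, which in practice is what bounds the eager limit for large messages.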

Our investigations in this area are ongoing, with the aim of confirming our hypothesis. We expect the outcome to be a point-to-point multi-path communication module for Open MPI that can stripe and share transferred data across multiple available network interfaces.
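Striping, as envisaged for the planned multi-path module, can be sketched as splitting one message into contiguous chunks whose sizes are proportional to each interface's bandwidth, so that all interfaces finish at roughly the same time. This is a minimal Python sketch under that assumption only: the `Stripe` record, the function `stripe_message`, and any interface names or bandwidth figures are hypothetical and do not describe the module's actual design, which is still under investigation.

```python
from typing import NamedTuple


class Stripe(NamedTuple):
    iface: str   # interface the chunk is sent over (hypothetical names)
    offset: int  # byte offset of the chunk within the message
    length: int  # chunk length in bytes


def stripe_message(size: int, bandwidths: dict[str, int]) -> list[Stripe]:
    """Split `size` bytes into contiguous chunks proportional to bandwidth.

    `bandwidths` maps interface name -> integer bandwidth (any common unit).
    Integer arithmetic keeps the split exact; leftover bytes go to the
    interfaces with the largest remainders (largest-remainder method).
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    total = sum(bandwidths.values())
    if total <= 0 or any(bw < 0 for bw in bandwidths.values()):
        raise ValueError("bandwidths must be non-negative with a positive sum")

    shares = {name: size * bw // total for name, bw in bandwidths.items()}
    remainders = {name: size * bw % total for name, bw in bandwidths.items()}
    leftover = size - sum(shares.values())  # always fewer than the number of interfaces
    # Stable sort: ties keep the caller's interface order.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[name] += 1

    stripes, offset = [], 0
    for name in bandwidths:  # lay chunks out in the caller's interface order
        if shares[name]:
            stripes.append(Stripe(name, offset, shares[name]))
            offset += shares[name]
    return stripes
```

Splitting by static bandwidth is the simplest policy; a runtime scheme that considers the current load of links would instead update the weights from measurements before each transfer.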

A possible direction for future work is a runtime tool for automatic performance optimization that takes into account the current load of processors, links, and other criteria in order to improve bandwidth utilization and scalability and, above all, to reduce the overall communication delay for MPI in wide area networks.

References:

[1] Rashid Hassani, Ganesh Chavan, Peter Luksch, "Optimization of Communication in MPI-Based Clusters", In Proceedings of CyberC 2014, ISBN: 978-1-4799-6235-8, pp. 143-149, DOI: 10.1109/CyberC.2014.33, Oct. 2014.

Contact for more details:

Rashid Hassani
E-Mail: rashid.hassani(at)uni-rostock.de