(1) PAPER TITLES [1] An Evolutionary Approach to Collective Communication Scheduling (in press, GECCO 07) [2] Optimum Topology-Aware Scheduling of Many-to-Many Collective Communications [3] Complexity of Collective Communications on NoCs (2) AUTHORS Jiri Jaros Faculty of Information Technology Brno University of Technology Bozetechova 2 612 66, Brno, CZ jarosjir@fit.vutbr.cz +420 54114-1207 Milos Ohlidal Faculty of Information Technology Brno University of Technology Bozetechova 2 612 66, Brno, CZ ohlidal@fit.vutbr.cz +420 54114-1264 Vaclav Dvorak Brno University of Technology Bozetechova 2 612 66, Brno, CZ dvorak@fit.vutbr.cz +420 54114-1149 (3) CORRESPONDING AUTHOR Jiri Jaros (4) ABSTRACTS Paper [1] : An Evolutionary Approach to Collective Communication Scheduling In this paper, we describe two evolutionary algorithms aimed at scheduling collective communications on interconnection networks of parallel computers. To avoid contention for links and associated delays, collective communications proceed in synchronized steps. Minimum number of steps is sought for the given network topology, wormhole (pipelined) switching, minimum routing and given sets of sender and/or receiver nodes. Used algorithms are able not only re-invent optimum schedules for known symmetric topologies like hyper-cubes, but they can find schedules even for any asymmetric or irregular topologies in case of general many-to-many collective communications. In most cases does the number of steps reach the theoretical lower bound for the given type of collective communication; if it does not, non-minimum routing can provide further improvement. Optimum schedules may serve for writing high-performance communication routines for application-specific networks on chip or for development of communication libraries in case of general-purpose interconnection networks. Paper [2] : Optimum Topology-Aware Scheduling of Many-to-Many Collective Communications The paper addresses general many-to-many collective communications, whose scheduling may be needed when writing application-specific communication routines or communication libraries. Optimum schedules with the number of steps equal or close to theoretical lower bounds are designed with the use of evolutionary algorithms. Optimization is carried out for a given topology of a direct interconnection network; network nodes can be single or multiple processors connected to a router. Wormhole switching, full duplex links and single-port non-combining nodes are assumed. The developed scheduling could be advantageous mainly for networks on chip (NoC) and application-specific communication architectures. Paper [3] : Complexity of Collective Communications on NoCs The paper addresses the important issue related to communication performance of Networks on Chip (NoCs), namely the complexity of collective communications measured by a required number of algorithmic steps. Three NoC topologies are investigated, a ring network, Octagon and 2D-mesh, due to their easy manufacturability on a chip. The lower complexity bounds are compared to real values obtained by evolution-based optimizing tools. Results give hints on what communication overhead is to be expected in ring- and mesh-based NoCs with the wormhole switching, full duplex links and k-port non-combining nodes. (5) CRITERIA (B), (D), (E) 6) STATEMENT ON WHY THESE CRITERIA ARE SATISFIED The importance of communication among CPU cores, processors and computers and of related interconnection networks is recently steadily growing. More often than not, processing nodes access the network according to a global, structured communication pattern. The performance of these collective communications (CC for short) has a dramatic impact on the overall processing efficiency, because communication times, software- or hardware-related alike, add up to an overhead of parallel processing. Some researchers have taken a topology independent approach to the design of deadlock-free wormhole (WH - distance insensitive) routed CC algorithms. E.g. the postal model (similar to sending a batch of letters through the postal service at one time) demonstrates the ability for a sending node to transmit multiple messages before the receiving node receives the first message. In our work, we have taken rather a topology-aware approach and have assumed that CC in WH networks proceeds in synchronized steps. In one step of CC, a set of simultaneous message transfers takes place along complete disjoint paths between source-destination node pairs. Presently, many different interconnection network topologies for general purpose multiprocessors are in use, but new networks, especially Networks on Chip and application-specific networks, are being developed. Whereas the lower bounds on the time complexity of various group communications (in terms of required number of communication steps) can be mathematically derived for any regular network topology and the given communication pattern, finding a corresponding optimum schedule of communication is difficult and in many cases it is not known as yet. (Octagon topology, Coated Mesh, High dimensional hyper-cubes (8 and more), rectangular meshes, etc.) --(B)-- Our algorithms not only reinvent optimal communication schedules for smaller topologies, which can be derived analytically (e.g. Double-tree algorithm that is optimal for hyper-cubes up to 6 dimensions, dimension ordered routing for one port meshes, Ho-kao algorithm optimal for 7-dimensional hypercube), but also a lot of unknown optimal schedules were discovered for many regular topologies (e.g. Slim/Fat Octagon topology, Moore graph, K-Ring, Midimew) as well as for irregular network topologies (AMP, meshes, ladders, and regular topology with faulty links) and indirect network topologies. --(D)-- Several optimal and sub-optimal communication schedules have been published in international conferences in the area of parallel and distributed computing with high rating. Any shortening of communication schedule (and thus lowering the communication overhead) is very desirable in real applications (multi-cores processors, application specific system on chips, etc.). Optimum schedules may serve as a template for writing high-performance communication routines for application-specific networks on chip or for development of communication libraries in case of general-purpose interconnection networks. --(E)-- Search for deadlock- and congestion-free communication algorithms provided a succession of increasingly better human-created solutions in the area of the computer clusters, embedded parallel systems and also in multi-core chips (AMD Hypertransport, Intel Tera-scale architecture, Cell Broadband Engine Element Interconnect Bus). Our results are either equal to the best known results so far or brand new in case of topologies that do not lend themselves to mathematical treatment. (7) CITATIONS [1] Jiří Jaroš, Miloš Ohlídal, Václav Dvořák: An Evolutionary Approach to Collective Communication Scheduling (accepted full paper, GECCO 2007) [2] Dvořák Václav, Jaroš Jiří, Ohlídal Miloš: Optimum Topology-Aware Scheduling of Many-to-Many Collective Communications, In: Proceedings of The Sixth International Conference on Networking, New York, US, IEEE CS, 2007, p. 6, ISBN 0-7695-2805-8 [3] Jaroš Jiří, Ohlídal Miloš, Dvořák Václav: Complexity of Collective Communications on NoCs, In: Proc. of 5th International Symposium on Parallel Computing in Electrical Engineering, Los Alamitos, CA 90720-1314, US, IEEE CS, 2006, p. 127-132, ISBN 0-7695-2554-7 (8) STATEMENT ON PRIZE MONEY Any prize money, if any, is to be divided equally among the co-authors (9) STATEMENT ON WHY THIS ENTRY IS THE BEST Because communication and especially collective communication is of ever increasing importance in modern multiprocessor and multi-core computing, optimization of communication schedules impacts on efficiency of parallel processing on many computing platforms, from Systems on Chip up to general purpose parallel and distributed computers. Suboptimal communication schedules can be obtained by MILP method (Mixed Integer Linear Programming), but very long solutions are required for network size of practical interest. The communication scheduling can also be formulated as a graph coloring problem. MILP as well as exact or heuristic graph coloring yield only a suboptimal solution. The reason is the existence of multiple minimum paths for some source-destination pairs; it is not clear which minimum path should be selected for pair-wise communication. Another approach, recursive division of set disjoint pair-wise communication described in literature, is supposed to be exact, but suffers the following restrictions: - only non-overlapping sets of processes (transmitters and receivers) are assumed, - routing from source to destination is unique and prescribed, - one-port model is assumed (only one processor's output link communicating at a time). In our approach we are able to relax all above restrictions. Our algorithms are able to not only reinvent the best known communication schedules like Ho-kao or Double-tree broadcast algorithms for hypercubes, Time-Arc disjoint trees for all-to-all broadcast algorithms, etc., but also discover new optimal schedules for topologies where others algorithms fail. The proposed evolutionary algorithms are also first general algorithms capable to solve an arbitrary topology for an arbitrary collective communication including All-to-All and M-to-N (Many-to-Many) CC even with overlapping sets of transmitting and receiving nodes.