Annals of Multicore and GPU Programming <p><img src="/public/site/images/agsh/banner-ampg4.png" alt="editorial universitaria" width="1200" height="103"></p> <p><strong>Annals of Multicore and GPU Programming (AMGP)</strong></p> <p style="text-align: justify;">Concurrent programming, as a scientific discipline, has recently focused on supporting the high-performance parallelization of multithreaded and multitasking software, driven by the emergence of multicore processors and GPUs. These are considered the reference hardware platforms of the future, not only in personal computers but also in tablets and mobile phones.</p> <p style="text-align: justify;">The new journal will fill a gap and occupy a niche in the world of high-impact scientific journals, within the general field known as Parallel and Distributed Systems on Multicore and GPU Platforms. Moreover, it can provide a basis for the developing sub-discipline of Multicore Programming, which may become an independent discipline with a scientific legacy of its own, maintained over time.</p> <p style="text-align: justify;">Publication in the AMGP journal is free of charge for authors: there are no fees for either reviewing or publishing papers.</p> <p>ISSN:&nbsp;2341-3158</p> <p>Publication schedule: annual.</p> <p>Editors:</p> <ul> <li class="show">Dr. Manuel I. Capel (manuelcapel at</li> <li class="show">Dr. Antonio J. Tomeu (antonio.tomeu at</li> </ul> <p>Technical:</p> <ul> <li class="show">Dr. Alberto G.
Salguero (alberto.salguero at</li> </ul> <p>Publisher:</p> <ul> <li class="show"><a href="" target="_blank" rel="noopener">University of Granada</a></li> <li class="show"><a href="" target="_blank" rel="noopener">University of Cadiz</a></li> </ul> <p>Sponsoring:</p> <ul> <li class="show"><a href="" target="_blank" rel="noopener">Thematic Network of Concurrent, Distributed and Parallel Programming (TECDIS)</a></li> </ul> <p>&nbsp;</p> <p>ETSI Informática y Telecomunicaciones<br>Periodista Daniel Saucedo Aranda, s/n<br>E-18071, Granada (Spain)</p> <p><strong>Revisiting Strassen’s Matrix Multiplication for Multicore Systems</strong></p> <p style="text-align: justify;">Strassen’s matrix multiplication reduces the computational cost of multiplying matrices of size n × n from the O(n<sup>3</sup>) of a typical three-loop implementation to approximately O(n<sup>2.81</sup>). The reduction comes at the expense of additional operations of cost O(n<sup>2</sup>), and additional memory is needed for the temporary results of recursive calls. The advantage of Strassen’s algorithm is therefore only apparent for larger matrices, and it requires careful implementation. The increasing speed of computational systems with several cores sharing a common memory space also makes it more difficult for the algorithm to compete against highly optimized three-loop multiplications. This paper discusses various aspects that need to be addressed when designing Strassen multiplications for today’s multicore systems.</p> <p>Domingo Giménez Cánovas</p> <p>Copyright (c) 2018. Published 2018-05-24. Vol. 4, No. 1, pp. 1–8.</p> <p><strong>Implementation of the Pipeline Parallel Programming Technique as an HLPC: Usage, Usefulness and Performance</strong></p> <p>This article presents the pipeline communication/interaction pattern for concurrent, parallel and distributed systems as a high-level parallel composition (HLPC) and discusses its usefulness for deriving parallel versions of sequential algorithms.
In particular, we provide examples of parallel solutions for the following problems: adding numbers, sorting numbers and solving a system of linear equations. An approach based on structured parallelism and the parallel object concept is used to solve these problems. In its generic form, the pipeline is presented as an HLPC that deploys three types of parallel objects (a manager, several stages and a collector), which are interconnected to form the pipeline processing structure. We also describe a method for systematically creating the pipeline HLPC and solving this type of problem. Each pipeline instance must be able to handle predefined synchronization constraints between processes (maximum parallelism, mutual exclusion and producer-consumer synchronization, the use of synchronous, asynchronous and future asynchronous communication, etc.). Finally, the article reports the performance of pipeline HLPC-based implementations of parallel algorithms for the problems raised in the paper, using dedicated CPUs.</p> <p>Mario Rossainz-Lopez, Manuel I. Capel-Tuñon, Odón Carrasco-Limón, Fernando Hernández-Polo, Bárbara Sánchez-Rinza</p> <p>Copyright (c) 2017. Published 2017-12-24. Vol. 4, No. 1, pp. 9–22.</p> <p><strong>Assessing Energy Consumption and Runtime Efficiency of Master-Worker Parallel Evolutionary Algorithms in CPU-GPU Systems</strong></p> <p style="text-align: justify;">Thanks to parallel processing, it is possible to reduce not only code runtime but also energy consumption, once the workload has been adequately distributed among the available cores. The current availability of heterogeneous architectures, including GPU and CPU cores with different power-performance characteristics and mechanisms for dynamic voltage and frequency scaling, poses a new challenge for developing efficient parallel codes that take into account both the achieved speedup and the energy consumed. This paper analyses the energy consumption and runtime behavior of a parallel master-worker evolutionary algorithm according to the workload distribution between GPU and CPU cores and their operating frequencies. It also proposes a model, fitted using multiple linear regression, which enables a workload distribution that considers both runtime and energy consumption by means of a cost function that suitably weights the two objectives. Since many useful bioinformatics and data-mining applications are tackled by programs with a profile similar to that of the parallel master-worker procedure considered here, the proposed energy-aware approach could be applied in many different situations.</p> <p>Juan José Escobar, Julio Ortega, Antonio Díaz, Jesús González, Miguel Damas</p> <p>Copyright (c) 2018. Published 2018-05-24. Vol. 4, No. 1, pp. 23–36.</p> <p><strong>A Parallel Model for the Belousov-Zhabotinsky Oscillating Reaction with Python and Java</strong></p> <p style="text-align: justify;">The programming language Python has rapidly gained in popularity and has become a first choice for implementing all kinds of systems across different software development fields. Programmers now use it for parallel processing on multicore and manycore architectures through specific modules such as Numba, PyCUDA or mpi4py. Much work has been done to compare the performance of Python and commonly used programming languages such as Java. This article presents a further comparison, solving the Belousov-Zhabotinsky oscillating reaction problem in both languages using symmetric multiprocessing with data partitioning.</p> <p>Antonio J. Tomeu, Alberto G. Salguero, Manuel I. Capel</p> <p>Copyright (c) 2018. Published 2018-05-24. Vol. 4, No. 1, pp. 37–45.</p>