Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)
There is a gap between the performance that parallel hardware can potentially deliver and the performance that can be attained using today's parallel program development tools. These tools still require manual intervention by the programmer to parallelize the code. Programming a parallel computer requires studying the target algorithm or application more closely than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides techniques for exploring the possible ways to program a parallel computer for a given application.
algorithm requires communication between a pair of processors. Assume that the processors need to communicate with each other m times to complete the algorithm.

1.16. Consider an SPA with the following specifications:

Number of serial tasks per stage: Ns
Number of parallel tasks per stage: Np
Number of stages: n

Now assume that we have a single processor that requires time τ to complete a task and consumes W watts while in operation. We are also given N = Np parallel but very slow processors. Each
specific block in the memory. The remaining least significant 4 bits specify a word within a given block. Now assume we have a cache memory that can accommodate 128 blocks. In that case, 7 bits are needed to specify the location of a line in the cache. We now need a mapping function that picks a block from the memory and places it at some location in the cache. There are three mapping function choices:

1. Direct mapping
2. Associative mapping (also known as fully associative mapping)
3. Set-associative mapping
access the shared medium at any given time so as to prevent bus access collisions. Each module connected to the bus is identified by its own unique MAC address. The source processor communicates with another processor or memory module by specifying the destination MAC address. Some form of MAC arbitration scheme must be enforced to prevent bus access collisions, and there are many arbitration schemes, each affecting the overall performance of the system. The performance of the
character to accommodate the clauses for the parallel directive. Line 1 identifies to the compiler a parallel region of code as delineated by the curly brackets starting at line 3 and ending at line 14. The default (none) clause defines the default data scope of variables in each thread. There are two options with this clause: none and shared. In line 2, the shared (x, m, n) clause declares the scope of the comma-separated data variables in the list to be shared across all threads. The private (i,
either in hardware or in software. This is referred to as parallelizing an algorithm. The parallelization strategy depends on the type of algorithm we are dealing with.

Serial Algorithms

Serial algorithms, as exemplified by Fig. 1.3a, cannot be parallelized since their tasks must be executed sequentially. The only parallelization possible is when each task is broken down into parallelizable subtasks. An example is to perform bit-parallel add/multiply operations.