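This page offers a PDF download of the e-book Programming Massively Parallel Processors. The file is delivered through Baidu Netdisk: once the order is completed, the download link is sent by email and shown on the web page. For questions, contact ebook666@outlook.com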
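Wen-mei W. Hwu is the Sanders-AMD Endowed Chair Professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. His research interests are in the architecture, implementation, compilation, and algorithms of parallel computing. He is the chief scientist of the Parallel Computing Institute and the director of the IMPACT research group. He is a co-founder and CTO of MulticoreWare. For his research and teaching he has received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science from the University of California, Berkeley. He is a Fellow of the IEEE and the ACM. He directs the UIUC CUDA Center and is one of the principal investigators of the NSF Blue Waters Petascale computer project. Dr. Hwu received his Ph.D. in Computer Science from the University of California, Berkeley.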
Preface
Acknowledgements
CHAPTER 1 Introduction
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References
CHAPTER 2 Data Parallel Computing
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
Function Declarations
Kernel Launch
Built-in (Predefined) Variables
Run-time API
2.8 Exercises
References
CHAPTER 3 Scalable Parallel Execution
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
3.9 Exercises
CHAPTER 4 Memory and Data Locality
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
4.9 Exercises
……
CHAPTER 17 Parallel Programming and Computational Thinking
17.1 Goals of Parallel Computing
17.2 Problem Decomposition