Book Introduction
Advanced Computer Architecture (English edition): PDF | EPUB | TXT | Kindle e-book download
![Advanced Computer Architecture (English)](https://www.shukui.net/cover/19/31416073.jpg)
- Author: Kai Hwang (黄铠, USA)
- Publisher: China Machine Press, Beijing
- ISBN: 7111067126
- Publication year: 1999
- Listed page count: 770
- File size: 36 MB
- File page count: 792
PDF Download
Download Notes
Advanced Computer Architecture (English): e-book download in PDF format
The download is a RAR archive; extract it with decompression software to obtain the PDF. All resources on this site are packaged as BitTorrent seeds, so a BitTorrent-capable client is required. Recommended clients include Free Download Manager (FDM; free, ad-free, multi-platform), as well as BitComet, qBittorrent, and uTorrent. Thunder (Xunlei) is currently not recommended because this is not a popular resource; once the resource becomes well seeded, Thunder will work too.
(The file page count should be greater than the listed page count, except for multi-volume e-books such as two- or three-volume sets.)
Note: all archives on this site require a decompression password. Click to download the archive extraction tool.
Table of Contents
Foreword
Preface
PART I THEORY OF PARALLELISM
Chapter 1 Parallel Computer Models
1.1 The State of Computing
1.1.1 Computer Development Milestones
1.1.2 Elements of Modern Computers
1.1.3 Evolution of Computer Architecture
1.1.4 System Attributes to Performance
1.2 Multiprocessors and Multicomputers
1.2.1 Shared-Memory Multiprocessors
1.2.2 Distributed-Memory Multicomputers
1.2.3 A Taxonomy of MIMD Computers
1.3 Multivector and SIMD Computers
1.3.1 Vector Supercomputers
1.3.2 SIMD Supercomputers
1.4 PRAM and VLSI Models
1.4.1 Parallel Random-Access Machines
1.4.2 VLSI Complexity Model
1.5 Architectural Development Tracks
1.5.1 Multiple-Processor Tracks
1.5.2 Multivector and SIMD Tracks
1.5.3 Multithreaded and Dataflow Tracks
1.6 Bibliographic Notes and Exercises
Chapter 2 Program and Network Properties
2.1 Conditions of Parallelism
2.1.1 Data and Resource Dependences
2.1.2 Hardware and Software Parallelism
2.1.3 The Role of Compilers
2.2 Program Partitioning and Scheduling
2.2.1 Grain Sizes and Latency
2.2.2 Grain Packing and Scheduling
2.2.3 Static Multiprocessor Scheduling
2.3 Program Flow Mechanisms
2.3.1 Control Flow Versus Data Flow
2.3.2 Demand-Driven Mechanisms
2.3.3 Comparison of Flow Mechanisms
2.4 System Interconnect Architectures
2.4.1 Network Properties and Routing
2.4.2 Static Connection Networks
2.4.3 Dynamic Connection Networks
2.5 Bibliographic Notes and Exercises
Chapter 3 Principles of Scalable Performance
3.1 Performance Metrics and Measures
3.1.1 Parallelism Profile in Programs
3.1.2 Harmonic Mean Performance
3.1.3 Efficiency, Utilization, and Quality
3.1.4 Standard Performance Measures
3.2 Parallel Processing Applications
3.2.1 Massive Parallelism for Grand Challenges
3.2.2 Application Models of Parallel Computers
3.2.3 Scalability of Parallel Algorithms
3.3 Speedup Performance Laws
3.3.1 Amdahl's Law for a Fixed Workload
3.3.2 Gustafson's Law for Scaled Problems
3.3.3 Memory-Bounded Speedup Model
3.4 Scalability Analysis and Approaches
3.4.1 Scalability Metrics and Goals
3.4.2 Evolution of Scalable Computers
3.4.3 Research Issues and Solutions
3.5 Bibliographic Notes and Exercises
PART II HARDWARE TECHNOLOGIES
Chapter 4 Processors and Memory Hierarchy
4.1 Advanced Processor Technology
4.1.1 Design Space of Processors
4.1.2 Instruction-Set Architectures
4.1.3 CISC Scalar Processors
4.1.4 RISC Scalar Processors
4.2 Superscalar and Vector Processors
4.2.1 Superscalar Processors
4.2.2 The VLIW Architecture
4.2.3 Vector and Symbolic Processors
4.3 Memory Hierarchy Technology
4.3.1 Hierarchical Memory Technology
4.3.2 Inclusion, Coherence, and Locality
4.3.3 Memory Capacity Planning
4.4 Virtual Memory Technology
4.4.1 Virtual Memory Models
4.4.2 TLB, Paging, and Segmentation
4.4.3 Memory Replacement Policies
4.5 Bibliographic Notes and Exercises
Chapter 5 Bus, Cache, and Shared Memory
5.1 Backplane Bus Systems
5.1.1 Backplane Bus Specification
5.1.2 Addressing and Timing Protocols
5.1.3 Arbitration, Transaction, and Interrupt
5.1.4 The IEEE Futurebus+ Standards
5.2 Cache Memory Organizations
5.2.1 Cache Addressing Models
5.2.2 Direct Mapping and Associative Caches
5.2.3 Set-Associative and Sector Caches
5.2.4 Cache Performance Issues
5.3 Shared-Memory Organizations
5.3.1 Interleaved Memory Organization
5.3.2 Bandwidth and Fault Tolerance
5.3.3 Memory Allocation Schemes
5.4 Sequential and Weak Consistency Models
5.4.1 Atomicity and Event Ordering
5.4.2 Sequential Consistency Model
5.4.3 Weak Consistency Models
5.5 Bibliographic Notes and Exercises
Chapter 6 Pipelining and Superscalar Techniques
6.1 Linear Pipeline Processors
6.1.1 Asynchronous and Synchronous Models
6.1.2 Clocking and Timing Control
6.1.3 Speedup, Efficiency, and Throughput
6.2 Nonlinear Pipeline Processors
6.2.1 Reservation and Latency Analysis
6.2.2 Collision-Free Scheduling
6.2.3 Pipeline Schedule Optimization
6.3 Instruction Pipeline Design
6.3.1 Instruction Execution Phases
6.3.2 Mechanisms for Instruction Pipelining
6.3.3 Dynamic Instruction Scheduling
6.3.4 Branch Handling Techniques
6.4 Arithmetic Pipeline Design
6.4.1 Computer Arithmetic Principles
6.4.2 Static Arithmetic Pipelines
6.4.3 Multifunctional Arithmetic Pipelines
6.5 Superscalar and Superpipeline Design
6.5.1 Superscalar Pipeline Design
6.5.2 Superpipelined Design
6.5.3 Supersymmetry and Design Tradeoffs
6.6 Bibliographic Notes and Exercises
PART III PARALLEL AND SCALABLE ARCHITECTURES
Chapter 7 Multiprocessors and Multicomputers
7.1 Multiprocessor System Interconnects
7.1.1 Hierarchical Bus Systems
7.1.2 Crossbar Switch and Multiport Memory
7.1.3 Multistage and Combining Networks
7.2 Cache Coherence and Synchronization Mechanisms
7.2.1 The Cache Coherence Problem
7.2.2 Snoopy Bus Protocols
7.2.3 Directory-Based Protocols
7.2.4 Hardware Synchronization Mechanisms
7.3 Three Generations of Multicomputers
7.3.1 Design Choices in the Past
7.3.2 Present and Future Development
7.3.3 The Intel Paragon System
7.4 Message-Passing Mechanisms
7.4.1 Message-Routing Schemes
7.4.2 Deadlock and Virtual Channels
7.4.3 Flow Control Strategies
7.4.4 Multicast Routing Algorithms
7.5 Bibliographic Notes and Exercises
Chapter 8 Multivector and SIMD Computers
8.1 Vector Processing Principles
8.1.1 Vector Instruction Types
8.1.2 Vector-Access Memory Schemes
8.1.3 Past and Present Supercomputers
8.2 Multivector Multiprocessors
8.2.1 Performance-Directed Design Rules
8.2.2 Cray Y-MP, C-90, and MPP
8.2.3 Fujitsu VP2000 and VPP500
8.2.4 Mainframes and Minisupercomputers
8.3 Compound Vector Processing
8.3.1 Compound Vector Operations
8.3.2 Vector Loops and Chaining
8.3.3 Multipipeline Networking
8.4 SIMD Computer Organizations
8.4.1 Implementation Models
8.4.2 The CM-2 Architecture
8.4.3 The MasPar MP-1 Architecture
8.5 The Connection Machine CM-5
8.5.1 A Synchronized MIMD Machine
8.5.2 The CM-5 Network Architecture
8.5.3 Control Processors and Processing Nodes
8.5.4 Interprocessor Communications
8.6 Bibliographic Notes and Exercises
Chapter 9 Scalable, Multithreaded, and Dataflow Architectures
9.1 Latency-Hiding Techniques
9.1.1 Shared Virtual Memory
9.1.2 Prefetching Techniques
9.1.3 Distributed Coherent Caches
9.1.4 Scalable Coherence Interface
9.1.5 Relaxed Memory Consistency
9.2 Principles of Multithreading
9.2.1 Multithreading Issues and Solutions
9.2.2 Multiple-Context Processors
9.2.3 Multidimensional Architectures
9.3 Fine-Grain Multicomputers
9.3.1 Fine-Grain Parallelism
9.3.2 The MIT J-Machine
9.3.3 The Caltech Mosaic C
9.4 Scalable and Multithreaded Architectures
9.4.1 The Stanford Dash Multiprocessor
9.4.2 The Kendall Square Research KSR-1
9.4.3 The Tera Multiprocessor System
9.5 Dataflow and Hybrid Architectures
9.5.1 The Evolution of Dataflow Computers
9.5.2 The ETL/EM-4 in Japan
9.5.3 The MIT/Motorola *T Prototype
9.6 Bibliographic Notes and Exercises
PART IV SOFTWARE FOR PARALLEL PROGRAMMING
Chapter 10 Parallel Models, Languages, and Compilers
10.1 Parallel Programming Models
10.1.1 Shared-Variable Model
10.1.2 Message-Passing Model
10.1.3 Data-Parallel Model
10.1.4 Object-Oriented Model
10.1.5 Functional and Logic Models
10.2 Parallel Languages and Compilers
10.2.1 Language Features for Parallelism
10.2.2 Parallel Language Constructs
10.2.3 Optimizing Compilers for Parallelism
10.3 Dependence Analysis of Data Arrays
10.3.1 Iteration Space and Dependence Analysis
10.3.2 Subscript Separability and Partitioning
10.3.3 Categorized Dependence Tests
10.4 Code Optimization and Scheduling
10.4.1 Scalar Optimization with Basic Blocks
10.4.2 Local and Global Optimizations
10.4.3 Vectorization and Parallelization Methods
10.4.4 Code Generation and Scheduling
10.4.5 Trace Scheduling Compilation
10.5 Loop Parallelization and Pipelining
10.5.1 Loop Transformation Theory
10.5.2 Parallelization and Wavefronting
10.5.3 Tiling and Localization
10.5.4 Software Pipelining
10.6 Bibliographic Notes and Exercises
Chapter 11 Parallel Program Development and Environments
11.1 Parallel Programming Environments
11.1.1 Software Tools and Environments
11.1.2 Y-MP, Paragon, and CM-5 Environments
11.1.3 Visualization and Performance Tuning
11.2 Synchronization and Multiprocessing Modes
11.2.1 Principles of Synchronization
11.2.2 Multiprocessor Execution Modes
11.2.3 Multitasking on Cray Multiprocessors
11.3 Shared-Variable Program Structures
11.3.1 Locks for Protected Access
11.3.2 Semaphores and Applications
11.3.3 Monitors and Applications
11.4 Message-Passing Program Development
11.4.1 Distributing the Computation
11.4.2 Synchronous Message Passing
11.4.3 Asynchronous Message Passing
11.5 Mapping Programs onto Multicomputers
11.5.1 Domain Decomposition Techniques
11.5.2 Control Decomposition Techniques
11.5.3 Heterogeneous Processing
11.6 Bibliographic Notes and Exercises
Chapter 12 UNIX, Mach, and OSF/1 for Parallel Computers
12.1 Multiprocessor UNIX Design Goals
12.1.1 Conventional UNIX Limitations
12.1.2 Compatibility and Portability
12.1.3 Address Space and Load Balancing
12.1.4 Parallel I/O and Network Services
12.2 Master-Slave and Multithreaded UNIX
12.2.1 Master-Slave Kernels
12.2.2 Floating-Executive Kernels
12.2.3 Multithreaded UNIX Kernel
12.3 Multicomputer UNIX Extensions
12.3.1 Message-Passing OS Models
12.3.2 Cosmic Environment and Reactive Kernel
12.3.3 Intel NX/2 Kernel and Extensions
12.4 Mach/OS Kernel Architecture
12.4.1 Mach/OS Kernel Functions
12.4.2 Multithreaded Multitasking
12.4.3 Message-Based Communications
12.4.4 Virtual Memory Management
12.5 OSF/1 Architecture and Applications
12.5.1 The OSF/1 Architecture
12.5.2 The OSF/1 Programming Environment
12.5.3 Improving Performance with Threads
12.6 Bibliographic Notes and Exercises
Bibliography
Index
Answers to Selected Problems