Book Introduction

Advanced Computer Architecture (English edition), PDF ebook download

Advanced Computer Architecture (English edition)
  • Author: Kai Hwang (USA; Chinese name 黄铠)
  • Publisher: China Machine Press, Beijing
  • ISBN: 7111067126
  • Publication year: 1999
  • Listed page count: 770
  • File size: 36 MB
  • File page count: 792

Download Notes

Advanced Computer Architecture (English edition), PDF ebook download.

The downloaded file is a RAR archive. Use decompression software to extract it and obtain the PDF.

We recommend downloading with Free Download Manager (FDM), a free, ad-free, cross-platform BT client. All resources on this site are packaged as BT torrents, so a dedicated BT client is required, such as BitComet, qBittorrent, or uTorrent. Thunder (Xunlei) is not recommended for now, since this site's resources are not popular seeds; once a resource becomes popular, Thunder will also work.

(The file page count should be greater than the listed page count, except for multi-volume ebooks split across two or more files.)

Note: every archive on this site is protected by an extraction password, which is obtained from the site.
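Once the torrent has finished downloading, the extraction step can also be scripted. Below is a minimal sketch using the third-party Python package rarfile, which drives an installed unrar tool; the archive name and password here are placeholders, not values provided by this page.

# Minimal sketch: extract a password-protected RAR archive to get the PDF.
# Assumes `pip install rarfile` and an unrar binary available on PATH.
import rarfile

ARCHIVE = "book.rar"    # hypothetical file name; use the downloaded archive
PASSWORD = "password"   # hypothetical; use the extraction code from the site

rf = rarfile.RarFile(ARCHIVE)
rf.extractall(path="extracted", pwd=PASSWORD)

# List what was extracted (e.g., the 792-page PDF).
for name in rf.namelist():
    print("extracted:", name)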

Table of Contents

Foreword  xvii

Preface  xix

PART I THEORY OF PARALLELISM  1

Chapter 1 Parallel Computer Models  3

1.1 The State of Computing  3

1.1.1 Computer Development Milestones  3

1.1.2 Elements of Modern Computers  6

1.1.3 Evolution of Computer Architecture  9

1.1.4 System Attributes to Performance  14

1.2 Multiprocessors and Multicomputers  19

1.2.1 Shared-Memory Multiprocessors  19

1.2.2 Distributed-Memory Multicomputers  24

1.2.3 A Taxonomy of MIMD Computers  27

1.3 Multivector and SIMD Computers  27

1.3.1 Vector Supercomputers  27

1.3.2 SIMD Supercomputers  30

1.4 PRAM and VLSI Models  32

1.4.1 Parallel Random-Access Machines  33

1.4.2 VLSI Complexity Model  38

1.5 Architectural Development Tracks  41

1.5.1 Multiple-Processor Tracks  41

1.5.2 Multivector and SIMD Tracks  43

1.5.3 Multithreaded and Dataflow Tracks  44

1.6 Bibliographic Notes and Exercises  45

Chapter 2 Program and Network Properties  51

2.1 Conditions of Parallelism  51

2.1.1 Data and Resource Dependences  51

2.1.2 Hardware and Software Parallelism  57

2.1.3 The Role of Compilers  60

2.2 Program Partitioning and Scheduling  61

2.2.1 Grain Sizes and Latency  61

2.2.2 Grain Packing and Scheduling  64

2.2.3 Static Multiprocessor Scheduling  67

2.3 Program Flow Mechanisms  70

2.3.1 Control Flow Versus Data Flow  71

2.3.2 Demand-Driven Mechanisms  74

2.3.3 Comparison of Flow Mechanisms  75

2.4 System Interconnect Architectures  76

2.4.1 Network Properties and Routing  77

2.4.2 Static Connection Networks  80

2.4.3 Dynamic Connection Networks  89

2.5 Bibliographic Notes and Exercises  96

Chapter 3 Principles of Scalable Performance  105

3.1 Performance Metrics and Measures  105

3.1.1 Parallelism Profile in Programs  105

3.1.2 Harmonic Mean Performance  108

3.1.3 Efficiency, Utilization, and Quality  112

3.1.4 Standard Performance Measures  115

3.2 Parallel Processing Applications  118

3.2.1 Massive Parallelism for Grand Challenges  118

3.2.2 Application Models of Parallel Computers  122

3.2.3 Scalability of Parallel Algorithms  125

3.3 Speedup Performance Laws  129

3.3.1 Amdahl's Law for a Fixed Workload  129

3.3.2 Gustafson's Law for Scaled Problems  131

3.3.3 Memory-Bounded Speedup Model  134

3.4 Scalability Analysis and Approaches  138

3.4.1 Scalability Metrics and Goals  138

3.4.2 Evolution of Scalable Computers  143

3.4.3 Research Issues and Solutions  147

3.5 Bibliographic Notes and Exercises  149

PART II HARDWARE TECHNOLOGIES  155

Chapter 4 Processors and Memory Hierarchy  157

4.1 Advanced Processor Technology  157

4.1.1 Design Space of Processors  157

4.1.2 Instruction-Set Architectures  162

4.1.3 CISC Scalar Processors  165

4.1.4 RISC Scalar Processors  169

4.2 Superscalar and Vector Processors  177

4.2.1 Superscalar Processors  178

4.2.2 The VLIW Architecture  182

4.2.3 Vector and Symbolic Processors  184

4.3 Memory Hierarchy Technology  188

4.3.1 Hierarchical Memory Technology  188

4.3.2 Inclusion, Coherence, and Locality  190

4.3.3 Memory Capacity Planning  194

4.4 Virtual Memory Technology  196

4.4.1 Virtual Memory Models  196

4.4.2 TLB, Paging, and Segmentation  198

4.4.3 Memory Replacement Policies  205

4.5 Bibliographic Notes and Exercises  208

Chapter 5 Bus, Cache, and Shared Memory  213

5.1 Backplane Bus Systems  213

5.1.1 Backplane Bus Specification  213

5.1.2 Addressing and Timing Protocols  216

5.1.3 Arbitration, Transaction, and Interrupt  218

5.1.4 The IEEE Futurebus+ Standards  221

5.2 Cache Memory Organizations  224

5.2.1 Cache Addressing Models  225

5.2.2 Direct Mapping and Associative Caches  228

5.2.3 Set-Associative and Sector Caches  232

5.2.4 Cache Performance Issues  236

5.3 Shared-Memory Organizations  238

5.3.1 Interleaved Memory Organization  239

5.3.2 Bandwidth and Fault Tolerance  242

5.3.3 Memory Allocation Schemes  244

5.4 Sequential and Weak Consistency Models  248

5.4.1 Atomicity and Event Ordering  248

5.4.2 Sequential Consistency Model  252

5.4.3 Weak Consistency Models  253

5.5 Bibliographic Notes and Exercises  256

Chapter 6 Pipelining and Superscalar Techniques  265

6.1 Linear Pipeline Processors  265

6.1.1 Asynchronous and Synchronous Models  265

6.1.2 Clocking and Timing Control  267

6.1.3 Speedup, Efficiency, and Throughput  268

6.2 Nonlinear Pipeline Processors  270

6.2.1 Reservation and Latency Analysis  270

6.2.2 Collision-Free Scheduling  274

6.2.3 Pipeline Schedule Optimization  276

6.3 Instruction Pipeline Design  280

6.3.1 Instruction Execution Phases  280

6.3.2 Mechanisms for Instruction Pipelining  283

6.3.3 Dynamic Instruction Scheduling  288

6.3.4 Branch Handling Techniques  291

6.4 Arithmetic Pipeline Design  297

6.4.1 Computer Arithmetic Principles  297

6.4.2 Static Arithmetic Pipelines  299

6.4.3 Multifunctional Arithmetic Pipelines  307

6.5 Superscalar and Superpipeline Design  308

6.5.1 Superscalar Pipeline Design  310

6.5.2 Superpipelined Design  316

6.5.3 Supersymmetry and Design Tradeoffs  320

6.6 Bibliographic Notes and Exercises  322

PART III PARALLEL AND SCALABLE ARCHITECTURES  329

Chapter 7 Multiprocessors and Multicomputers  331

7.1 Multiprocessor System Interconnects  331

7.1.1 Hierarchical Bus Systems  333

7.1.2 Crossbar Switch and Multiport Memory  336

7.1.3 Multistage and Combining Networks  341

7.2 Cache Coherence and Synchronization Mechanisms  348

7.2.1 The Cache Coherence Problem  348

7.2.2 Snoopy Bus Protocols  351

7.2.3 Directory-Based Protocols  358

7.2.4 Hardware Synchronization Mechanisms  364

7.3 Three Generations of Multicomputers  368

7.3.1 Design Choices in the Past  368

7.3.2 Present and Future Development  370

7.3.3 The Intel Paragon System  372

7.4 Message-Passing Mechanisms  375

7.4.1 Message-Routing Schemes  375

7.4.2 Deadlock and Virtual Channels  379

7.4.3 Flow Control Strategies  383

7.4.4 Multicast Routing Algorithms  387

7.5 Bibliographic Notes and Exercises  393

Chapter 8 Multivector and SIMD Computers  403

8.1 Vector Processing Principles  403

8.1.1 Vector Instruction Types  403

8.1.2 Vector-Access Memory Schemes  408

8.1.3 Past and Present Supercomputers  410

8.2 Multivector Multiprocessors  415

8.2.1 Performance-Directed Design Rules  415

8.2.2 Cray Y-MP, C-90, and MPP  419

8.2.3 Fujitsu VP2000 and VPP500  425

8.2.4 Mainframes and Minisupercomputers  429

8.3 Compound Vector Processing  435

8.3.1 Compound Vector Operations  436

8.3.2 Vector Loops and Chaining  437

8.3.3 Multipipeline Networking  442

8.4 SIMD Computer Organizations  447

8.4.1 Implementation Models  447

8.4.2 The CM-2 Architecture  449

8.4.3 The MasPar MP-1 Architecture  453

8.5 The Connection Machine CM-5  457

8.5.1 A Synchronized MIMD Machine  457

8.5.2 The CM-5 Network Architecture  460

8.5.3 Control Processors and Processing Nodes  462

8.5.4 Interprocessor Communications  465

8.6 Bibliographic Notes and Exercises  468

Chapter 9 Scalable, Multithreaded, and Dataflow Architectures  475

9.1 Latency-Hiding Techniques  475

9.1.1 Shared Virtual Memory  476

9.1.2 Prefetching Techniques  480

9.1.3 Distributed Coherent Caches  482

9.1.4 Scalable Coherence Interface  483

9.1.5 Relaxed Memory Consistency  486

9.2 Principles of Multithreading  490

9.2.1 Multithreading Issues and Solutions  490

9.2.2 Multiple-Context Processors  495

9.2.3 Multidimensional Architectures  499

9.3 Fine-Grain Multicomputers  504

9.3.1 Fine-Grain Parallelism  505

9.3.2 The MIT J-Machine  506

9.3.3 The Caltech Mosaic C  514

9.4 Scalable and Multithreaded Architectures  516

9.4.1 The Stanford Dash Multiprocessor  516

9.4.2 The Kendall Square Research KSR-1  521

9.4.3 The Tera Multiprocessor System  524

9.5 Dataflow and Hybrid Architectures  531

9.5.1 The Evolution of Dataflow Computers  531

9.5.2 The ETL/EM-4 in Japan  534

9.5.3 The MIT/Motorola *T Prototype  536

9.6 Bibliographic Notes and Exercises  539

PART IV SOFTWARE FOR PARALLEL PROGRAMMING  545

Chapter 10 Parallel Models, Languages, and Compilers  547

10.1 Parallel Programming Models  547

10.1.1 Shared-Variable Model  547

10.1.2 Message-Passing Model  551

10.1.3 Data-Parallel Model  554

10.1.4 Object-Oriented Model  556

10.1.5 Functional and Logic Models  559

10.2 Parallel Languages and Compilers  560

10.2.1 Language Features for Parallelism  560

10.2.2 Parallel Language Constructs  562

10.2.3 Optimizing Compilers for Parallelism  564

10.3 Dependence Analysis of Data Arrays  567

10.3.1 Iteration Space and Dependence Analysis  567

10.3.2 Subscript Separability and Partitioning  570

10.3.3 Categorized Dependence Tests  573

10.4 Code Optimization and Scheduling  578

10.4.1 Scalar Optimization with Basic Blocks  578

10.4.2 Local and Global Optimizations  581

10.4.3 Vectorization and Parallelization Methods  585

10.4.4 Code Generation and Scheduling  592

10.4.5 Trace Scheduling Compilation  596

10.5 Loop Parallelization and Pipelining  599

10.5.1 Loop Transformation Theory  599

10.5.2 Parallelization and Wavefronting  602

10.5.3 Tiling and Localization  605

10.5.4 Software Pipelining  610

10.6 Bibliographic Notes and Exercises  612

Chapter 11 Parallel Program Development and Environments  617

11.1 Parallel Programming Environments  617

11.1.1 Software Tools and Environments  617

11.1.2 Y-MP, Paragon, and CM-5 Environments  621

11.1.3 Visualization and Performance Tuning  623

11.2 Synchronization and Multiprocessing Modes  625

11.2.1 Principles of Synchronization  625

11.2.2 Multiprocessor Execution Modes  628

11.2.3 Multitasking on Cray Multiprocessors  629

11.3 Shared-Variable Program Structures  634

11.3.1 Locks for Protected Access  634

11.3.2 Semaphores and Applications  637

11.3.3 Monitors and Applications  640

11.4 Message-Passing Program Development  644

11.4.1 Distributing the Computation  644

11.4.2 Synchronous Message Passing  645

11.4.3 Asynchronous Message Passing  647

11.5 Mapping Programs onto Multicomputers  648

11.5.1 Domain Decomposition Techniques  648

11.5.2 Control Decomposition Techniques  652

11.5.3 Heterogeneous Processing  656

11.6 Bibliographic Notes and Exercises  661

Chapter 12 UNIX, Mach, and OSF/1 for Parallel Computers  667

12.1 Multiprocessor UNIX Design Goals  667

12.1.1 Conventional UNIX Limitations  668

12.1.2 Compatibility and Portability  670

12.1.3 Address Space and Load Balancing  671

12.1.4 Parallel I/O and Network Services  671

12.2 Master-Slave and Multithreaded UNIX  672

12.2.1 Master-Slave Kernels  672

12.2.2 Floating-Executive Kernels  674

12.2.3 Multithreaded UNIX Kernel  678

12.3 Multicomputer UNIX Extensions  683

12.3.1 Message-Passing OS Models  683

12.3.2 Cosmic Environment and Reactive Kernel  683

12.3.3 Intel NX/2 Kernel and Extensions  685

12.4 Mach/OS Kernel Architecture  686

12.4.1 Mach/OS Kernel Functions  687

12.4.2 Multithreaded Multitasking  688

12.4.3 Message-Based Communications  694

12.4.4 Virtual Memory Management  697

12.5 OSF/1 Architecture and Applications  701

12.5.1 The OSF/1 Architecture  702

12.5.2 The OSF/1 Programming Environment  707

12.5.3 Improving Performance with Threads  709

12.6 Bibliographic Notes and Exercises  712

Bibliography  717

Index  739

Answers to Selected Problems  765
