Syllabus

High Performance Computing - [410250]

Credit: 3
Examination Scheme:
In-Sem (TH): 30
End-Sem (TH): 70

Unit I Introduction to Parallel Computing

Introduction to Parallel Computing: Motivating Parallelism, Modern Processor: Stored-program computer architecture, General-purpose Cache-based Microprocessor architecture. Parallel Programming Platforms: Implicit Parallelism, Dichotomy of Parallel Computing Platforms, Physical Organization of Parallel Platforms, Communication Costs in Parallel Machines. Levels of parallelism, Models: SIMD, MIMD, SIMT, SPMD, Data Flow Models, Demand-driven Computation, Architectures: N-wide superscalar architectures, multi-core, multi-threaded. (Chapter - 1)

Unit II Parallel Algorithm Design

Principles of Parallel Algorithm Design: Preliminaries, Decomposition Techniques, Characteristics of Tasks and Interactions, Mapping Techniques for Load Balancing, Methods for Containing Interaction Overheads, Parallel Algorithm Models: Data, Task, Work Pool and Master-Slave Model, Complexities: Sequential and Parallel Computational Complexity, Anomalies in Parallel Algorithms. (Chapter - 2)

Unit III Parallel Communication

Basic Communication: One-to-All Broadcast, All-to-One Reduction, All-to-All Broadcast and Reduction, All-Reduce and Prefix-Sum Operations, Collective Communication using MPI: Scatter, Gather, Broadcast, Blocking and Non-blocking MPI, All-to-All Personalized Communication, Circular Shift, Improving the Speed of Some Communication Operations. (Chapter - 3)

Unit IV Analytical Modeling of Parallel Programs

Sources of Overhead in Parallel Programs, Performance Measures and Analysis: Amdahl's and Gustafson's Laws, Speedup Factor and Efficiency, Cost and Utilization, Execution Rate and Redundancy, The Effect of Granularity on Performance, Scalability of Parallel Systems, Minimum Execution Time and Minimum Cost, Optimal Execution Time, Asymptotic Analysis of Parallel Programs. Matrix Computation: Matrix-Vector Multiplication, Matrix-Matrix Multiplication. (Chapter - 4)

Unit V CUDA Architecture

Introduction to GPU: GPU Architecture overview, Introduction to CUDA C: CUDA programming model, Writing and launching a CUDA kernel, Handling Errors, CUDA memory model, Managing communication and synchronization, Parallel programming in CUDA C. (Chapter - 5)

Unit VI High Performance Computing Applications

Scope of Parallel Computing, Parallel Search Algorithms: Depth First Search (DFS), Breadth First Search (BFS), Parallel Sorting: Bubble and Merge, Distributed Computing: Document classification, Frameworks - Kubernetes, GPU Applications, Parallel Computing for AI/ML. (Chapter - 6)