Performance Analysis of the Incomplete Cholesky Preconditioned Conjugate Gradient Method on NVIDIA Graphics Processing Units with MATLAB

Yong, Vivian

Thesis

Public Deposited

Creator

Yong, Vivian

Contributors

Dennis Giannacopoulos (Supervisor)

Abstract

French

À une époque où la résolution de systèmes linéaires complexes est une tâche courante dans divers domaines, le besoin d’efficacité informatique reste primordial. Cette thèse cherche à combler le fossé entre les algorithmes mathématiques complexes et l'accessibilité pour les ingénieurs, les chercheurs, les scientifiques et les passionnés.

À la base, cette recherche explore les synergies entre deux technologies informatiques contemporaines : la méthode Incomplete Cholesky Preconditioned Conjugate Gradient (ICPCG) et les unités de traitement graphique (en anglais, Graphics Processing Units, ou GPUs) modernes, avec un accent particulier sur les puces graphiques mobiles NVIDIA. La méthode ICPCG est réputée pour son efficacité dans le traitement de grands systèmes clairsemés d'équations linéaires. Cependant, plutôt que de plonger dans les subtilités de l'architecture GPU avec l'utilisation d'une interface de programmation d'application (en anglais, Application Programming Interface, ou API), telle que Compute Unified Device Architecture (CUDA), nous examinons une programmation de niveau supérieur qui constitue une voie plus conviviale.

La méthode ICPCG est implémentée dans l'environnement MATLAB et utilise Parallel Computing Toolbox (PCT) pour paralléliser la méthode sur les GPU NVIDIA modernes. Avec l’utilisation de PCT, au lieu de CUDA, il supprime la formidable barrière consistant à exiger une compréhension approfondie du matériel GPU, souvent un obstacle de taille pour les non-initiés. En démocratisant la parallélisation des GPU, nous permettons à des individus d'horizons divers d'exploiter les remarquables capacités de calcul des GPU modernes sans être gênés par les complexités de la programmation CUDA.

Les chapitres expliquent la méthode ICPCG, présentent les avantages du GPU par rapport aux unités centrales de traitement (en anglais, Central Processing Unit, ou CPU) et présentent l'accessibilité du MATLAB PCT. Une méthodologie détaillée pour implémenter ICPCG sur les GPU NVIDIA est fournie et les résultats expérimentaux sont présentés de manière compréhensible. Des discussions et des conclusions approfondies font ressortir l’importance de cette approche dans le domaine du calcul scientifique.

Alors que nous naviguons entre la sophistication mathématique et l’accessibilité, cette recherche ouvre la voie aux individus pour exploiter efficacement la parallélisation GPU, transcendant les limites des calculs traditionnels basés sur CPU. Ce faisant, il permet à un large éventail d’utilisateurs d’exploiter le potentiel extraordinaire du calcul accéléré par GPU sans avoir besoin d’une compréhension avancée des subtilités du matériel GPU, démocratisant ainsi le calcul scientifique haute performance. Nos résultats ont montré les avantages de la parallélisation de l'algorithme sur les GPU mobiles NVIDIA, en particulier pour les types de données simple précision, tout en reconnaissant les limites dans le cas des types de données double précision.

English

In an era where solving intricate linear systems is a commonplace task across various domains, the need for computational efficiency remains paramount. This thesis seeks to bridge the gap between complex mathematical algorithms and accessibility for engineers, researchers, scientists, and enthusiasts alike.

At its core, this research explores the synergies between two contemporary computational technologies: the Incomplete Cholesky Preconditioned Conjugate Gradient (ICPCG) method and modern Graphics Processing Units (GPUs), with a particular focus on NVIDIA mobile graphics chips. The ICPCG method is renowned for its effectiveness in tackling large sparse systems of linear equations. However, rather than diving into the intricacies of GPU architecture through an Application Programming Interface (API) such as Compute Unified Device Architecture (CUDA), we take a higher-level programming approach that offers a more user-friendly avenue.

The ICPCG method is implemented in the MATLAB environment and uses the Parallel Computing Toolbox (PCT) to parallelize the method on modern NVIDIA mobile GPUs. Using PCT instead of CUDA removes the formidable barrier of requiring an in-depth understanding of GPU hardware, often a daunting obstacle for the uninitiated. By democratizing GPU parallelization, we empower individuals from various backgrounds to harness the remarkable computational capabilities of modern GPUs without being burdened by the complexities of CUDA programming.

The chapters explain the ICPCG method, introduce the advantages of GPUs over Central Processing Units (CPUs), and showcase the accessibility of MATLAB PCT. A detailed methodology for implementing ICPCG on NVIDIA GPUs is provided, and the experimental results are presented in a comprehensible manner. In-depth discussions and conclusions bring out the significance of this approach in the realm of scientific computing.

As we navigate the nexus of mathematical sophistication and accessibility, this research illuminates a path for individuals to leverage GPU parallelization effectively, transcending the boundaries of traditional CPU-based computations. In doing so, it empowers a diverse spectrum of users to tap into the extraordinary potential of GPU-accelerated computing without the need for an advanced understanding of GPU hardware intricacies, ultimately democratizing high-performance scientific computing. Our results show the benefits of parallelizing the algorithm on NVIDIA mobile GPUs, particularly for single-precision data types, while acknowledging limitations in the case of double-precision data types.

Subject