Impact of CUDA and OpenCL on parallel and distributed computing

Asaduzzaman, Abu
Trent, Alec
Osborne, S.
Aldershof, C.
Sibai, Fadi N.

Asaduzzaman, A., Trent, A., Osborne, S., Aldershof, C., & Sibai, F. N. (2021). Impact of CUDA and OpenCL on parallel and distributed computing. Paper presented at the 2021 8th International Conference on Electrical and Electronics Engineering, ICEEE 2021, 238-242. doi:10.1109/ICEEE52452.2021.9415927 Retrieved from


Along with high-performance computer systems, the choice of Application Programming Interface (API) is crucial to developing efficient solutions for modern parallel and distributed computing. Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) are two popular APIs that allow General Purpose Graphics Processing Units (GPGPUs, GPUs for short) to accelerate processing in applications that support them. This paper presents a comparative study of OpenCL and CUDA and their impact on parallel and distributed computing. Mandelbrot set generation (representing complex-number computation), the Marching Squares algorithm (representing embarrassingly parallel workloads), and the Bitonic Sorting algorithm (representing distributed computing) are implemented using OpenCL (version 2.x) and CUDA (version 9.x) and run on a Linux-based High Performance Computing (HPC) system. The HPC system uses an Intel i7-9700K processor and an Nvidia GTX 1070 GPU card. Experimental results from 25 different tests using these three workloads are analyzed. According to the experimental results, CUDA performs better than OpenCL (up to 7.34x speedup). However, in most cases, OpenCL performs at an acceptable rate (CUDA speedup is less than 2x).
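
For readers unfamiliar with the third workload, the following is a minimal sequential sketch of a bitonic sorting network in Python. It is not the authors' CUDA or OpenCL implementation; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration. The point it shows is structural: within each (k, j) stage, every compare-exchange touches a disjoint pair of elements, which is why a GPU version can typically assign one thread per index i.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` (ascending).

    Illustrative sketch only: assumes len(values) is a power of two
    (or zero), as the classic bitonic network requires.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")

    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            # On a GPU, this loop body would be one thread per index i;
            # all pairs in a stage are independent of each other.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

Calling `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` returns the eight values in ascending order. The outer two loops run in sequence, while the inner loop over i is the part a parallel implementation would distribute across threads, with a synchronization point between stages.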

Click on the DOI link to access this conference paper at the publisher's website (may not be free).