Step into Parallel Computing

Amila Weerasinghe
6 min read · Oct 22, 2021


This article is written to emphasize the importance of parallel computing. I will also discuss some approaches to performing parallel computing depending on the use case. Use this as a starting point and explore more of the world of parallel computing paradigms, including GPGPU programming using CUDA.

Figure 01: Parallel Computing abstract

Why do programmers need parallel computers?

Traditionally, computer programs are written to perform serial computations: an algorithm is expressed as a set of sequential instructions, which execute one after another on the CPU. Throughout the history of computers, hardware performance kept increasing, so programmers had little need to move beyond serial computing; whatever they wanted to compute was handled by the drastically improving underlying hardware.

Feel like we don't need parallel computing?

Let’s see. As computer engineers or scientists, we all know “Moore's law”: the complexity of computer chips ought to double roughly every two years. If you are more into electronics engineering, you can take it as the number of transistors on a chip doubling roughly every two years while the cost per transistor keeps falling.

But this trend has limitations and will not continue forever. The approach of scaling up processor frequency hits the power wall due to power consumption and heat generation, and multi-core processors emerged as the hardware answer. If you are more into electronics engineering, you can read up on how multi-core processors relate to the power equation and how capacitance and frequency come into play.
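For reference, the dynamic power dissipated by a CMOS processor is commonly approximated as

$$P_{dynamic} \approx C \cdot V^2 \cdot f$$

where $C$ is the switched capacitance, $V$ is the supply voltage, and $f$ is the clock frequency. Pushing $f$ higher generally requires a higher $V$ as well, so power and heat grow rapidly with frequency, whereas several cores running at a moderate frequency can deliver more throughput within the same power budget.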

Also, with the rapid development of technology, there are use cases such as

  • Scientific computations
  • Numerically intensive simulations
  • Database operations and information systems
  • AI /ML
  • Deep learning
  • Real-time systems and control applications

where we cannot rely solely on the development of underlying architecture due to the intensive computation requirements.

Put simply, as time goes on,

Number of resources needed at a time ≥ Number of hardware resources available at a time

The above reasons imply the need for multi-core processors. Now imagine a scenario where 100 execution resources are available within the processor but the running program needs only 10 of them: the remaining 90 sit idle. They should be utilized, right? So parallelism matters even when we have more hardware resources than a single program needs, and to make use of the available resources, software has to be adapted for parallel computing.

These reasons emphasize the importance of parallel computing, which may well become a must for all types of programs. Now that you can see the need for parallel computing, let’s dive in.

What is parallel computing?

The simple answer is doing things simultaneously. In traditional serial computing, a problem is divided into a set of instructions that are performed sequentially. In contrast, in parallel computing the problem is divided into parts that can be executed simultaneously on different processors.

The four main types of parallelism are:

  • Bit-level parallelism
  • Instruction-level parallelism
  • Task-level parallelism
  • Superword-level parallelism

Parallelism is a subset of concurrent execution.

Concurrency: A condition of a system in which multiple tasks are logically active at one time.

Parallelism: A condition of a system in which multiple tasks are actually active at one time.

The below diagram shows the difference clearly.

Figure 02: Parallelism vs Concurrency

Parallel architectures and parallel models

Now you know what parallel computing is; let’s see how it is performed. There are many programming languages, APIs, and libraries available for parallel computing. As engineers, before we commit to a particular platform, we will look into the architectures and models for parallel computing, which are classified according to the level at which the hardware supports parallelism. The following summarises the programming models available for the different parallel architectures.

Figure 03: Parallel architectures and programming models

Figure 03 shows the parallel architectures and their corresponding programming models. Multiple programming languages, APIs, and libraries are available for these programming models; among them, OpenMP and Open MPI are popular implementations, targeting the shared-memory and message-passing programming models respectively. We can also build a “hybrid architecture” that combines both of these programming models, using shared memory within a node and message passing across distributed-memory nodes.
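To make the hybrid idea concrete, here is a minimal, illustrative sketch (not from the original article) that combines MPI processes for message passing with OpenMP threads for shared memory inside each process; the file name hybrid_hello.c and the commands below are my assumptions.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                /* message-passing layer: one process per rank   */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* identify this process                         */

    #pragma omp parallel                   /* shared-memory layer: threads within this rank */
    printf("rank %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}

Something like mpicc -fopenmp hybrid_hello.c -o hybrid_hello followed by mpirun -np 2 ./hybrid_hello would run two ranks, each spawning its own team of OpenMP threads.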

Demonstration

For the scope of this article, I will explain some OpenMP programming with the shared-memory model. OpenMP provides a set of compiler directives and library routines for parallel application programmers. It is basically an API for enabling multiple threads within your program, and using it simplifies writing multi-threaded programs in Fortran, C, and C++.
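To give a feel for what those directives look like, here is a minimal, illustrative OpenMP program (the file name hello_omp.c is just my choice for this sketch).

#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Fork a team of threads; every thread executes the block below. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* this thread's id          */
        int n  = omp_get_num_threads();  /* total threads in the team */
        printf("Hello from thread %d of %d\n", id, n);
    }
    return 0;
}

Compile it with gcc -fopenmp hello_omp.c, and control the team size with the OMP_NUM_THREADS environment variable.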

What I will do

  1. Explain a computationally intensive algorithm and the mathematics behind it
  2. Write the serial C code for the computation
  3. Make the code parallel using OpenMP
  4. Compare the results

1. A computationally intensive algorithm and the mathematics behind it

We will write a simple program in the C language to calculate the value of PI (π). We can approximate the value of PI as a Riemann sum; specifically, we can calculate the value of PI using the following definite integral.
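Written out, the integral I assume here is the standard one used for this approximation:

$$\pi = \int_0^1 \frac{4}{1 + x^2}\,dx$$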

We can approximate it as a sum of areas of rectangles using the Riemann sum as follows, where each rectangle has width Δx and height F(x_i), evaluated at the middle of interval i.

Figure 04: Approximate value of PI using sum of areas of rectangles

So this can be expressed as an equation as follows.
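The Riemann-sum form assumed here is

$$\pi = \lim_{n \to \infty} \sum_{i=0}^{n-1} F(x_i)\,\Delta x \approx \Delta x \sum_{i=0}^{n-1} \frac{4}{1 + x_i^2}, \qquad \Delta x = \frac{1}{n}, \quad x_i = \left(i + \tfrac{1}{2}\right)\Delta x$$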

Now we can easily turn this limit into the sum of a series. The above is a bit of numerical integration, which you might not be familiar with. The key concept to understand here is that we can approximate the value of PI using the sum of a series that goes to infinity. In practice, we compute the sum for a very large n, and you can clearly see that processing this calculation for a very large n is computationally intensive.

2. Write the serial C code

First, I will turn this equation into a simple C program.
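The original code embed is not included here, so the following is a minimal sketch of such a serial implementation (the file name serial_pi.c matches the compile command mentioned below; the constant NUM_STEPS and the variable names are my own assumptions).

#include <stdio.h>
#include <omp.h>                            /* only needed for the omp_get_wtime() timer */

#define NUM_STEPS 1000000

int main(void) {
    double step = 1.0 / (double)NUM_STEPS;  /* width of each rectangle                   */
    double sum = 0.0;

    double start = omp_get_wtime();         /* wall-clock time before the computation    */
    for (long i = 0; i < NUM_STEPS; i++) {
        double x = (i + 0.5) * step;        /* midpoint of interval i                    */
        sum += 4.0 / (1.0 + x * x);         /* height F(x_i) of the rectangle            */
    }
    double pi = step * sum;                 /* total area approximates pi                */
    double end = omp_get_wtime();

    printf("pi = %.15f, computed in %f seconds\n", pi, end - start);
    return 0;
}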

Here I have approximated the above value by iterating 1,000,000 times. This is the serial implementation, and the time taken for the computation is printed at the end. Since omp_get_wtime() has been used, you will have to compile this with gcc -fopenmp serial_pi.c.

Figure 05: Result of serial execution

3. Make the code parallel using OpenMP

Now that I have a computationally intensive program, I will parallelize the intensive part of the code using OpenMP. Then I will change the number of threads used and measure the total execution time.
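The original code embed is not included here either; the sketch below is my reconstruction of a parallel version that matches the directives described next (in particular, I assume the single block simply reports the size of the thread team, and the file name parallel_pi.c is also an assumption).

#include <stdio.h>
#include <omp.h>

#define NUM_STEPS 1000000

int main(void) {
    double step = 1.0 / (double)NUM_STEPS;
    double sum = 0.0;

    double start = omp_get_wtime();
    #pragma omp parallel                    /* create the parallel region                        */
    {
        #pragma omp single                  /* exactly one thread reports the team size          */
        printf("Running with %d threads\n", omp_get_num_threads());

        #pragma omp for reduction(+:sum)    /* split the iterations; combine partial sums safely */
        for (long i = 0; i < NUM_STEPS; i++) {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
    }
    double pi = step * sum;
    double end = omp_get_wtime();

    printf("pi = %.15f, computed in %f seconds\n", pi, end - start);
    return 0;
}

Compile it with gcc -fopenmp parallel_pi.c and vary the thread count through the OMP_NUM_THREADS environment variable, e.g. OMP_NUM_THREADS=8 ./a.out.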

In the above code, we have added OpenMP directives to parallelize the for loop, which iterates 1,000,000 times. First, I have created a parallel region using the directive

#pragma omp parallel

And the directive

#pragma omp single

lets only one thread execute the enclosed section. Also, since the partial results of the parallel iterations need to be summed while avoiding race conditions, the

#pragma omp for reduction(+:sum)

directive is used with the sum variable.

The results of the code are as follows.

Figure 06: Result of parallel execution

From Figure 06 you can see that we gained a drastic performance improvement for our calculation by introducing parallelization. Increasing the number of threads reduced the execution time even further.

Conclusion

This article aimed to introduce the importance of parallel computing. After discussing the architectures, an example was shown using OpenMP. Use this as a starting point and explore more of the world of parallel computing paradigms, including GPGPU programming using CUDA, which is even more interesting.

Be :Different! Happy coding!
