
OpenMP - Introduction to Parallelism

Pthreads are built directly on OS features, so thread creation carries high overhead. They are also quite low level, which means more data-race bugs that are harder to find, and deadlocks are easier to introduce.

Introduction

As almost all machines today use multi-core CPUs, we want a simpler, "lighter" way to write multithreaded code. One alternative is OpenMP.

Use OpenMP by including

#include <omp.h>
and compiling (using gcc) with
gcc -fopenmp *.c 

Basic directives

OpenMP uses "hints", i.e. compiler "directives", to mark what is intended to be parallelized:

#pragma omp directivename [clause list]
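
For example, a minimal sketch of a parallel directive with one clause (prints one line per thread; the thread count 4 is an arbitrary choice):

#include <omp.h>
#include <stdio.h>

int main() {
    // "parallel" is the directive name; "num_threads(4)" is a clause
    #pragma omp parallel num_threads(4)
    {
        printf("hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}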

Library functions

omp.h also provides library functions for querying information about the currently running program:

/* used within a parallel region */
int omp_get_num_threads(); // number of threads in the current team (innermost enclosing parallel region)
int omp_get_thread_num();  // this thread's id within the team
int omp_in_parallel();     // non-zero if called within a parallel region

/* used anywhere */
int omp_get_max_threads();       // max number of threads a parallel region may use
int omp_get_num_procs();         // number of processors available
void omp_set_num_threads(int n); // set the thread count for subsequent parallel regions
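
For instance, a quick sketch querying the environment before entering any parallel region:

#include <omp.h>
#include <stdio.h>

int main() {
    printf("processors available: %d\n", omp_get_num_procs());
    printf("max threads:          %d\n", omp_get_max_threads());
    printf("in parallel?          %d\n", omp_in_parallel()); // 0 here
    return 0;
}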

Basic Example

#include <omp.h>

int main() {
    omp_set_num_threads(8);
    #pragma omp parallel
    {
        omp_get_thread_num();  // 0 ... 7, no guarantees on order
        omp_get_num_threads(); // 8
    }
    omp_get_num_threads(); // 1: outside the region only the master thread runs
    return 0;
}

Directive Clauses

// run in parallel only if expr holds (otherwise execute sequentially)
#pragma omp parallel if(expr)

// set #threads; overrides omp_set_num_threads and the OMP_NUM_THREADS env variable
#pragma omp parallel num_threads(8)
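
A common use of if is to skip thread-creation overhead for small inputs. A minimal sketch (scale is a made-up helper; the 10000 threshold is an arbitrary choice for illustration):

#include <omp.h>
#include <stddef.h>

void scale(float *a, size_t n, float k) {
    // run sequentially unless n is large enough to be worth the overhead
    #pragma omp parallel if(n > 10000)
    {
        int nthreads = omp_get_num_threads(); // 1 if the condition was false
        int tid = omp_get_thread_num();
        for (size_t i = tid; i < n; i += nthreads)
            a[i] *= k;
    }
}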

Nested parallelism

OpenMP supports arbitrarily deep nesting of omp parallel, as long as enough threads are available. However, omp_get_num_threads and omp_get_thread_num only report the count and id within the calling thread's own team, i.e. the innermost enclosing parallel region.

#pragma omp parallel num_threads(2)
{
    omp_get_num_threads(); // 2
    #pragma omp parallel num_threads(2)
    {
        // 4 threads are running in total (2 outer x 2 inner)
        omp_get_num_threads(); // 2: only the inner team is counted
    }
    omp_get_num_threads(); // 2
}
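
Note that many implementations disable nested parallelism by default, in which case the inner region runs with a single thread. A minimal sketch of enabling it, assuming an OpenMP 3.0+ runtime (the older omp_set_nested(1) also works but is deprecated):

#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_max_active_levels(2); // allow two levels of active parallelism
    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            // 4 threads total; each inner team still reports size 2
            printf("outer %d / inner %d of %d\n",
                   omp_get_ancestor_thread_num(1),
                   omp_get_thread_num(), omp_get_num_threads());
        }
    }
    return 0;
}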

Variable Semantics

Data sharing / scope of variables

- private: each thread gets its own local copy
- shared: all threads share the same copy
- firstprivate: like private, but initialized to the variable's value before the parallel directive

#include <omp.h>
#include <stdio.h>

int main() {
    int A = 0, B = 1, C = 2, D = 3, E = 4;
    #pragma omp parallel private(A, B) shared(C) firstprivate(D)
    {
        printf("%d\n", A); // indeterminate: private copies are not initialized
        B = 5;
        printf("%d\n", B); // 5

        printf("%d\n", C); // 2 (until some thread writes; shared accesses race)
        C = -2;
        printf("%d\n", C); // -2

        printf("%d\n", D); // 3: firstprivate copies start from the outside value
        D = -3;
        printf("%d\n", D); // -3

        printf("%d\n", E); // 4: E is shared by default
        E = -4;
        printf("%d\n", E); // -4
    }
    printf("%d\n", A); // 0: writes to private copies are discarded
    printf("%d\n", B); // 1
    printf("%d\n", C); // -2: writes to shared variables persist
    printf("%d\n", D); // 3
    printf("%d\n", E); // -4
    return 0;
}

Default State

Use the default clause to set the default sharing attribute:

#pragma omp parallel default(shared | none)

shared: unless specified via private/firstprivate, all variables from the enclosing scope are shared (variables declared inside the parallel block are still local to each thread)

#pragma omp parallel default(shared)

none: every variable used in the parallel region must be given an explicit sharing attribute, otherwise it is a compile error

#pragma omp parallel default(none)

no default clause: variables declared outside a parallel block are shared (with some exceptions, e.g. loop iteration variables); variables declared inside are implicitly private

#pragma omp parallel
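
default(none) is often recommended because it forces every sharing decision to be explicit; a minimal sketch (the variables hits and n are made up for illustration):

#include <omp.h>
#include <stdio.h>

int main() {
    int n = 4, hits = 0;
    // every outside variable used inside must be listed, or compilation fails
    #pragma omp parallel default(none) shared(hits) firstprivate(n) num_threads(4)
    {
        int tid = omp_get_thread_num(); // declared inside: private automatically
        if (tid < n) {
            #pragma omp atomic
            hits++;
        }
    }
    printf("hits = %d\n", hits); // 4
    return 0;
}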

Reduction

Reduction refers to operations where each thread's local copy of a variable is combined into a single copy in the master thread when the parallel block ends.

Syntax

#pragma omp parallel reduction(op: var)

For example,

int s = 0;
#pragma omp parallel reduction(+: s)
{
    s += 10; // each thread updates its own private copy of s
}
// here s == 10 * (number of threads)

/*  reduction can be thought of as the following,
    but with synchronization and contention handled for you
*/
int s = 0;
#pragma omp parallel
{
    int s_local = 0;   // private copy, initialized to the identity of + (0)
    s_local += 10;     // do the per-thread work
    #pragma omp atomic
    s += s_local;      // combine into the shared copy, as the reduction op specifies
}
The supported ops include +, -, *, &, |, ^, &&, and ||.
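
For example, a logical-AND reduction can check a predicate across all threads' chunks; a minimal sketch (all_nonneg is a made-up helper for illustration):

#include <omp.h>
#include <stddef.h>

// return non-zero iff every element of a is non-negative
int all_nonneg(float *a, size_t n) {
    int ok = 1; // identity of && is 1 (true)
    #pragma omp parallel reduction(&&: ok)
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
        for (size_t i = tid; i < n; i += nthreads)
            ok = ok && (a[i] >= 0.0f);
    }
    return ok;
}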

Example: dot product

#include <omp.h>
#include <stddef.h>

// compute the dot product of vectors a and b
float dot(float *a, float *b, size_t n) {
    float s = 0.0f;
    #pragma omp parallel reduction(+: s)
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
        size_t chunk = (n + nthreads - 1) / nthreads; // ceil(n / nthreads) elements per thread
        size_t start = tid * chunk;
        size_t end = start + chunk;

        for (size_t i = start; i < end && i < n; i++) {
            s += a[i] * b[i];
        }
    }
    return s;
}
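
A quick usage sketch for dot (array contents are arbitrary):

#include <stdio.h>

int main() {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {5, 6, 7, 8};
    printf("%f\n", dot(a, b, 4)); // 1*5 + 2*6 + 3*7 + 4*8 = 70
    return 0;
}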