Subscribe by Email

Your email:

Cilk++ for Linux is Here!

(release overview)


Multicore Programming Blog

Current Articles | RSS Feed RSS Feed

Multicore-enabling Summation of Matrix Column Maximums (and Avoiding Data Races)

Posted by John Hart on Thu, Sep 18, 2008
 | Digg digg it | Reddit reddit | del.icio.us del.icio.us | StumbleUpon StumbleUpon 

This example (code below) is included in the Cilk++ install package, and illustrates several aspects of multicore-enabling an algorithm with Cilk++. (For more on Cilk++ features and syntax in general, and dealing with non-local variables in particular, check out the documentation.)

matrix sum example1. This example Cilkifies a simple C++ program.

  • To learn how to use a reducer to eliminate a race condition.
  • To demonstrate the symptoms and impact of a race condition.
  • To use Cilkscreen to identify a race condition.
  • To confirm that a Cilkified version with 1 worker has low overhead (i.e.,  relative to the serialization).
  • To investigate the performance impact of alternative data arrangements.
  • To show the effect when using a reducer and floating point addition, which is not strictly associative. You can get different results from one run to the next, and the user needs to decide if this is acceptable (it often is).

2. The Program:

  • Calls a "column_max()" function to find the maximum value in a column of a square matrix.
  • Computes and displays the sum of the column maximums.

The matrix size is exactly as in the matrix-transpose example. The matrix is also populated the same way.

3. Steps

  • Assumptions:
    • You've installed Cilk++ and are running MS Visual Studio 2005, OR you have installed the Cilk++ GCC version on Linux
    • You're host system has at least two cores
  • Follow the system-specific instructions to build and run the sum-cilk.cilk code
  • Windows Only: Open the Windows Task Manager "Performance" tab so that you can see CPU usage. NOTE: Performance results will be more meaningful if your system is relatively quiet. You may need to shut down other processes, such as open Word documents, background operations, and so on.
  • The command line syntax to run sum-cilk is:
  • Windows:
    • sum-cilk [n] [-cilk_set_worker_count=k]
  • GCC:
    • CILK_NPROC=k sum-cilk [n]
  • k: Number of worker threads
  • n: Number of rows and columns in the matrix
  • Here is sample output:
./sum-cilk 5000
We'll sum the column maximums of an 5000 by 5000 matrix.
Initializing the matrix required 0.125 seconds.
The sum of the 5000 maximum values is: 4428.71
The sum of maximums required 0.405 seconds.

NOTE: The matrix is populated randomly with values in the range [-1.0, +1.0], so the sum must be <= 5000. Therefore, this value is reasonable.

4. Additional Experiments

  1. Experiment with different sizes, such as 10, 100, 1000, 10000, ... Observe the time requirements and the CPU utilizations. You may have memory limitation and paging file limitations with 10000.
  2. Also notice that the results may not be the same for every run, especially with large n values, such as 10000. This is because floating point arithmetic is not associative. The variation in results may or may not be acceptable; this would be a judgement based upon requirements.
  3. Convert the outer loop (the call to column_max()) to a cilk_for
  4. Experiment with the number of workers and the cilk_for granularity. Are the results correct? Are they repeatable? If not, why not?
  5. Run Cilkscreen. Examine and explain the results.
  6. Note the summing reducer (cilk::hyper_ptr<cilk::reducer_opadd<float> > sum_max;) to eliminate the race condtion. What happens if you remove the reducer and sum without it? Test again and run Cilkscreen again.
  7. What is the effect of large matrices? >= 7500 x 7500
  8. Are the results repeatable?
  9. Look carefully at the column_max() function. Is the matrix arranged properly for performance? What would happen if we transposed the matrix first, and rewrote the function to process row wise? Try it. (Hint - a comparison of two different arrangements are illustrated on the right.)
  10. Instead of a reducer, try a lock. Compute the column max first, lock, increment the sum, and unlock. What is the performance? (try both data arrangements).

The Code

/*

 * sum-cilk.cilk

 *

 * Copyright (c) 2008 Cilk Arts, Inc.  All rights reserved.

 *

 * Sum the maximum column values in a large square matrix.

 */

 

#include <cilk.h>

#include "../include/cilk_time.h"

#include <iostream>

#include <math.h>

#include <reducer_opadd.h>

#include <memory.h>

 

#define DEFAULT_MATRIX_SIZE 100


 

// Undefine the Windows max macro definition

#undef max

// Find the maximum value in column j of n x n matrix, A.

template <typename T>

T column_max (T * A, int n, int j)

{

    T cmax = -HUGE_VAL; //-1.7D-308;

    for (int i=1; i<n; ++i) {

        cmax = std::max(cmax, A[i*n+j]);

    }

    return cmax;

}

 

int cilk_main (int argc, char ** argv) {

    // Create random input matrices. Override the default size with argv[1]

    int nn = DEFAULT_MATRIX_SIZE;

    if (argc > 1 && argv[1][0] != '-') {

        nn = std::atoi(argv[1]);

    }

 

    std::cout << "We'll sum the column maximums of an " << nn << " by "

              << nn << " matrix." << std::endl;

 

    float *A = (float*)calloc(nn * nn, sizeof(float));

    if (NULL == A) {

        std::cout << "Fatal Error. Cannot allocate matrix A." << std::endl;

        return 1;

    }

 

    memset(A, 0, nn*nn*sizeof(float));

    int start = cilk_get_time();

    // Populate A

    cilk_for (int i = 0; i < nn*nn; ++i) {

        A[i] = (float)((i*i) % 1024 - 512) / 512;

    }

    int end = cilk_get_time();    // OS independent, ms counter. Defined in cilk_time.h

    int init_time = (end - start);

    std::cout << "Initializing the matrix required " << init_time/1000 << "."

              << init_time%1000 << " seconds." << std::endl;

 

    // Sum the column maximums

    start = cilk_get_time();

 

    cilk::hyper_ptr<cilk::reducer_opadd<float> > sum_max;

    cilk_for (int j = 0; j < nn; ++j)   // Only Cilk++ keyword in the program

        *sum_max += column_max (A, nn, j);

 

    end = cilk_get_time();

    int par_time = (end - start);

    std::cout << "The sum of the " << nn << " maximum values is: "

              << sum_max->get_value() << std::endl;

    std::cout << "The sum of maximums required " << par_time/1000

              << "." << par_time%1000 << " seconds." << std::endl;

 

    free(A);

    return 0;

}