CIBC:SoftwareDesign:LinearAlgebra

From NCRR Biomedical Software Development, Engineering, and Dissemination Wiki

Jump to: navigation, search

Contents

Linear Algebra

Overview

The idea is to create a new computational environment not unlike MPI to do matrix operations in parallel using CBLAS underneath for optimal computation speeds. To do these calculations we create a new class that sets up a series of threads when called and does iterate through a series of mathematical expressions and then returns to the non parallel world.

The idea is to shield most of the parallel stuff for the user who just wants to implement a function and give him or her an environment in which we can do linear algebra without the need to really think about doing computations on multiple threads.

In a similar fashion like MPI we have a main algorithm through which all the threads execute. However here the parallel instructions are hidden in the overloaded matrix operations of an underlying class that knows how to do basic operations in parallel.

New matrix class

The current class for matrices called DenseMatrix and ColumnMatrix are far from optimized and contain all kinds of leftovers of previous developments and do not contain mechanisms to do math in parallel in an optimal way. Hence the idea is to introduce a new class:

class DMatrix {
 public:
   DMatrix(MatrixHandle& mat);
 ...
}

This class will internally keep a handle to the Matrix so it can reuse its memory management system. The name DMatrix stands for 'Matrix of Doubles', I envison at some point we want to have a 'CDMatrix' as well for 'Matrix of complex doubles' etc.

This class however will have the option to attach a parallel environment specifying the number of threads that are available for computation and hence will know about parallel computing.

By specifying a new matrix class, we can restrict the class to having only two internal representations: sparse or dense, and not the four we have currently in SCIRun. This will reduce the amount of additions, multiplies etc, we need to write. For two representation there are only four combinations per operator, for the four matrix types we have now this expands to sixteen, which requires far more implementation.


New parallel environment

To do math I envison we use a class that basically streamlines parallel computation. As launching threads is slow (depending on OS, but for example on OSX it is extremely slow) it would be nice to setup threads first and then do a few matrix operations in series in the same environment. This will speed up computation.

What I am envisioning is a construct like what is used in MPI, where one writes a segment of code that is executed in parallel and the code that is being executed is the same on all threads. In this case MPI calls are replaced with the matrix operations. As DMatrix is aware of the environment it will execute an addition, multiply or solve in parallel and make it appear that every thread did the same operation.

This is how I see the environment would look like

class ParallelLinearAlgebra {
 public:
   // Constructor

   // Setup a series of threads to do parallel math
   ParallelLinearAlgebra(unsigned int numprocesses);
   // Automatically detect the number of cores we can use for parallel math
   ParallelLinearAlgebra(); 

   // Function that needs to be overloaded with the actual parallel code
   virtual void function(vector<DMatrix> inputs, vector<DMatrix>& outputs);
    
   // Function that needs to be called from the normal single execution world     
   void execute(vector<MatrixHandle> inputs, vector<MatrixHandle>& outputs);      
};


To use this class one should overload the function with the parallel code:


class MyParallelCode public: ParallelLinearAlgebra {
  public:
    virtual void function(vector<DMatrix> inputs, vector<DMatrix>& outputs);
};


An example of parallel code could be like this:

void
MyParallelCode::function(vector<DMatrix> inputs, vector<DMatrix>& outputs)
{
  DMatrix A, B, C;
  A = inputs[0];
  B = inputs[1];

  C= A*B + 10*B;

  outputs[0] = C;
}

One could execute a piece of parallel code as following:


// My parallel module
MyParallelCode mycode;

vector<MatrixHandle> inputs(2);
vector<MatrixHandle> outputs(1);
input[0] = matrix1;
input[1] = matrix2;
mycode.execute(inputs,outputs);

MatrixHandle output = outputs[0];

Memory management

For the DMatrix class I would use a similar memory management method as used in the matlab classes. Here the object is not directly a handle but not directly an object neither. It is an extended handle that knows the basic properties of the matrix, but can share data with other DMatrices (the datablock idea). This way we can have the algebra in the functions look like real algebra and type like real algebra and only copies are made when it is really necessary. Hence the DMatrix will know whether the data it has is shared or whether it can write the data. If it is shared it will clone when the matrix is altered.

The current SCIRun concept of Handles does not support this directly hence it will be added into this new parallel object.

Local variables

One can define local variables in the parallel construct, however one has to be aware that the code is running in parallel and that operations on local doubles are done for each thread.

Accessing SCIRun from parallel world

This requires some more thought as code runs in parallel, one cannot access SCIRun elements that run in the single threaded world directly as one would access it n times. Hence the environment will have some clause to see the rank of the process. One could add something like:

if (get_rank() == 0)
{
  // Access SCIRun object
  ...
} 

And I assume we should have a basic send and receive setup so one can synchronize variables between threads. Although for most basic linear algebra we would probably not need this.

template<class T>
sync_to_all(T& var, int from_rank);


Functions to support

Here is a lsit functions we might want to support in this environment:

Matrix types

  • 2D dense matrix (a 1x1 matrix can be a scalar)
  • 2D sparse matrix

Functions

Matrix operations:

  • Addition
  • Substraction
  • Mulitplication
  • Inversion
  • Transpose

Matrix to Scalar:

  • Norm
  • Abs
  • Maximum
  • Minimum
  • Mean

Solvers:

  • ConjGrad
  • BiConjGrad
  • GMRES

Operation per matrix element:

  • sin, cos, tan, ctg
  • asin, acos, atan, actg
  • ceil, floor, round
  • log, ln, log10, exp


Modules to rewrite

In the core we should have modules like LinearAlgebra LinAlgBinary and LinAlgUnary have use this mechanism internally. Similarly we can have modules like Tikhonov use this mechanism.

Personal tools