
Aparapi Java Matrix Multiplication Example



import java.util.Random;
import com.amd.aparapi.Kernel;

/**
 * @author Vasanth Raja Chittampally
 */

public class AparapiMatrixMultiplication  {
	public static void main(String [] args) throws Exception
	{

		final int r = 1024;
		final int c1 = r;
		final int c2 = r;
		AparapiMatMul ap = new AparapiMatMul(r, c1, c2);

		try {
		long  time1 = System.currentTimeMillis();
                //ap.setExecutionMode(Kernel.EXECUTION_MODE.JTP);
                //ap.setExecutionMode(Kernel.EXECUTION_MODE.GPU);
                //ap.setExecutionMode(Kernel.EXECUTION_MODE.CPU);
		ap.execute(r,c2);
		System.out.println("Time taken for kenel execution in "+ ap.getExecutionMode()+ " mode is :"+ (System.currentTimeMillis() - time1));
		}catch(NullPointerException ne){
			ne.printStackTrace();
		}
		//ap.printResults();
		long time1 = System.currentTimeMillis();
		ap.normalMatMulCalc();
		System.out.println("Time taken for kenel execution in Sequential CPU mode is :"+ (System.currentTimeMillis() - time1));
		ap.compareResults();
		ap.dispose();
	}
}

class AparapiMatMul extends Kernel {

	float matA[];
	float matB[];
	float matC[];
	float C[];

	int rows ;
	int cols1;
	int cols2;

	@Override
	public void run() {
		int i = getGlobalId(); // row index: one work item per row
		int j = getPassId();   // column index: one pass per column
		float value = 0;
		for(int k = 0; k < cols1; k++)
		{
			value += matA[k + i * cols1] * matB[k * cols2 + j];
		}
		matC[i * cols1 + j] = value;
	}

	public AparapiMatMul(int r, int c1, int c2)
	{

		rows = r;
		cols1 = c1;
		cols2 = c2;

		matA = new float [r * c1];
		matB = new float [c1 * c2];
		matC = new float [r * c2];
		C = new float[r * c2];
		//matC should be initialized with zeros (Java already zero-fills new arrays)
		for(int i = 0; i < r; i++ )
		{
			for(int j = 0 ; j < c2; j++ )
			{
				matC[i * c2 + j ] = 0;

			}
		}

		//Here matrix A is initialized with random numbers

		for(int i = 0; i < r; i++ )
		{
			for(int j = 0 ; j < c1; j++ )
			{
				matA[i * c1 +j] = new Random().nextFloat();
			}
		}

		// Here matrix B is initialized with random numbers

		for(int i = 0; i < c1; i++ )
		{
			for(int j = 0 ; j < c2; j++ )
			{
				matB[i * c2 + j] = new Random().nextFloat();
			}
		}

	}

	public void printResults()
	{
		for(int i = 0; i < rows; i++ )
		{
			for(int j = 0 ; j < cols2; j++ )
			{
				System.out.print(matC[i * cols2 + j]+"    ");
			}
		}
	}

	public void normalMatMulCalc()
	{
		System.out.println();
		System.out.println("Sequential Execution on CPU");
		 for(int i = 0;i < rows; i++)
			{
				for(int j = 0; j < cols2; j++)
				{
					float sum = 0;
					for(int k = 0; k < cols1; k++)
					{
						sum += matA[i*cols1+k] * matB[k*cols2+j];
					}
				    C[i * cols2 + j] = sum;
				}

			}
	}
	public void compareResults()
	{
		boolean equal = true;
		for(int i = 0; i < rows * cols2 ; i++)
		{
			if(matC[i] != C[i])
			{
				equal = false;
				break;
			}
		}
		if(!equal)
			System.out.println("Results are not equal");
		else
			System.out.println("Results are equal.. Tested thoroughly!!!");
	}

}

The code above performs a matrix multiplication. The overridden run() method is the kernel code, which runs
on the GPU, on a Java thread pool (JTP), or on the CPU. The Java source is first compiled to bytecode, and Aparapi
then converts the kernel's bytecode to OpenCL code.

You can compare the ease of writing the code above with the equivalent OpenCL C code here, but we have to compromise on some optimizations.

Results are as follows:
Output 1:

Time taken for kenel execution in GPU mode is :8791
Sequential Execution on CPU
Time taken for kenel execution in Sequential CPU mode is :11580
Results are equal.. Tested thoroughly!!!

Output 2:

Time taken for kenel execution in JTP mode is :7765
Sequential Execution on CPU
Time taken for kenel execution in Sequential CPU mode is :12491
Results are equal.. Tested thoroughly!!!

 

Thanks to Gary Frost for his input on this program. Here are the changes I made to it. The call ap.execute(r,c2) runs the kernel c2 times, once per pass, which is not the same as a two-dimensional clEnqueueNDRangeKernel() call. The corrected code is as follows.

 


import java.util.Random;
import com.amd.aparapi.Kernel;

/**
 * @author Vasanth Raja Chittampally
 */

public class AparapiMatrixMultiplication  {
	public static void main(String [] args) throws Exception
	{

		final int r = 1024;
		final int c1 = r;
		final int c2 = r;
		AparapiMatMul ap = new AparapiMatMul(r, c1, c2);

		try {
		ap.setExecutionMode(Kernel.EXECUTION_MODE.GPU);
		long  time1 = System.currentTimeMillis();
		ap.execute(r * c2);
		System.out.println("Time taken for kenel execution in "+ ap.getExecutionMode()+ " mode is :"+ (System.currentTimeMillis() - time1));
		}catch(NullPointerException ne){
			ne.printStackTrace();
		}
		//ap.printResults();
		long time1 = System.currentTimeMillis();
		ap.normalMatMulCalc();
		System.out.println("Time taken for kenel execution in Sequential CPU mode is :"+ (System.currentTimeMillis() - time1));
		ap.compareResults();
		ap.dispose();
	}
}

class AparapiMatMul extends Kernel {

	float matA[];
	float matB[];
	float matC[];
	float C[];

	int rows ;
	int cols1;
	int cols2;

	@Override
	public void run() {
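		// Map the linear work-item id to a row (i) and a column (j) of the result.
		// Dividing by rows is only valid here because the matrices are square (rows == cols2).
		int i = getGlobalId() /rows;
		int j = getGlobalId() % rows;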
		float value = 0;
		for(int k = 0; k < cols1; k++)
		{
			value += matA[k + i * cols1] * matB[k * cols2 + j];
		}
		matC[i * cols1 + j] = value;
	}

	public AparapiMatMul(int r, int c1, int c2)
	{

		rows = r;
		cols1 = c1;
		cols2 = c2;

		matA = new float [r * c1];
		matB = new float [c1 * c2];
		matC = new float [r * c2];
		C = new float[r * c2];
		//matC should be initialized with zeros (Java already zero-fills new arrays)
		for(int i = 0; i < r; i++ )
		{
			for(int j = 0 ; j < c2; j++ )
			{
				matC[i * c2 + j ] = 0;

			}
		}

		//Here matrix A is initialized with random numbers

		for(int i = 0; i < r; i++ )
		{
			for(int j = 0 ; j < c1; j++ )
			{
				matA[i * c1 +j] = new Random().nextFloat();
			}
		}

		// Here matrix B is initialized with random numbers

		for(int i = 0; i < c1; i++ )
		{
			for(int j = 0 ; j < c2; j++ )
			{
				matB[i * c2 + j] = new Random().nextFloat();
			}
		}

	}

	public void printResults()
	{
		for(int i = 0; i < rows; i++ )
		{
			for(int j = 0 ; j < cols2; j++ )
			{
				System.out.print(matC[i * cols2 + j]+"    ");
			}
		}
	}

	public void normalMatMulCalc()
	{
		System.out.println();
		System.out.println("Sequential Execution on CPU");
		 for(int i = 0;i < rows; i++)
			{
				for(int j = 0; j < cols2; j++)
				{
					float sum = 0;
					for(int k = 0; k < cols1; k++)
					{
						sum += matA[i*cols1+k] * matB[k*cols2+j];
					}
				    C[i * cols2 + j] = sum;
				}

			}
	}
	public void compareResults()
	{
		boolean equal = true;
		for(int i = 0; i < rows * cols2 ; i++)
		{
			if(matC[i] != C[i])
			{
				equal = false;
				break;
			}
		}
		if(!equal)
			System.out.println("Results are not equal");
		else
			System.out.println("Results are equal.. Tested thoroughly!!!");
	}

}
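
To make the difference between the two launch styles concrete, here is a plain-Java sketch that only emulates them. It does not use Aparapi or a GPU, and the names LaunchSketch, Body, executeWithPasses and executeOnce are made up for illustration. It assumes, as Gary describes in the comments below, that execute(range, passes) behaves like one separate launch per pass.

```java
// Plain-Java emulation of the two launch styles; no Aparapi or GPU involved.
class LaunchSketch {
    /** Hypothetical stand-in for a kernel body that sees a global id and a pass id. */
    interface Body {
        void run(int globalId, int passId);
    }

    /** Emulates execute(range, passes): one launch of 'range' work items per pass. */
    static int executeWithPasses(int range, int passes, Body body) {
        int launches = 0;
        for (int pass = 0; pass < passes; pass++) {
            launches++; // each pass pays the launch overhead again
            for (int gid = 0; gid < range; gid++) {
                body.run(gid, pass);
            }
        }
        return launches;
    }

    /** Emulates execute(range): a single launch covering every work item. */
    static int executeOnce(int range, Body body) {
        for (int gid = 0; gid < range; gid++) {
            body.run(gid, 0);
        }
        return 1;
    }

    public static void main(String[] args) {
        final int r = 4;
        final int c2 = 3;
        int[] first = new int[r * c2];
        int[] second = new int[r * c2];

        // Original style: row from the global id, column from the pass id.
        int launchesFirst = executeWithPasses(r, c2, (i, j) -> first[i * c2 + j]++);
        // Corrected style: one flat range, row and column derived from the id.
        int launchesSecond = executeOnce(r * c2, (gid, pass) -> second[(gid / c2) * c2 + gid % c2]++);

        System.out.println("launches with passes: " + launchesFirst);
        System.out.println("launches with one range: " + launchesSecond);
        System.out.println("same cells covered: " + java.util.Arrays.equals(first, second));
    }
}
```

Both styles visit every cell of the result exactly once. The first needs c2 launches to do it, the second needs one, which is where the accumulated kernel execution cost in the first version comes from.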

The results I got are amazing. These numbers are from my PC, which has an AMD Radeon 5670 graphics card.

Output:  GPU Mode

Time taken for kenel execution in GPU mode is :    838
Sequential Execution on CPU
Time taken for kenel execution in Sequential CPU mode is :   13335
Results are equal.. Tested thoroughly!!!

Output: JTP Mode

Time taken for kenel execution in JTP mode is :5671
Sequential Execution on CPU
Time taken for kenel execution in Sequential CPU mode is :13516
Results are equal.. Tested thoroughly!!!

  1. November 20, 2011 at 4:00 pm

    Vasanth
    Thanks for posting this and for your evaluation of Aparapi. I am the Aparapi tech lead/architect and it is great to see folks giving Aparapi a try.

    The results are slightly lower than I would have expected.

    When I looked at the code I discovered that you were using kernel.execute(c,r) (column by row). This is a reasonable choice given that you are familiar with OpenCL 😉 because you probably assumed this mapped to clEnqueueNDRangeKernel with two dimensions. Sadly Aparapi does not support this mode. Instead execute(c,r) essentially invokes the Kernel r times, so we accumulate the Kernel execution costs (not the buffer transfer costs).

    I have included a slightly modified form of the code which calls execute(c*r) and a slightly modified Kernel.run method which is called once.

    I like your example and would like to include it on aparapi.googlecode.com with your permission, perhaps even use it as a sample/example project. Let me know if this would be OK with you. I will obviously credit you as the originator and link to your blog.

    Here is my modified run() method

    public void run() {
        int i = getGlobalId()/rows; // was getGlobalId()
        int j = getGlobalId()%rows; // was getPassId();
        float value = 0;
        for(int k = 0; k < cols1; k++)
        {
            value += matA[k + i * cols1] * matB[k * cols2 + j];
        }
        matC[i * cols1 + j] = value;
    }

    And here is my modified execution.
    ap.execute(r*c2);

    For me (on a laptop) the #'s are

    GPU: 2688
    JTP: 8376
    REFERENCE:19690

    Would you mind trying this version to see if it performs better for you?

    Gary

    • November 21, 2011 at 5:03 am

      Thanks, Gary. Thank you very much for making changes to my code.
      You can use my code for the samples; I have no problem with that.

      I corrected the code:

      import java.util.Random;
      import com.amd.aparapi.Kernel;

      /**
      * @author Vasanth Raja Chittampally
      */

      public class AparapiMatrixMultiplication {
      public static void main(String [] args) throws Exception
      {

      final int r = 1024;
      final int c1 = r;
      final int c2 = r;
      AparapiMatMul ap = new AparapiMatMul(r, c1, c2);

      try {
      long time1 = System.currentTimeMillis();
      //ap.setExecutionMode(Kernel.EXECUTION_MODE.JTP);
      ap.execute(r * c2);
      System.out.println("Time taken for kenel execution in "+ ap.getExecutionMode()+ " mode is :"+ (System.currentTimeMillis() - time1));
      }catch(NullPointerException ne){
      ne.printStackTrace();
      }
      //ap.printResults();
      long time1 = System.currentTimeMillis();
      ap.normalMatMulCalc();
      System.out.println("Time taken for kenel execution in Sequential CPU mode is :"+ (System.currentTimeMillis() - time1));
      ap.compareResults();
      ap.dispose();
      }
      }

      class AparapiMatMul extends Kernel {

      float matA[];
      float matB[];
      float matC[];
      float C[];

      int rows ;
      int cols1;
      int cols2;

      @Override
      public void run() {
      int i = getGlobalId()/rows;
      int j = getGlobalId()%rows;
      float value = 0;
      for(int k = 0; k < cols1; k++)
      {
      value += matA[k + i * cols1] * matB[k * cols2 + j];
      }
      matC[i * cols1 + j] = value;
      }

      public AparapiMatMul(int r, int c1, int c2)
      {

      rows = r;
      cols1 = c1;
      cols2 = c2;

      matA = new float [r * c1];
      matB = new float [c1 * c2];
      matC = new float [r * c2];
      C = new float[r * c2];
      //matC should be initialized with zeros
      for(int i = 0; i < r; i++ )
      {
      for(int j = 0 ; j < c1; j++ )
      {
      matC[i * c1 + j ] = 0;

      }
      }

      //Here matrix A is initialized with random numbers

      for(int i = 0; i < r; i++ )
      {
      for(int j = 0 ; j < c1; j++ )
      {
      matA[i * c1 +j] = new Random().nextFloat();
      }
      }

      // Here matrix B is initialized with random numbers

      for(int i = 0; i < r; i++ )
      {
      for(int j = 0 ; j < c1; j++ )
      {
      matB[i * c2 + j] = new Random().nextFloat();
      }
      }

      }

      public void printResults()
      {
      for(int i = 0; i < rows; i++ )
      {
      for(int j = 0 ; j < cols2; j++ )
      {
      System.out.print(matC[i * cols2 + j]+" ");
      }
      }
      }

      public void normalMatMulCalc()
      {
      System.out.println();
      System.out.println("Sequential Execution on CPU");
      for(int i = 0;i < rows; i++)
      {
      for(int j = 0; j < cols2; j++)
      {
      float sum = 0;
      for(int k = 0; k < cols1; k++)
      {
      sum += matA[i*cols1+k] * matB[k*rows+j];
      }
      C[i * cols2 + j] = sum;
      }

      }
      }
      public void compareResults()
      {
      boolean equal = true;
      for(int i = 0; i < rows * cols2 ; i++)
      {
      if(matC[i] != C[i])
      {
      equal = false;
      break;
      }
      }
      if(!equal)
      System.out.println("Results are not equal");
      else
      System.out.println("Results are equal.. Tested thoroughly!!!");
      }

      }

      Results I got are amazing

      Output – I: GPU Mode
      Time taken for kenel execution in GPU mode is :914
      Sequential Execution on CPU
      Time taken for kenel execution in Sequential CPU mode is :12389
      Results are equal.. Tested thoroughly!!!

      Output -II: JTP Mode

      Time taken for kenel execution in JTP mode is :6564
      Sequential Execution on CPU
      Time taken for kenel execution in Sequential CPU mode is :12285
      Results are equal.. Tested thoroughly!!!

  2. Alex
    January 10, 2012 at 4:31 pm

    Hello,
    Thanks a lot for your code for the matrix multiplication.
    I don’t actually understand these two lines:

    int i = getGlobalId()/rows; // was getGlobalId()
    int j = getGlobalId()%rows; // was getPassId();

    I don’t understand why you use division for i and the modulo operator for j.
    Could you please explain it to me?
    Thanks in advance

    • January 11, 2012 at 7:44 am

      Those two lines are to find the corresponding row and column of a particular element in the array.

      • January 11, 2012 at 4:12 pm

        As an example, suppose a one-dimensional array represents a grid of 10 rows x 5 columns, stored row by row. That’s a one-dimensional array of 10×5 = 50 elements.

        For any linear id (say 13) we can convert it to an x,y coordinate by dividing by the row width (5):

        y = 13/5; // 2
        x = 13%5; // 3

        gary
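
        The index arithmetic in this reply can be sketched in plain Java as follows. The class and method names (IndexSketch, rowOf, colOf, idOf) are hypothetical, and the sketch assumes the grid is stored row by row, so the divisor is the row width, that is, the number of elements in one row.

```java
// Converts between a linear id and (row, column) for a grid stored row by row.
class IndexSketch {
    /** Row of the element: how many full rows of 'width' elements come before it. */
    static int rowOf(int id, int width) {
        return id / width;
    }

    /** Column of the element: its offset inside its own row. */
    static int colOf(int id, int width) {
        return id % width;
    }

    /** Inverse mapping, the same formula the kernel uses to write its result. */
    static int idOf(int row, int col, int width) {
        return row * width + col;
    }

    public static void main(String[] args) {
        int width = 5; // elements per row
        int id = 13;
        int y = rowOf(id, width);
        int x = colOf(id, width);
        System.out.println("id " + id + " -> y=" + y + ", x=" + x); // id 13 -> y=2, x=3
        System.out.println("back to id: " + idOf(y, x, width));     // back to id: 13
    }
}
```

        In the matrix kernel, the width of the result matrix plays the role of `width` here.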

  3. Alex
    January 23, 2012 at 10:57 am

    Your example works only if each matrix is square (n,n). If you test your code with, say, matA (2,5) and matB (5,3), the result is not correct, because the kernel does not use the right variable among rows, cols1 and cols2.
    Here is the corrected code for your kernel:

    int i = getGlobalId() /cols2;
    int j = getGlobalId() % cols2;
    float value = 0;
    for(int k = 0; k < cols1; k++)
    {
        value += matA[k + i * cols1] * matB[k * cols2 + j];
    }
    matC[i * cols2 + j] = value;
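
    A plain-Java sketch can check this indexing on the (2,5) x (5,3) case without a GPU. It runs the kernel body sequentially, once per global id, instead of going through Aparapi. The class and method names (RectMatMulSketch, workItem, multiply) are made up for illustration, and the input values are arbitrary small integers chosen so the result is easy to verify by hand.

```java
import java.util.Arrays;

// Sequential emulation of the kernel with the cols2-based indexing.
class RectMatMulSketch {
    /** One work item: computes a single element of C from its linear id. */
    static void workItem(int gid, float[] a, float[] b, float[] c, int cols1, int cols2) {
        int i = gid / cols2; // row of C
        int j = gid % cols2; // column of C
        float value = 0;
        for (int k = 0; k < cols1; k++) {
            value += a[k + i * cols1] * b[k * cols2 + j];
        }
        c[i * cols2 + j] = value;
    }

    /** Stands in for execute(rows * cols2): one call per element of C. */
    static float[] multiply(float[] a, float[] b, int rows, int cols1, int cols2) {
        float[] c = new float[rows * cols2];
        for (int gid = 0; gid < rows * cols2; gid++) {
            workItem(gid, a, b, c, cols1, cols2);
        }
        return c;
    }

    public static void main(String[] args) {
        // A is 2x5, B is 5x3, both stored row by row.
        float[] a = {1, 2, 3, 4, 5,
                     6, 7, 8, 9, 10};
        float[] b = {1, 2, 3,
                     4, 5, 6,
                     7, 8, 9,
                     10, 11, 12,
                     13, 14, 15};
        float[] c = multiply(a, b, 2, 5, 3);
        System.out.println(Arrays.toString(c));
        // prints [135.0, 150.0, 165.0, 310.0, 350.0, 390.0]
    }
}
```

    The first element, for example, is 1*1 + 2*4 + 3*7 + 4*10 + 5*13 = 135.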

  4. Yuri
    March 29, 2014 at 12:27 am

    I’m learning Aparapi and found this article very useful as a first step toward passing heavy work (some tasks) to the GPU using Java.

    Thanks for the comments; they help me understand what’s happening in the code, as in:

    execute(r,c2)
    execute(r *c2)

    and the

    int i = getGlobalId() /cols2;
    int j = getGlobalId() % cols2;

  5. December 1, 2014 at 8:23 am

    I’m happy to find this article! It’s really useful. By the way, I ran your code but I cannot get a GPU result, only a CPU result. Is there any problem with the code, or should I install something? My processor is an Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz.

    Here’s my result:
    Time taken for kenel execution in GPU mode is :1919

    Sequential Execution on CPU
    Time taken for kenel execution in Sequential CPU mode is :15103
    Results are equal.. Tested thoroughly!!!

    I hope you don’t mind replying. Thank you!

    • December 31, 2014 at 8:43 am

      Hi,

      You did not mention which GPU you have. The results might vary with the underlying hardware.

