Archive

Archive for August, 2011

Interview C puzzles


Here I want to share some of the simple & tricky C puzzles and interview questions I faced.

1) Write single line swapping function

Ans:  Check here for Python code

  Please check here for C code

2) Write single line macro to swap two numbers 


#include< stdio.h >

#define swap(a,b) b=(a+b)-(a=b)

int main(void)
{
    int a=12;
    int b=-34;
    printf("Before swapping a=%d, b=%d\n",a,b);
    swap(a,b);
    printf("After swapping a=%d, b=%d\n",a,b);
    return 0;
}

3) Write a macro to check whether a given number is a power of two or not


#include < stdio.h >

#define isPowOf2(x) ( ( x & ( x - 1 ) ) == 0) ? 1:0

int main(void)
{
    if(isPowOf2(4))
   	 printf("It is power of two\n");
    else
   	 printf("It is not power of two\n");
    if(isPowOf2(41))
   	 printf("It is power of two\n");
    else
   	 printf("It is not power of two\n");
    if(isPowOf2(64))
   	 printf("It is power of two\n");
    else
   	 printf("It is not power of two\n");
    if(isPowOf2(400))
   	 printf("It is power of two\n");
    else
   	 printf("It is not power of two\n");

    return 0;

}

4) Write code to swap two numbers with out using temporary variable and  using shifting operators

Ans: Please check here

5) Write a function to round off a given float/double to the nearest integer

#include < stdio.h >

int Myround(double n)
{
    return (int)(n+0.5);
}
int main(void)
{
    printf("%d\n",Myround(25.7));
    printf("%d\n",Myround(25.2));
    printf("%d\n",Myround(25.6));
    printf("%d\n",Myround(25.9));
    return 0;
}

5) Write a one line macro to round off a given float/double to the nearest integer

Ans: same as above
#define myround(a) (int)(a+0.5)

6) Write a function to print binary representation of a given integer

#include < stdio.h >

int binaryRep(int n)
{
    int i=0;
    for(i = sizeof(int)*8 - 1;i >= 0; i--)
    {
   	 if( ( n & (1 << i ) ) > 0 )
   		 printf("%d", 1);
   	 else
   		 printf("%d", 0);
   	 if(i%4==0)
   		 printf(" ");
    }
}

int main(void)
{
    binaryRep(10);
    printf("\n");
    binaryRep(100);
}

7) Write a macro to check whether a specified bit in a number is on or off

#include < stdio.h >

#define onROff(n,i) ( (n & (1 << i-1)) > 0 ) ? 1 : 0

int main(void)
{
    if(onROff(15,1))
   	 printf("first bit is on\n");
    else
   	 printf("first bit is off\n");

    if(onROff(15,2))
   	 printf("second bit is on\n");
    else
   	 printf("second bit is off\n");

    if(onROff(15,3))
   	 printf("third bit is on\n");
    else
   	 printf("third bit is off\n");

   return 0;
}

8 ) Write a macro to switch On a bit in a given number

#include < stdio.h >

#define switchOn(n ,i) n|=(1 << (i-1))

int main(void)
{
    int n=10;
    switchOn(n,1);
    printf("After switching on first bit of n=%d\n",n);
    switchOn(n,3);
    printf("After switching on third bit of n=%d\n",n);
}

9) Write a macro to switch Off a bit in a given number


#include < stdio.h >

#define switchOfff(n,i) n = n & (~ (1 << (i-1) ) )

int main(void)
{
    int n=15;
    switchOfff(n,1);
    printf("After switching off first bit n=%d\n", n);
    switchOfff(n,2);
    printf("After switching off second bit n=%d\n", n);
    switchOfff(n,3);
    printf("After switching off third bit n=%d\n", n);
    switchOfff(n,4);
    printf("After switching off fourth bit n=%d\n", n);
    return 0;
}

Some of the following questions can be applications of the above specified bit wise operators and other programs specified in this blog. Some of them cab be easily found in the web. I wish all the best for your technical C interview.

10) Write a macro to reverse a bit at a given position

11) Write a function to display binary representation of a given floating/double number

12) Write a function to find the largest sum of  a sub sequence of an array.

ex: { 1 , – 4, 5, 7}

in the above example the highest sum sub sequence is {1,5,7}. If all the elements in the array are +ve then  the whole array itself is the largest sum subsequence. But the given array contains some -ves also.

13) Write a program to mirror a binary tree

ex: Here we have to mirror the whole left sub tree with right sub tree and right sub tree with the left one

14) Write DFS with out using recursion

15) How best (time complexity in Big – Oh )you can write an algorithm which finds the first repeated value in a sorted  array.

16) Write an algorithm which sorts an array with time complexity less than O(nlogn) 

Hint: Take some assumptions and solve

17) Write an algorithm for finding a given string in a file and replacing it with a replacement string

ex:    findAndReplace(FILE *fp, char * findString, char * replaceString)

18) How to find size of a structure / data type with out using sizeof operator

Hint: Use pointers and arithmetic operations principles

19) How do you develop a object collector in C (like one in Java)

20) Find the first repeating character in a string. Constraint use only O(n) time complexity and O(1) space complexity

21) Write BFS using recursion

22) Find the middle of a linked list

23) How do you find last but one and third from last in a linked list

24) How do you find a loop in a linked list

Hint: For the above three linked list questions you can use the concept of fast * pointer and slow * pointer

25) Write Ceil function with out using any comparison operators

26) Reverse a linked list using recursion and with out using recursion

27) Longest continuous sequence

28) All the subsets in a given array (Out put should have 2 ^ n entries)

Advertisements

OpenCL GPU Matrix multiplication program


Here is the code to multiply two matrices using heterogeneous system programming language OpenCL. The reason being called OpenCL as heterogeneous is that written a code in OpenCL can be ported to CPU or GPU or Cell processor.


//Author: Vasanth Raja
//Program to multiply two matrices using OpenCL in GPU

#include "stdafx.h"

#include < stdio.h >
#include < stdlib.h >
#include < time.h >
#include < ctime >

#define widthA 128
#define heightA 128

#define widthB heightA
#define heightB 128

#define widthC widthA
#define heightC heightB

#ifdef __APPLE__
#include < OpenCL/opencl.h >
#else
#include < CL/cl.h >
#endif

#define MEM_SIZE (128)
#define MAX_SOURCE_SIZE (0x100000)

int main()
{
  float * A = (float *)malloc(sizeof(float)*widthA*heightA);
  float * B = (float *)malloc(sizeof(float)*widthB*heightB);
  float * C = (float *)malloc(sizeof(float)*widthC*heightC);
  float * Res = (float *)malloc(sizeof(float)*widthC*heightC);
  float * D= (float *)malloc(sizeof(float)*widthC*heightC);

   FILE * fp1 = fopen("matAdata.txt", "w");
  if (!fp1) {
    fprintf(stderr, "Failed to open matAdata.\n");
    exit(1);
   }

  for(int i = 0;i < widthA; i++)
  {
		for(int j=0;j		{
			float p=(rand()%100)/7.0;
			*(A+i*heightA+j)=rand()%100 + p;
			fprintf(fp1, "%f ",*(A+i*heightA+j));
		}
		fprintf(fp1, "\n");
   }
   fclose(fp1);

   fp1 = fopen("matBdata.txt", "w");
   if (!fp1) {
    fprintf(stderr, "Failed to open matAdata.\n");
    exit(1);
   }

	for(int i = 0;i < widthB; i++)
	{
		for(int j=0; j		{
			float p=(rand()%100)/7.0;
			*((B+i*heightB+j))=rand()%100 + p;
			fprintf(fp1, "%f ",*(B+i*heightA+j));
		}
		fprintf(fp1, "\n");
	}
	fclose(fp1);

  cl_device_id device_id = NULL;
  cl_context context = NULL;
  cl_command_queue command_queue = NULL;
  cl_mem memobjA = NULL;
  cl_mem memobjB = NULL;
  cl_mem memobjC = NULL;
  cl_mem rowA = NULL;
  cl_mem colC = NULL;
  cl_program program = NULL;
  cl_kernel kernel = NULL;
  cl_platform_id platform_id = NULL;
  cl_uint ret_num_devices;
  cl_uint ret_num_platforms;
  cl_int ret;

  //char string[MEM_SIZE];

  FILE *fp;
  char fileName[] = "./hello.cl";
  char *source_str;
  size_t source_size;
  int row = widthA;
  int col = heightC;
  /* Load the source code containing the kernel*/
  fp = fopen(fileName, "r");
  if (!fp) {
    fprintf(stderr, "Failed to load kernel.\n");
    exit(1);
  }
  source_str = (char*)malloc(MAX_SOURCE_SIZE);
  source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp);
  fclose( fp );

  /* Get Platform and Device Info */
  ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
  ret = clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &ret_num_devices);

  /* Create OpenCL context */
  context = clCreateContext( NULL, 1, &device_id, NULL, NULL, &ret);

  /* Create Command Queue */
  command_queue = clCreateCommandQueue(context, device_id, 0, &ret);

  /* Create Memory Buffer */
  memobjA = clCreateBuffer(context, CL_MEM_READ_WRITE, widthA * heightA * sizeof(float), NULL, &ret);
  memobjB = clCreateBuffer(context, CL_MEM_READ_WRITE, widthB * heightB * sizeof(float), NULL, &ret);
  memobjC = clCreateBuffer(context, CL_MEM_READ_WRITE, widthC * heightC * sizeof(float), NULL, &ret);
  rowA = clCreateBuffer(context, CL_MEM_READ_WRITE,  sizeof(int), NULL, &ret);
  colC = clCreateBuffer(context, CL_MEM_READ_WRITE,  sizeof(int), NULL, &ret);

  // Copy the lists A and B to their respective memory buffers
    ret = clEnqueueWriteBuffer(command_queue,memobjA, CL_TRUE, 0,
           widthA * heightA * sizeof(int), A, 0, NULL, NULL);
    ret = clEnqueueWriteBuffer(command_queue, memobjB, CL_TRUE, 0,
            widthB * heightB * sizeof(int), B, 0, NULL, NULL);
	ret = clEnqueueWriteBuffer(command_queue, rowA, CL_TRUE, 0, sizeof(int), &row, 0, NULL, NULL);
	ret = clEnqueueWriteBuffer(command_queue, colC, CL_TRUE, 0, sizeof(int), &col, 0, NULL, NULL);

  /* Create Kernel Program from the source */
  program = clCreateProgramWithSource(context, 1, (const char **)&source_str,
				                      (const size_t *)&source_size, &ret);

  /* Build Kernel Program */
  ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);

  /* Create OpenCL Kernel */
  kernel = clCreateKernel(program, "matrixMultiplication", &ret);

  /* Set OpenCL Kernel Arguments */
  ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjA);
  ret = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjB);
  ret = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&memobjC);
  //ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjA);
  ret = clSetKernelArg(kernel, 3, sizeof(int), (void *)&row);
  ret = clSetKernelArg(kernel, 4, sizeof(int), (void *)&col);
  /* Execute OpenCL Kernel */
  //ret = clEnqueueTask(command_queue, kernel, 0, NULL,NULL);
  size_t globalThreads[2] = {widthA, heightB};
  size_t localThreads[2] = {16,16};

  clEnqueueNDRangeKernel(command_queue, kernel, 2, NULL, globalThreads, localThreads, NULL, 0, NULL);
  /* Copy results from the memory buffer */
  ret = clEnqueueReadBuffer(command_queue, memobjC, CL_TRUE, 0,
			                widthA * heightC * sizeof(float),Res, 0, NULL, NULL);

  fp1 = fopen("matGPURes.txt", "w");
  if (!fp1) {
    fprintf(stderr, "Failed to open matAdata.\n");
    exit(1);
  }

  printf("\nResult\n");
	for(int i = 0;i < widthA; i++)
	{
		for(int j=0;j < heightC; j++)
		{

			fprintf(fp1, "%f ",*(Res+i*heightC+j));

		}
		fprintf(fp1, "\n");
	}
	fclose(fp1);

  ret = clFlush(command_queue);
  ret = clFinish(command_queue);
  ret = clReleaseKernel(kernel);
  ret = clReleaseProgram(program);
  ret = clReleaseMemObject(memobjA);
  ret = clReleaseMemObject(memobjB);
  ret = clReleaseMemObject(memobjC);
  ret = clReleaseCommandQueue(command_queue);
  ret = clReleaseContext(context);

  free(source_str);
  system("pause");

  float sum=0.0;

  for(int i = 0;i < widthA; i++)
	{
		for(int j = 0; j < heightC; j++)
		{
			sum = 0;
			for(int k = 0; k < widthB; k++)
			{
				sum += A[i*col+k] * B[k*row+j];
			}
		D[i*heightC+j] = sum;
		}

	}

    fp1 = fopen("matNormalMultiplicationRes.txt", "w");
  if (!fp1) {
    fprintf(stderr, "Failed to open matAdata.\n");
    exit(1);
  }

  printf("\nResult\n");
	for(int i = 0;i < widthA; i++)
	{
		for(int j=0;j < heightC; j++)
		{
			fprintf(fp1, "%f ",*(D+i*heightC+j));

		}
		fprintf(fp1, "\n");
	}
   system("pause");
  return 0;
}

You can check configuration and set up in Visual studio here.

The actual Kernel executed in the GPU is as follows.

__kernel
void matrixMultiplication(__global float* A, __global float* B, __global float* C,  int widthA, int widthB )
{
	int i = get_global_id(0);
	int j = get_global_id(1);
	float value=0;
	for ( int k = 0; k < widthA; k++)
	{
		value = value + A[k + j * widthA] * B[k*widthB + i];
	}
	C[i + widthA * j] = value;
}