OpenCL: Vector Add [Updated Sep 04 2011]



Petr Schreiber
29-11-2009, 11:21
Hi,

as you probably know, OpenCL is a new standard for parallel computing. It is still in its infancy (the specification is ready, but implementations are appearing slowly). Unlike CUDA or STREAM, it is not a vendor-specific technology, and at the moment you can run it on NVIDIA, ATi and S3 hardware.

OpenCL allows you to execute C-like code on the GPU, where many instances of it run in parallel. That means you can get a very large speed boost for the time-critical parts of your code, without having to go the assembler route.

It seems ideal for vector operations, image processing and other heavy, data-parallel tasks.

If you own a GeForce 8xxx or newer, or a Radeon HD 4xxx or newer, with OpenCL-enabled drivers, you can try the attached example code for summing vectors. Older cards do not support OpenCL, and probably never will.

I provide an adaptation of the code from the NVIDIA OpenCL Jump Start Guide. The guide is not a bad introduction to OpenCL, except that it contains quite a few typos and mistakes. The code I provide should be a working adaptation of its "OpenCL Host Code" example, and it performs vector addition.

The code could be optimized further, but as-is it should give you an idea of how OpenCL coding works.

If you have the hardware, I would appreciate any feedback from your side (works/doesn't work, problems with headers, ...).

Important note: The code requires the OpenCL headers (http://www.thinbasic.com/community/showthread.php?10159-OpenCL-Headers-Updated-Sep-04-2011).


Petr

Lionheart008
29-11-2009, 15:20
It's a pity.



My old NVIDIA card (GeForce 440 MX) cannot use this new OpenCL feature. The "opencl.dll" is missing too, or is it delivered with the new OpenCL graphics driver from NVIDIA? One day I will buy a newer system, but why should I? I am not a gamer ;)

Best regards, Frank

Petr Schreiber
29-11-2009, 15:32
Hi Frank,

sadly, the older cards were built purely as graphics cards, not as general-purpose computing devices.
The listed cards have a more mature design, and programming them can bring very significant speed boosts.

If you do image processing or other intensive calculations, OpenCL is a very interesting route to better performance while still writing high-level code and letting the CPU relax. With OpenCL, a Gaussian blur of a 1920x1080 image runs in real time; try the same in your graphics editor and watch how long it takes there.

The key to this speed boost is not a brute-force approach, but massive parallelization.
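To illustrate why a blur parallelizes so well, here is a minimal C sketch (not taken from the attached example) of what a single work-item would compute. It assumes a grayscale float image and uses the 3x3 binomial kernel (1 2 1 / 2 4 2 / 1 2 1)/16, a common small approximation of a Gaussian; the names pixel_at, blur_pixel and blur_image are hypothetical. Each output pixel only reads the source image and only writes its own cell, so all 1920 x 1080 = 2,073,600 pixels could in principle be computed at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp-to-edge read: coordinates outside the image reuse the nearest border pixel. */
static float pixel_at(const float *img, int w, int h, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[(size_t)y * (size_t)w + (size_t)x];
}

/* What ONE work-item would compute: the 3x3 binomial blur
   (1 2 1 / 2 4 2 / 1 2 1) / 16 for output pixel (x, y).
   It only reads src and only writes its own dst cell. */
void blur_pixel(const float *src, float *dst, int w, int h, int x, int y)
{
    static const float k[3] = {1.0f, 2.0f, 1.0f};
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += k[dy + 1] * k[dx + 1] * pixel_at(src, w, h, x + dx, y + dy);
    dst[(size_t)y * (size_t)w + (size_t)x] = sum / 16.0f;
}

/* Serial stand-in for a 2-D NDRange of w * h work-items; on a GPU these
   calls would be spread across the cores instead of looped. */
void blur_image(const float *src, float *dst, int w, int h)
{
    /* Blurring in place would make pixels depend on each other's results. */
    assert(src != dst && w > 0 && h > 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            blur_pixel(src, dst, w, h, x, y);
}
```

In a real OpenCL kernel the two loops in blur_image would disappear: x and y would come from get_global_id(0) and get_global_id(1).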

OpenCL.DLL is installed with the drivers for the cards mentioned above.


Petr

kryton9
30-11-2009, 09:15
Thanks, Petr, for working on this cutting-edge stuff. I looked at OpenCL a little, but I stopped to avoid confusing my already confused mind any further.
I was looking at the specs for the new NVIDIA 300 series video cards. I think they will have at least 16 CUDA cores, and I saw a figure of 200+ stream processing cores. I'm not sure what the difference between the two is, but either way it sure seems like it will make OpenCL run even faster.

Petr Schreiber
30-11-2009, 09:56
Kent,

do you think that, once you have time, you could install the latest 195.62 WHQL drivers and try to run this example?

OpenCL is not complicated. Take, for example, the CL code in the attachment above:

So what happens here?

First you create a context for the GPU device, then find out how many GPUs it contains and pick the first one.

Then you initialize arrays A and B on the CPU with some data.

Then you create GPU memory buffers for those two arrays, plus the result array C.

Then you execute the kernel; think of it as a very lightweight thread.


__kernel void
vectorAdd(__global const float * a, __global const float * b, __global float * c)
{
    // Vector element index
    int nIndex = get_global_id(0);
    c[nIndex] = a[nIndex] + b[nIndex];
}


You can understand it, more or less, as:


KERNEL SUB vectorAdd(a AS SINGLE PTR, b AS SINGLE PTR, c AS SINGLE PTR)
    ' Vector element index
    LOCAL nIndex AS LONG = get_global_id(0)
    c(nIndex) = a(nIndex) + b(nIndex)
END SUB


As you can see, the work is spread across as many hardware cores as possible, so each array cell is processed independently. "get_global_id(0)" retrieves the index of the current element, and "c(nIndex) = a(nIndex) + b(nIndex)" simply stores the sum of A and B in C.
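The same idea can be emulated on the CPU to see why the order of the work-items does not matter. In this C sketch (hypothetical names, not part of the attached thinBasic code), vector_add_item stands in for the kernel body and its gid parameter plays the role of get_global_id(0); run_ndrange simply loops over all global IDs, either forwards or backwards.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for the kernel body: gid plays the role of get_global_id(0). */
static void vector_add_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Serial emulation of a 1-D NDRange of global_size work-items.
   A GPU gives no ordering guarantee between work-items, so this can run
   them front-to-back (reverse == 0) or back-to-front (reverse != 0).
   Because each item touches only its own index, both orders give the
   same result array. */
void run_ndrange(size_t global_size, int reverse,
                 const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < global_size; ++i) {
        size_t gid = reverse ? global_size - 1 - i : i;
        vector_add_item(gid, a, b, c);
    }
}
```

Both orders produce the same C array, which is what lets the GPU schedule the work-items in whatever order and grouping it likes.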

Then you just read back the results. A very simple idea, not very complicated code... and you have it working :)
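After reading the results back, it can be worth checking them against a CPU reference before trusting the GPU path. Below is a minimal C sketch, assuming the read-back data is already in a host float array c; count_mismatches is a hypothetical helper, and the tolerance parameter is an assumption so that the same check can be reused for kernels with less exact arithmetic. For a plain addition an exact match (tol = 0) is usually expected, although a device that flushes denormals to zero may differ on very tiny values.

```c
#include <assert.h>
#include <stddef.h>

/* Compares the array read back from the device (c) with a CPU reference
   a[i] + b[i] and returns how many elements differ by more than tol. */
size_t count_mismatches(const float *a, const float *b, const float *c,
                        size_t n, float tol)
{
    assert(tol >= 0.0f);
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        float expected = a[i] + b[i];   /* CPU reference result */
        float diff = c[i] - expected;
        if (diff < 0.0f)
            diff = -diff;               /* absolute error without <math.h> */
        if (!(diff <= tol))             /* written this way so NaN counts as a mismatch */
            ++bad;
    }
    return bad;
}
```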


Petr

kryton9
30-11-2009, 19:58
Thanks, Petr, will do.

I PM'd you about installing the latest thinBasic on my programming computer, but that one uses the built-in Intel graphics, so ignore that part; I will install it on my gaming PC, which has NVIDIA. I will run the test tonight when I get home and post the results. Thanks for the overview.

kryton9
01-12-2009, 04:41
Your example ran fine, Petr. There was no benchmark, but I am happy to report that it did all the vector math all the way through with no errors!

Petr Schreiber
01-12-2009, 07:56
Thank you very much, Kent,

this was not a benchmark example yet; I just wanted to know if it runs... and it seems it does, which is good to hear!

Lionheart008
02-12-2009, 16:54
Hi Petr, perhaps I can test your OpenCL example at school. How big is your "opencl.dll"? Can you send this file to me by e-mail (as a zip file)? That would be nice. Perhaps I can test your example above on one of the newer machines and graphics cards at school ;)

Frank

Petr Schreiber
02-12-2009, 16:58
Hi Frank,

OpenCL.DLL can only come with the graphics drivers. Its interface is always the same, but the implementation differs for each vendor, so it cannot simply be copied from PC to PC.

If there is a PC with a GeForce and ForceWare 195.62 at your school, it would be nice to try it.
But it is still very fresh technology, so it is not present on many PCs at the moment.

But thanks for the offer :)

Petr Schreiber
02-12-2009, 20:45
I updated the example:

some of the resources were not being released, and I extended the error decoder as well.
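An error decoder just turns the numeric status codes returned by OpenCL calls into readable names. Here is a minimal C sketch of the idea (not the thinBasic decoder from the attachment). The numeric values are the ones defined in the OpenCL 1.0 cl.h header, the table is deliberately partial, and cl_error_name and cl_check are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A subset of OpenCL 1.0 status codes, with the values defined in cl.h. */
static const struct { int code; const char *name; } cl_errors[] = {
    {   0, "CL_SUCCESS" },
    {  -1, "CL_DEVICE_NOT_FOUND" },
    {  -2, "CL_DEVICE_NOT_AVAILABLE" },
    {  -3, "CL_COMPILER_NOT_AVAILABLE" },
    {  -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    {  -6, "CL_OUT_OF_HOST_MEMORY" },
    { -11, "CL_BUILD_PROGRAM_FAILURE" },
    { -30, "CL_INVALID_VALUE" },
    { -33, "CL_INVALID_DEVICE" },
    { -34, "CL_INVALID_CONTEXT" },
    { -36, "CL_INVALID_COMMAND_QUEUE" },
    { -38, "CL_INVALID_MEM_OBJECT" },
    { -46, "CL_INVALID_KERNEL_NAME" },
    { -48, "CL_INVALID_KERNEL" },
    { -52, "CL_INVALID_KERNEL_ARGS" },
    { -54, "CL_INVALID_WORK_GROUP_SIZE" },
    { -63, "CL_INVALID_GLOBAL_WORK_SIZE" },
};

static const char cl_unknown_error[] = "UNKNOWN_CL_ERROR";

/* Linear lookup; codes missing from the table map to cl_unknown_error. */
const char *cl_error_name(int code)
{
    for (size_t i = 0; i < sizeof cl_errors / sizeof cl_errors[0]; ++i)
        if (cl_errors[i].code == code)
            return cl_errors[i].name;
    return cl_unknown_error;
}

/* Returns 1 on CL_SUCCESS; otherwise prints the decoded name and returns 0. */
int cl_check(int status, const char *where)
{
    assert(where != NULL);
    if (status == 0)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", where, cl_error_name(status), status);
    return 0;
}
```

In host code each call's returned status would be passed through it, e.g. cl_check(status, "clBuildProgram"). On the resource side, OpenCL 1.0 provides clReleaseKernel, clReleaseProgram, clReleaseMemObject, clReleaseCommandQueue and clReleaseContext; releasing in roughly the reverse order of creation is a common convention.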

Lionheart008
03-12-2009, 22:06
Hi Petr,



At school I have noticed that I am working with a GeForce 7050 and an NVIDIA nForce 630i. Is that OK for OpenCL testing? If so, I will try it tomorrow.

Best regards, Frank

Petr Schreiber
04-12-2009, 00:54
Hi Frank,

GeForce 7050 is one generation older than we need.

As I wrote in several places, a GeForce 8xxx or newer must be used, and your school's card is "just" a 7xxx.
So I am sure it will not run.

But thanks for checking the GPUs at school :) We have only Intels here, and just one ATi monster.


Petr

kryton9
04-12-2009, 04:42
The new version ran fine Petr, thanks.

Petr Schreiber
09-02-2010, 10:33
Updated the code to a more specification-compliant form.