Thread: Information about new OpenCL headers

  1. #1
    Super Moderator Petr Schreiber
    Join Date: Aug 2005
    Location: Brno - Czech Republic
    Posts: 7,153
    Rep Power: 736


    Dear friends!

    Recent advances in ThinBASIC syntax allowed me to remaster the OpenCL headers into what I hope is the most faithful possible translation of the original Khronos OpenCL 1.1 headers.

    To start enjoying the technology, you will need:


    Then you can download updated versions of OpenCL examples:


    The examples and headers have been extensively tested on 3 different GPUs (GeForce G210M, GeForce GT320 and Quadro FX1800M, thanks Eros!), but as usual with GPGPU computing, be careful.


    Petr
    Learn 3D graphics with ThinBASIC, learn TBGL!
    Windows 10 64bit - Intel Core i5-3350P @ 3.1GHz - 16 GB RAM - NVIDIA GeForce GTX 1050 Ti 4GB

  2. #2
    Senior Member zak
    Join Date: Dec 2008
    Posts: 637
    Rep Power: 83
    Hi Petr,

    my GeForce 7 was recently damaged by excessive heat during a long process in my desktop PC, and now I am using the poorly performing onboard graphics, so I will buy another card to be able to run your examples. My question: do all advanced graphics cards use PCI-E? My damaged card was inserted in a PCI-E slot, and there are also two PCI slots.
    Or might they use other slots that I don't know about?

  3. #3
    Super Moderator Petr Schreiber
    Hi Zak,

    I am sorry to hear your GeForce 7 is down; I liked that series. Regarding the upgrade: yes, PCI-E seems to be the current standard for new GPUs. There is an observable speed difference between PCI-E 1x and PCI-E 4x, but the higher speed (16x) does not bring much more performance compared to 4x.

    For OpenCL experiments, even the cheapest GeForce G210 is fine, but for serious performance I would recommend something higher. The key parameter for parallel performance is the number of CUDA cores, and here the range is incredible: my G210M has 16 of them, at home I have a non-reference version of the GTX260 with 224 CUDA cores, and there are 512-core monsters on the market at the moment. I would say anything with 48+ CUDA cores should offer interesting performance for both graphics and compute programming.

    OpenCL 1.0 is supported on the GeForce 8, 9, 2xx and 3xx series; OpenCL 1.1 should be supported on the GeForce 4xx, 5xx series and up.

    At the moment I am investigating the situation in AMD Radeon land. Interestingly, of all my friends only one has a Radeon GPU, which makes it a bit difficult.


    Petr

  4. #4
    thinBasic MVPs kryton9
    Join Date: Nov 2006
    Location: Naples, Florida & Duluth, Georgia
    Age: 68
    Posts: 3,865
    Rep Power: 405
    Petr, I read the article you linked to above, and it was a really great intro for coming to grips with how OpenCL is designed. One thing I did not understand is how he broke the work down into work items and work groups... is this decided by the number of cores available?

  5. #5
    Super Moderator Petr Schreiber
    Hi Kent,

    I am happy to see the article caught your interest. For many tasks, you can let OpenCL do this division for you automagically; the driver will take care of it.
    Especially during the learning process, this is a good helper, so you can focus on other things.
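A small Python model of the two options, offered as a sketch rather than a description of what any real driver does. In the C host API, the automatic choice corresponds to passing NULL as the local_work_size argument of clEnqueueNDRangeKernel. How an implementation then picks the group size is not specified, so pick_local_size below is a hypothetical heuristic, and the device limit of 256 is an assumed number. In OpenCL 1.x the global size must be a multiple of the local size, which is why only divisors are considered.

```python
from __future__ import annotations


def pick_local_size(global_size: int, max_group_size: int) -> int:
    """Hypothetical heuristic: largest divisor of global_size <= max_group_size.

    OpenCL 1.x requires the global size to be a multiple of the local size,
    so only divisors are valid. Real drivers may choose differently.
    """
    for candidate in range(min(global_size, max_group_size), 0, -1):
        if global_size % candidate == 0:
            return candidate
    raise ValueError("global_size must be positive")


def partition(global_size: int, local_size: int) -> list[range]:
    """Split global IDs 0..global_size-1 into work groups of local_size items."""
    if global_size % local_size != 0:
        raise ValueError("OpenCL 1.x: global size must be a multiple of local size")
    return [range(start, start + local_size)
            for start in range(0, global_size, local_size)]


# "Automatic" choice: 1000 work items, assumed device limit of 256 per group.
auto = pick_local_size(1000, 256)
groups = partition(1000, auto)
print(auto, len(groups))  # 250 4
```

Choosing the group size yourself is then just a matter of calling partition with your own local_size, as long as it divides the global size.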

    When pushing performance to the edge, extensive use of local memory is a good idea. This is where you might need to take more precise control of the division into work groups, so as not to run out of local memory, which is often as small as a few tens of KB per work group, compared to hundreds of MB of slower global memory.
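To see how local memory caps the work group size, here is a rough budget calculation in Python. It assumes each work item needs a fixed number of bytes of local memory and that group sizes are powers of two (common for tree-style reductions, but an assumption here). On a real device the two limits would come from clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE and CL_DEVICE_MAX_WORK_GROUP_SIZE; the 16 KB value in the usage line is only an illustration.

```python
def max_group_size_for_local_mem(local_mem_bytes: int,
                                 bytes_per_item: int,
                                 max_work_group_size: int) -> int:
    """Largest power-of-two work group size whose local buffers fit.

    Local memory is shared by the whole work group, so the budget is
    bytes_per_item * group_size. Both limits are plain numbers here; on a
    device they would be queried rather than assumed.
    """
    size = 1
    # Double the group size while both the device limit and the budget allow it.
    while (size * 2 <= max_work_group_size
           and size * 2 * bytes_per_item <= local_mem_bytes):
        size *= 2
    if size * bytes_per_item > local_mem_bytes:
        raise ValueError("even a single work item does not fit in local memory")
    return size


# Assumed: 16 KB local memory, 64 bytes per work item, device limit 1024.
print(max_group_size_for_local_mem(16 * 1024, 64, 1024))  # 256 (512 would need 32 KB)
```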

    Here the author needed to make sure two work groups were created (mostly to demonstrate cooperation inside a work group), so he forced it by giving OpenCL an explicit work group size.
    You can see in the kernel code that until the barrier is reached (= the mini-sums are ready), the output is written only to fast local memory. After that, the data of the whole group is summed by the first work item in each group and passed to global memory so it can be read back.
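The pattern described above can be modelled sequentially in Python. This is not the article's kernel (its code is not reproduced here), just a hypothetical CPU simulation of the same idea: each work item fills its group's local buffer, the barrier is represented by finishing that phase for the whole group, and then the work item with local ID 0 writes the group's partial sum to global memory.

```python
from __future__ import annotations


def simulate_group_reduction(data: list[float], local_size: int) -> list[float]:
    """Sequential model of a per-work-group sum (not real OpenCL code).

    Phase 1: every work item copies its element into the group's local buffer.
    Barrier: modelled by completing phase 1 for the whole group first.
    Phase 2: the work item with local id 0 sums the buffer and writes the
    group's partial sum to global memory at index group_id.
    """
    if len(data) % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    num_groups = len(data) // local_size
    global_out = [0.0] * num_groups              # one slot per work group
    for group_id in range(num_groups):
        local_buf = [0.0] * local_size           # this group's "local memory"
        for local_id in range(local_size):       # phase 1, before the barrier
            global_id = group_id * local_size + local_id
            local_buf[local_id] = data[global_id]
        # --- barrier: local_buf is now completely filled ---
        global_out[group_id] = sum(local_buf)    # phase 2, local id 0 only
    return global_out


# Two work groups of four items, mirroring the two-group setup.
partials = simulate_group_reduction([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 4)
print(partials, sum(partials))  # [10.0, 26.0] 36.0
```

The final sum over the partial results corresponds to the host reading the global buffer back and adding the per-group values.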

    If you want, you can check the number of cores yourself (please see the example) and then arrange the "topology" of the calculation yourself. But if you don't, the driver again takes care of all the necessary operations, so if the work group pattern does not match the hardware, or if the problem is bigger than the number of cores on the GPU, nothing explodes and the computation still runs.
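As a rough illustration of why an oversized problem still runs, the Python sketch below assumes a device whose compute unit count (one value a device-info query can report, as CL_DEVICE_MAX_COMPUTE_UNITS) is smaller than the number of work groups, and schedules the groups in rounds. The one-group-per-compute-unit rule and the name schedule_groups are simplifying assumptions; real GPUs can keep several groups resident and schedule them dynamically.

```python
from __future__ import annotations

import math


def schedule_groups(num_groups: int, compute_units: int) -> list[list[int]]:
    """Simplified model: run work groups in rounds, one group per compute unit.

    The point is only that more groups than units still complete, just in
    more rounds; it does not reflect any particular driver's scheduler.
    """
    if compute_units <= 0:
        raise ValueError("need at least one compute unit")
    rounds = math.ceil(num_groups / compute_units)
    return [list(range(r * compute_units, min((r + 1) * compute_units, num_groups)))
            for r in range(rounds)]


# Hypothetical numbers: 10 work groups on a device reporting 2 compute units.
plan = schedule_groups(10, 2)
print(len(plan), plan[0], plan[-1])  # 5 [0, 1] [8, 9]
```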


    Petr
