Maybe you noticed the splash ATi/AMD made with their Mantle renderer for Battlefield 4. Mantle is basically a new API that exposes the functionality of the GPU in a more efficient way. The efficiency comes from a better-structured API, which reduces CPU overhead.
The typical CPU-bound approach in TBGL is drawing with TBGL_BeginPoly/TBGL_EndPoly. It is present in TBGL because it has great educational value and lets you jump in more easily than TBGL GBuffers do. GBuffers address the CPU overhead by reducing the number of calls and letting you define large blocks of geometry... fast.
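To make the call-count argument concrete, here is a toy model in Python. It is not TBGL code and it does not touch the GPU: `CallCountingRenderer` and its methods are hypothetical names invented for this sketch, and all it does is count how many calls cross the API boundary when the same geometry is submitted vertex by vertex versus as one block.

```python
class CallCountingRenderer:
    """Hypothetical stand-in for a graphics API; it only counts calls."""

    def __init__(self):
        self.calls = 0              # number of calls made into the "API"
        self.vertices_received = 0  # number of vertices handed over

    # Immediate-mode style: one call per vertex, plus begin/end.
    def begin_poly(self):
        self.calls += 1

    def vertex(self, x, y, z):
        self.calls += 1
        self.vertices_received += 1

    def end_poly(self):
        self.calls += 1

    # Buffer style: the whole block of geometry goes over in one call.
    def render_buffer(self, vertices):
        self.calls += 1
        self.vertices_received += len(vertices)


def draw_immediate(renderer, vertices):
    """Submit geometry one vertex at a time."""
    renderer.begin_poly()
    for x, y, z in vertices:
        renderer.vertex(x, y, z)
    renderer.end_poly()


def draw_buffered(renderer, vertices):
    """Submit the same geometry as a single block."""
    renderer.render_buffer(vertices)


# 1000 triangles, 3 vertices each (coordinates are dummy values).
triangle_count = 1000
vertices = [(float(i), 0.0, 0.0) for i in range(triangle_count * 3)]

immediate = CallCountingRenderer()
draw_immediate(immediate, vertices)

buffered = CallCountingRenderer()
draw_buffered(buffered, vertices)

print("immediate calls:", immediate.calls)
print("buffered calls: ", buffered.calls)
```

Both paths hand over the same 3000 vertices, but the per-vertex path makes one call for every vertex plus the begin/end pair, while the block path makes a single call. In a real API each of those calls carries some CPU cost, and that cost is what the buffer approach avoids.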
This can, of course, be pushed further. I will dedicate time to studying the new approaches emerging right now - after 2 very boring years in the GPU industry, this caught my attention. I am reading through the docs, and I subscribed to DirectX 12 early access to get inspired from "the other side" too.
While Mantle seems to perform well, I didn't manage to get any documentation for it. Now that DirectX and OpenGL are demonstrating their ability to achieve similar results, I think I will drop this technology from my watch list (which happens with ATi technologies too often, remember their CTM technology for GPGPU?).
On the other hand, as I have declared many times before, I do not like the direction current GPU APIs are taking. Instead of providing an easy-to-use interface, they are getting more and more low level. I will try to compensate for this with TBGL, making it simple for us, poor mortals.
Further reading
Briefly on Mantle speed
From OpenGL perspective
From Direct3D perspective