I'm not sure where the original algorithm came from. I think we used it as some kind of stress test - a most appalling misuse of recursion.
Your method, on the other hand, is a practical and efficient way of doing it iteratively, making maximum use of the registers.
But a Fibonacci number, unlike Pi, can also be calculated directly (if you consider square roots direct).
I reckon it will take the FPU at least 100 clocks to resolve the square root, but the precision will be very good.
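
For anyone who wants to try the direct route, here is a rough sketch in Python of the closed form I have in mind (Binet's formula), next to a plain iterative loop for checking. The names fib_binet and fib_iter are just mine for illustration, and I'm assuming ordinary double-precision floats, so the rounded result can only be trusted for the first several dozen terms before rounding error creeps in.

```python
import math

def fib_binet(n):
    """Closed-form (Binet) value of the n-th Fibonacci number, with F(0) = 0, F(1) = 1."""
    sqrt5 = math.sqrt(5.0)          # the one square root the method needs
    phi = (1.0 + sqrt5) / 2.0       # golden ratio
    # The (1 - phi)**n / sqrt5 term is always below 0.5 in magnitude,
    # so rounding phi**n / sqrt5 to the nearest integer absorbs it.
    return round(phi ** n / sqrt5)

def fib_iter(n):
    """Iterative reference: two running values, no recursion."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    for n in (1, 2, 10, 30):
        print(n, fib_binet(n), fib_iter(n))
```

The square root only has to be computed once, so the real cost per number is the power, not the root. For large n the iterative version with integers stays exact while the floating-point one does not.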