Do you think it could be substantially optimized with more work or is this about the limit?
There is no limit to perfection. Most likely, by writing the code in ASM, you could achieve good results.