There are surely things that could be optimized further, but I think it's fine where it is now.
Float additions are fast enough and now there is no division inside of any of the loops.
If I were to do more optimization, I would probably look for other, lower-hanging fruit in the code.
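
For readers who want to see the idea of moving a division out of a loop in isolation, here is a minimal sketch. It is not the actual code discussed here: the function names `ramp_with_division` and `ramp_with_addition` and the interpolation task itself are made up for illustration. It only shows the general pattern of computing a step once and then using float additions inside the loop.

```python
def ramp_with_division(start, end, steps):
    """Hypothetical baseline: one division per loop iteration."""
    return [start + (end - start) * i / steps for i in range(steps)]


def ramp_with_addition(start, end, steps):
    """Hypothetical rewrite: the division is hoisted out of the loop."""
    step = (end - start) / steps  # the only division, done once
    values = []
    value = start
    for _ in range(steps):
        values.append(value)
        value += step  # one float addition per iteration
    return values


if __name__ == "__main__":
    print(ramp_with_addition(0.0, 1.0, 4))
```

One trade-off worth keeping in mind: repeated float addition accumulates rounding error, so the two versions can differ in the last few bits over long loops. Whether that matters depends on the use case.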