Why would you have a division at all?
Sprite height and width are constants, so compute that aspect ratio outside of the x loop, multiply by 256 to keep some precision (e.g. 24:8 fixed point), and then use a multiply and shift in the core y loop.
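
A rough sketch of what that could look like, assuming nearest-neighbour scaling of a single sprite column. The function name `scale_column` and its parameters are made up for illustration and are not taken from the code under discussion; it also assumes `dst_h > 0` and sizes small enough that `y * step` fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: nearest-neighbour scale of one sprite column.
 * src has src_h pixels, dst receives dst_h pixels (dst_h must be > 0). */
void scale_column(const uint8_t *src, int src_h, uint8_t *dst, int dst_h)
{
    /* The only division, done once outside the loop.
     * Shifting left by 8 stores the ratio as 24.8 fixed point. */
    uint32_t step = ((uint32_t)src_h << 8) / (uint32_t)dst_h;

    for (int y = 0; y < dst_h; y++) {
        /* Multiply and shift instead of dividing per pixel. */
        uint32_t src_y = ((uint32_t)y * step) >> 8;
        dst[y] = src[src_y];
    }
}
```

Because `step` is rounded down, `src_y` never reaches `src_h`, so the read stays inside the source column. The multiply could also be swapped for a running accumulator (`acc += step`) if even the multiply is too slow on the target.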
There are surely things that can be optimized further, but I think it's fine where it is now.
Float additions are fast enough, and now there is no division inside any of the loops.
If I were to do more optimization, I would probably look for other, lower-hanging fruit in the code.