
I made this chip using that algorithm. First, on the left there are NOT gates (with blue outputs). They calculate the negative of the divisor without adding one; the +1 is supplied through the adder's carry input.

At the bottom there are OR and NOT gates (with yellow outputs). They detect overflow. For example, dividing 1101 by 0011 with a shift of 3 gives 1101 - 0011000, but the upper bits 001 of the shifted divisor don't fit in the 4-bit calculation. So if any of those upper bits is 1, it's an overflow and the subtraction can't be done. If the subtraction can be done, we subtract and use the result for the next steps; I select between the two with a MUX.

I think it can help you with your project, but maybe you can optimize it. It works at 1000 ticks per second (in normal mode 250-1000), but inside a CPU it might slow the CPU down a lot more.
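A rough software model of the same steps may help when checking or optimizing the circuit. This is only a sketch under assumptions, not the chip itself: the 4-bit width, the function names `not_gates`, `adder` and `divide`, and the use of the adder's carry-out to decide whether the remaining low bits fit are all my own assumptions, since the chip description above only covers the NOT gates, the OR/NOT overflow check and the MUX.

```python
WIDTH = 4                  # assumed register width (4 bits, as in the 1101 / 0011 example)
MASK = (1 << WIDTH) - 1


def not_gates(x: int) -> int:
    """Bitwise NOT of a WIDTH-bit value (models the NOT gates with blue outputs)."""
    return ~x & MASK


def adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """WIDTH-bit adder returning (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> WIDTH


def divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Shift-and-subtract division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = dividend & MASK
    quotient = 0
    for shift in range(WIDTH - 1, -1, -1):
        shifted = divisor << shift            # may be wider than WIDTH bits
        # Overflow check (yellow outputs): OR the bits pushed above WIDTH,
        # then NOT the result. Any 1 up there means we can't subtract.
        fits = (shifted >> WIDTH) == 0
        low = shifted & MASK
        # remainder - low == remainder + NOT(low) + 1, with the +1 on carry-in.
        diff, carry_out = adder(remainder, not_gates(low), 1)
        # Assumption: carry_out == 1 means remainder >= low (no borrow).
        can_subtract = fits and carry_out == 1
        # MUX: keep the difference only if the subtraction is allowed.
        remainder = diff if can_subtract else remainder
        quotient |= int(can_subtract) << shift
    return quotient, remainder


print(divide(0b1101, 0b0011))  # (4, 1)
```

With shift 3 the shifted divisor is 0011000, its upper bits 001 are non-zero, so that quotient bit is 0, matching the overflow case described above.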