I have a tip for you: since your project is going to be big, this optimisation should help.
Add a tri-state buffer in front of the operation logic. Why does it help? When you put data on the data bus, the simulation evaluates every gate connected to it in the first step, including the gates of operations you are not using.
(For example, at one tick per gate you currently need 5 * 16 = 80 ticks, but if only one operation is enabled you need only 16 ticks.)
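To make the arithmetic concrete, here is a rough sketch of the tick count. It assumes the cost model from the example above (one tick per gate, 16 gates per operation, 5 operations); the function and parameter names are made up for illustration and do not come from any simulator.

```python
def ticks_needed(active_blocks: int, gates_per_block: int, ticks_per_gate: int = 1) -> int:
    """Ticks spent evaluating gates, assuming each gate in an active block costs the same."""
    return active_blocks * gates_per_block * ticks_per_gate


# Without tri-state buffers: all 5 operation blocks see the bus data and get evaluated.
without_buffers = ticks_needed(active_blocks=5, gates_per_block=16)

# With tri-state buffers: only the one enabled operation block gets evaluated.
with_buffers = ticks_needed(active_blocks=1, gates_per_block=16)

print(without_buffers, with_buffers)
```

How much you actually save depends on how your simulator schedules gate updates, so treat the numbers as an estimate.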
One more thing: an AND gate needs 2 NAND gates, but you can use a single tri-state buffer instead.
In the screenshot you can see how I implemented it:
the first row of tri-state buffers lets the data in, and the second row of tri-state buffers does the AND logic.
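If it is not obvious why a tri-state buffer can stand in for an AND gate, here is a small sketch of the idea. It assumes that a disabled buffer output (floating, modelled as `None`) reads as 0, for example through a pull-down or because the simulator treats an undriven wire as low. That is an assumption, so check how your simulator handles it. All names are hypothetical.

```python
from typing import Optional


def nand(a: int, b: int) -> int:
    """Plain NAND gate on 0/1 inputs."""
    return 0 if (a and b) else 1


def and_from_nands(a: int, b: int) -> int:
    """AND built from 2 NAND gates: the second NAND inverts the first."""
    n = nand(a, b)
    return nand(n, n)


def tri_state(data: int, enable: int) -> Optional[int]:
    """Tri-state buffer: passes data when enabled, otherwise floats (None)."""
    return data if enable else None


def read_with_pull_down(level: Optional[int]) -> int:
    """Assumption: a floating wire reads as 0."""
    return 0 if level is None else level


def and_from_tri_state(a: int, b: int) -> int:
    """Use input b as the enable line of a buffer that carries input a."""
    return read_with_pull_down(tri_state(a, b))


# Compare both versions over the whole truth table.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_from_nands(a, b), and_from_tri_state(a, b))
```

Both columns match for all four input combinations, so under that assumption one buffer does the work of two NAND gates.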
I hope this helps.