Iirc the memory layout is split into two 8KB banks, and the scanlines are interleaved between them. So the first scanline's 320 pixels (80 bytes) come from B800:0000, and the next scanline's 320 pixels (80 bytes) come from BA00:0000 (8KB further on, i.e. B800:2000). The third scanline goes back to B800, starting at B800:0050 (offset 80). The easiest way I've found to handle it is to make an array of the base segments and just swap between them each scanline.
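Here's a rough sketch of that base-table approach, modeled in Python with segments as plain integers (on real hardware you'd load the segment into ES and write there). The function name and the choice to return a list are just for illustration:

```python
# Hypothetical helper: CGA modes 4/5 bank swapping, modeled in Python.
BYTES_PER_LINE = 80               # 320 pixels at 2 bits each
BANK_SEGMENTS = (0xB800, 0xBA00)  # even scanlines, odd scanlines

def cga_scanline_addresses(height=200):
    """Return (segment, offset) for every scanline, top to bottom."""
    addresses = []
    offset = 0
    for y in range(height):
        bank = y % len(BANK_SEGMENTS)
        addresses.append((BANK_SEGMENTS[bank], offset))
        # After the odd bank, the next line is back in B800, one row down.
        if bank == len(BANK_SEGMENTS) - 1:
            offset += BYTES_PER_LINE
    return addresses
```

Each bank ends up holding 100 lines of 80 bytes, i.e. 8000 of its 8192 bytes.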
Fun fact: the same scheme also works for the Tandy 320x200 16-color graphics mode! There a scanline is 160 bytes long (320 pixels at 4 bits each), and there are four banks instead of two: B800, BA00, BC00, and BE00, used in sequence. The first scanline is at B800, the second at BA00, the third at BC00, the fourth at BE00, and then scanline 5 returns to B800.
Advance your offset by one scanline's width (80 bytes for CGA, 160 for Tandy) whenever you wrap back around to B800.
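If you'd rather not keep a running offset, the Tandy bank and offset can also be computed straight from the scanline number. A minimal sketch in Python, assuming the 320x200 16-color mode with the four segments listed above (they're 0x200 paragraphs, i.e. 8KB, apart); the function name is made up for illustration:

```python
# Sketch: Tandy 320x200x16 scanline addressing computed directly.
TANDY_BYTES_PER_LINE = 160   # 320 pixels at 4 bits each
TANDY_BANKS = 4              # B800, BA00, BC00, BE00

def tandy_scanline_address(y):
    """Return (segment, offset) for scanline y (0-199)."""
    if not 0 <= y < 200:
        raise ValueError("scanline out of range")
    bank = y % TANDY_BANKS                  # which of the four banks
    segment = 0xB800 + bank * 0x200         # banks are 0x200 paragraphs (8KB) apart
    # The offset only advances once per wrap back to B800.
    offset = (y // TANDY_BANKS) * TANDY_BYTES_PER_LINE
    return segment, offset
```

The table-swapping loop is usually cheaper inside a blit; the direct form is handy for random access like plotting a single pixel.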
Also note that in CGA graphics modes 4 and 5 the pixels are packed 4 to a byte (2 bits each), so unless you're willing to shift and mask, you're stuck blitting in blocks of N*4 pixels. And keep in mind the order of pixels in a byte is probably the reverse of what you expect: the leftmost pixel sits in the top two bits (7-6) and the rightmost in bits 1-0. CGA was very much a case of hardware engineering considerations first, software engineering considerations second (if at all).
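A minimal sketch of the shift-and-mask route, in Python against a 16KB bytearray standing in for the framebuffer at B800:0000 (the buffer and function names are hypothetical). It combines the even/odd bank offset with the leftmost-pixel-in-the-high-bits order:

```python
# Sketch: plotting one pixel into a simulated CGA mode 4/5 framebuffer.
# The 16KB bytearray stands in for B800:0000-B800:3FFF (hypothetical model).
FRAMEBUFFER_SIZE = 0x4000

def pack_pixels(p0, p1, p2, p3):
    """Pack four 2-bit colors into one byte, leftmost pixel (p0) in bits 7-6."""
    return (p0 << 6) | (p1 << 4) | (p2 << 2) | p3

def set_pixel(fb, x, y, color):
    """Write a 2-bit color at (x, y), leaving the byte's other 3 pixels intact."""
    # Odd lines live 0x2000 bytes in (the BA00 bank), 80 bytes per line pair.
    index = (y & 1) * 0x2000 + (y >> 1) * 80 + (x >> 2)
    shift = 6 - 2 * (x & 3)        # x % 4 == 0 lands in the top two bits
    mask = 0b11 << shift
    fb[index] = (fb[index] & ~mask & 0xFF) | ((color & 0b11) << shift)
```

Writing whole bytes (e.g. with `pack_pixels`) skips the read-modify-write entirely, which is why 4-pixel-aligned blits are so much cheaper.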