I would use objects (and Object Styles) to group each image and its caption together in one box, which should keep them together on EPUB export. The export should wrap the image and the caption in a set of divs, carrying the classes specified in the Object Styles export section, that set them off from the text.
The image and the caption might still get separated at some point because of font or size changes, but for the most part they should stay together.
Others with more experience might join in, especially on finessing things, but this is the type of code I have in my current book:
<div id="_idContainer087" class="figure _idGenObjectStyle-Disabled">
<div id="_idContainer085" class="figure">

</div>
<div id="_idContainer086" class="figure">
<p class="FigureCaption">Figure 1: The eye is a lot like a camera. It focuses light with a lens, controls exposure with an iris, and has an imaging medium, the retina.</p>
</div>
</div>
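If the image and caption still split across a page when the reader changes the font or size, one thing to try is a rule or two in the EPUB's stylesheet asking the reading system not to break inside the wrapper. This is only a rough sketch: the class names come from the markup above, adding these rules is my assumption rather than anything InDesign generates, and support varies from one reading system to another, so test on your target devices.

```css
/* Assumption: hand-added to the exported stylesheet, not produced by InDesign. */

/* Ask the reading system to keep each figure wrapper on one page. */
div.figure {
  page-break-inside: avoid; /* older property, still widely recognized */
  break-inside: avoid;      /* current CSS fragmentation property */
}

/* Discourage a page break immediately before the caption. */
p.FigureCaption {
  page-break-before: avoid;
  break-before: avoid;
}
```

Because the outer wrapper and both inner divs all carry the `figure` class, the first rule applies to all three. A reading system that does not honor these properties will simply ignore them.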