Light-sheet fluorescence microscopy (LSFM) is a minimally invasive and high throughput imaging technique ideal for capturing large volumes of tissue with sub-cellular resolution. A fundamental requirement for LSFM is a seamless overlap of the light-sheet that excites a selective plane in the specimen, with the focal plane of the objective lens. However, spatial heterogeneity in the refractive index of the specimen often results in violation of this requirement when imaging deep in the tissue. To address this issue, autofocus methods are commonly used to refocus the focal plane of the objective-lens on the light-sheet. Yet, autofocus techniques are slow since they require capturing a stack of images and tend to fail in the presence of spherical aberrations that dominate volume imaging. To address these issues, we present a deep learning-based autofocus framework that can estimate the position of the objective-lens focal plane relative to the light-sheet, based on two defocused images. This approach outperforms or provides comparable results with the best traditional autofocus method on small and large image patches respectively. When the trained network is integrated with a custom-built LSFM, a certainty measure is used to further refine the network's prediction. The network performance is demonstrated in real-time on cleared genetically labeled mouse forebrain and pig cochleae samples. Our study provides a framework that could improve light-sheet microscopy and its application toward imaging large 3D specimens with high spatial resolution.