Anthony Reina 1, Ravi Panchumarthy 1, Siddhesh Pravin Thakur 2, Alexei Bastidas 1 and Spyridon Bakas 2,3,4*†

1 Intel Corporation, Santa Clara, CA, United States
2 Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, United States
3 Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
4 Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States

Convolutional neural network (CNN) models obtain state-of-the-art performance on image classification, localization, and segmentation tasks. Limitations in computer hardware, most notably memory size in deep learning accelerator cards, prevent relatively large images, such as those from medical and satellite imaging, from being processed as a whole in their original resolution. A fully convolutional topology, such as U-Net, is typically trained on down-sampled images and inferred on images of their original size and resolution by dividing the larger image into smaller (typically overlapping) tiles, making predictions on these tiles, and stitching the tiles back together as the prediction for the whole image. In this study, we show that this tiling technique, combined with the translationally invariant nature of CNNs, causes small but relevant differences during inference that can be detrimental to the performance of the model. Here we quantify these variations in both medical (i.e., BraTS) and non-medical (i.e., satellite) images and show that training a 2D U-Net model on the whole image substantially improves the overall model performance. Finally, we compare 2D and 3D semantic segmentation models to show that providing CNN models with a wider context of the image in all three dimensions leads to more accurate and consistent predictions. Our results suggest that tiling the input to CNN models, while perhaps necessary to overcome the memory limitations in computer hardware, may lead to undesirable and unpredictable errors in the model's output that can only be adequately mitigated by increasing the model's input to the largest possible tile size.

Since their resurgence in 2012, convolutional neural networks (CNN) have rapidly proved to be the state-of-the-art method for computer-aided diagnosis in medical imaging, and have led to improved accuracy in classification, localization, and segmentation tasks (Krizhevsky et al., 2012; Chen et al., 2016; Greenspan et al., 2016). However, memory constraints in deep learning accelerator cards have often limited training on large 2D and 3D images due to the size of the activation maps held for the backward pass during gradient descent (Chen et al., 2016; Ito et al., 2019). Two methods are commonly used to manage these memory limitations: (i) images are often down-sampled to a lower resolution, and/or (ii) images are broken into smaller tiles (Huang et al., 2018; Pinckaers and Litjens, 2018).

Tiling is often applied when using large images due to the memory limitations of the hardware (Roth et al., 2018). Specifically, in CNN models, the activation maps of the intermediate layers use several times the memory footprint of the original input image; these activation maps can easily increase the allocated memory to hundreds of gigabytes. Fully convolutional networks are a natural fit for tiling methods, as they can be trained on images of one size and perform inference on images of a larger size by breaking the large image into smaller, overlapping tiles (Ronneberger et al., 2015; Çiçek et al., 2016; Roth et al., 2018). To perform the overlapping tiling at inference time, varying N × N (or, in the 3D case, N × N × N) tiles are cropped from the whole image at uniformly spaced offsets along the image dimensions. Tiling therefore introduces additional model hyperparameters, namely tile size, overlap amount, and aggregation process (e.g., tile averaging/rounding), that must be tuned to generate better predictions. Roth et al. (2018) performed abdominal organ segmentation on 512 × 512 CT images with between 460 and 1,177 slices by using input tiles of size 132 × 132 × 116 to yield output prediction tiles of 44 × 44 × 28 in a cascaded 3D U-Net; in the second stage of the prediction, the probabilities for overlapping tile predictions were averaged to produce a better Dice coefficient result. Zeng and Zheng (2018) introduced "Holistic Decomposition Convolution" that, when added to a conventional 3D U-Net, significantly reduced the size of the input data while maintaining the information useful for the semantic segmentation.
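The overlapping-tiling inference described above (uniformly spaced tile offsets, probabilities averaged in overlapping regions, then rounded to a label map) can be sketched in a few lines of NumPy. This is a minimal 2D illustration and not the authors' implementation: the `predict_tiled` name is hypothetical, the `model` callable is assumed to map a single-channel tile to per-pixel foreground probabilities, and the image is assumed to be at least as large as the tile in each dimension.

```python
import numpy as np

def predict_tiled(image, model, tile=64, stride=32):
    """Crop overlapping tiles at uniform offsets, run `model` on each tile,
    average overlapping probability predictions, and round to a binary mask.
    Assumes a 2D single-channel `image` with both dimensions >= `tile`."""
    H, W = image.shape
    prob_sum = np.zeros((H, W), dtype=np.float64)  # accumulated probabilities
    counts = np.zeros((H, W), dtype=np.float64)    # how many tiles cover each pixel
    # Uniformly spaced offsets; clamp the final tile to the image border so
    # every pixel is covered by at least one tile.
    ys = list(range(0, H - tile + 1, stride))
    xs = list(range(0, W - tile + 1, stride))
    if ys[-1] != H - tile:
        ys.append(H - tile)
    if xs[-1] != W - tile:
        xs.append(W - tile)
    for y in ys:
        for x in xs:
            patch = image[y:y + tile, x:x + tile]
            prob_sum[y:y + tile, x:x + tile] += model(patch)
            counts[y:y + tile, x:x + tile] += 1.0
    probs = prob_sum / counts                  # average overlapping predictions
    return (probs >= 0.5).astype(np.uint8)     # round probabilities to labels
```

The tile size, stride (i.e., overlap amount), and the averaging/rounding step in the last two lines correspond to the aggregation hyperparameters discussed in the text; a real segmentation network would replace the `model` callable.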