Medical image segmentation is a cornerstone of modern clinical diagnostics, enabling precise delineation of anatomical structures and pathologies. According to a comprehensive survey published on arXiv by Zhu, Pengyu, Zhang, Xiaojing, Kunbo, Chunyan, and Wang, Zhenyu, the field has undergone systematic development driven by deep learning. The survey organizes methods built on three major architectures (U-Net, Transformer, and SAM) within a unified analytical framework, focusing on how effectively each improves segmentation accuracy and efficiency.
U-Net-Based Methods
U-Net remains a foundational backbone for medical image segmentation thanks to its symmetric encoder-decoder structure, in which skip connections pass high-resolution encoder features directly to the matching decoder stage. The survey notes that U-Net-based methods are widely adopted for tasks ranging from organ segmentation to lesion detection. These methods benefit from a large body of public datasets and have demonstrated strong performance on benchmarks. However, the survey also highlights their limited ability to capture long-range dependencies, since each convolution only sees a local neighborhood; this limitation motivated the exploration of Transformer-based alternatives.
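To make the encoder-decoder and skip-connection structure concrete, the sketch below traces feature-map shapes through a hypothetical four-level U-Net. It is not taken from the survey: it assumes the channel layout of the original U-Net (64 channels at the first level, doubled at each downsampling) but uses "same"-padded convolutions so that skip connections align without cropping. The function name `unet_shapes` is illustrative.

```python
def unet_shapes(height, width, in_channels=1, base_channels=64, depth=4, num_classes=2):
    """Trace (channels, height, width) through a hypothetical padded U-Net.

    Assumptions (illustrative, not from the survey): 'same'-padded 3x3 convs,
    2x2 max-pooling to downsample, 2x2 transposed convs to upsample, and
    channel doubling at each encoder level as in the original U-Net.
    """
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")
    trace = [("input", (in_channels, height, width))]
    skips = []
    c, h, w = base_channels, height, width
    # Encoder: conv block at each level, store its output as a skip, then pool.
    for level in range(depth):
        trace.append((f"enc{level}", (c, h, w)))
        skips.append((c, h, w))
        h, w = h // 2, w // 2
        c *= 2
    # Bottleneck at the lowest resolution.
    trace.append(("bottleneck", (c, h, w)))
    # Decoder: upsample (halving channels), concatenate the skip, conv block.
    for level in reversed(range(depth)):
        skip_c, skip_h, skip_w = skips[level]
        c, h, w = c // 2, h * 2, w * 2
        assert (h, w) == (skip_h, skip_w), "skip and upsampled maps must align"
        trace.append((f"concat{level}", (c + skip_c, h, w)))
        trace.append((f"dec{level}", (c, h, w)))
    # A final 1x1 conv maps features to per-pixel class scores.
    trace.append(("output", (num_classes, height, width)))
    return trace


if __name__ == "__main__":
    for name, shape in unet_shapes(256, 256):
        print(name, shape)  # e.g. "bottleneck (1024, 16, 16)"
```

Each `concat` entry has twice the channels of the `dec` entry that follows it: the upsampled decoder features and the stored encoder features are stacked before the next convolution block, which is how fine spatial detail re-enters the decoder.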
Transformer-Based Methods
Transformers, originally developed for natural language processing, have been adapted to vision tasks, including medical image segmentation. The survey reviews representative Transformer-based approaches that use self-attention to model global context: every image patch can attend to every other patch, regardless of distance. According to the survey, these methods often outperform U-Net variants on complex segmentation tasks, particularly where fine-grained boundaries or multi-scale features are critical. The authors also note that Transformers, while powerful, require larger datasets and more computational resources, which poses challenges for clinical deployment.
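The following pure-Python sketch shows single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, the operation that lets each token (here, an image-patch embedding) weigh every other token. The toy weight matrices and helper names are hypothetical; real models use learned projections, multiple heads, and tensor libraries.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a list of tokens.

    x: n tokens, each a d_model-length vector (e.g. flattened patch embeddings).
    w_q, w_k, w_v: d_model x d_k projection matrices (hypothetical toy weights).
    Returns (outputs, weights): n x d_k outputs and the n x n attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    scores = matmul(q, transpose(k))  # n x n: every token scored against every token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

The n x n `weights` matrix explains the extra compute: a 224 x 224 image cut into 16 x 16 patches yields 196 tokens and therefore 196 x 196 = 38,416 attention scores per head and layer, a count that grows quadratically as image resolution increases.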
SAM-Based Methods
The Segment Anything Model (SAM) marks a recent shift toward foundation models for segmentation: a single model, pretrained at scale, that segments objects in response to prompts such as points or bounding boxes. The survey examines how SAM-based methods are being fine-tuned or otherwise adapted for medical imaging. Early results indicate strong zero-shot and few-shot capabilities, but the survey stresses that domain-specific adaptation remains necessary to reach clinical-grade accuracy. SAM-based models are still under active investigation for tasks such as tumor segmentation and anatomical structure retrieval.
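Because SAM is driven by prompts, one common way to adapt it to medical data is to fine-tune with box prompts derived from ground-truth masks. The helper below, whose name `mask_to_box` and `margin` parameter are hypothetical, computes such a box. It follows the XYXY pixel convention (x_min, y_min, x_max, y_max) that the `segment_anything` package's `SamPredictor.predict` uses for its `box` argument; treat the fine-tuning setup itself as an illustrative assumption, not the survey's prescription.

```python
def mask_to_box(mask, margin=0):
    """Return an XYXY box (x_min, y_min, x_max, y_max) around all foreground pixels.

    mask: 2D list of 0/1 values (rows index y, columns index x).
    margin: hypothetical extra padding in pixels on every side, clipped to the
            image bounds, so the prompt does not hug the lesion too tightly.
    Returns None for an empty mask, since there is nothing to prompt.
    """
    height, width = len(mask), len(mask[0])
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(width) if any(row[x] for row in mask)]
    if not xs:
        return None
    return (
        max(xs[0] - margin, 0),
        max(ys[0] - margin, 0),
        min(xs[-1] + margin, width - 1),
        min(ys[-1] + margin, height - 1),
    )


if __name__ == "__main__":
    lesion = [[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]]
    print(mask_to_box(lesion))  # (1, 1, 2, 2)
```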
Challenges and Benchmarks
The survey identifies several persistent challenges: limited annotated data, domain shift across imaging modalities, class imbalance, and the need for real-time inference in clinical settings. It also catalogs widely used public datasets and evaluation metrics and explains how the metrics differ: the Dice coefficient and intersection over union (IoU) measure region overlap, whereas the Hausdorff distance measures the largest boundary discrepancy between a prediction and the ground truth. A key conclusion is that no single architecture dominates across the board; the right choice depends on the specific clinical application and the data available.
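The three metrics can be written in a few lines of pure Python. This sketch treats masks as 2D lists of 0/1 values; the function names are illustrative, the empty-mask conventions are assumptions, and the Hausdorff distance is computed over all foreground pixels for simplicity (many toolkits use boundary points and report the 95th-percentile variant, HD95). Dice and IoU are monotonically related, Dice = 2·IoU / (1 + IoU), whereas the Hausdorff distance reacts to a single stray pixel far from the target.

```python
import math


def to_points(mask):
    # Set of (y, x) coordinates of foreground pixels.
    return {(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v}


def dice(pred, target):
    a, b = to_points(pred), to_points(target)
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def iou(pred, target):
    a, b = to_points(pred), to_points(target)
    if not a and not b:
        return 1.0  # same assumed convention as dice()
    return len(a & b) / len(a | b)


def hausdorff(pred, target):
    a, b = to_points(pred), to_points(target)
    if not a or not b:
        return math.inf  # undefined when one mask is empty; report infinity

    def directed(p, q):
        # Worst case, over points in p, of the distance to the nearest point in q.
        return max(min(math.dist(u, v) for v in q) for u in p)

    # Symmetric Hausdorff distance (brute force, fine for small masks only).
    return max(directed(a, b), directed(b, a))
```

For example, with `p = [[1, 1, 0, 0]]` and `t = [[0, 1, 1, 0]]`, Dice is 0.5, IoU is 1/3, and the Hausdorff distance is 1.0. Adding one false-positive pixel seven columns from the nearest target pixel (`[[1, 1, 0, 0, 0, 0, 0, 0, 0, 1]]` against `[[0, 1, 1, 0, 0, 0, 0, 0, 0, 0]]`) lowers Dice only to 0.4 but raises the Hausdorff distance to 7.0, which is why the two kinds of metric are usually reported together.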
Implications for Enterprise Technology Leaders
Although the survey focuses on healthcare AI, its architectural insights and benchmarking methodology also hold lessons for enterprise technology leaders in sectors such as logistics and supply chain. Segmenting complex images, whether medical scans or satellite imagery of freight operations, relies on the same deep learning paradigms. The survey aims to guide future research and support clinical translation, and all related resources are publicly available in the authors' GitHub repository, where they can support rapid prototyping and cross-industry adoption.