Transformer-Based Models in Image Segmentation and Classification: A New Era in Vision AI
Abstract
Over the past decade, deep learning has revolutionized computer vision, with
convolutional neural networks (CNNs) dominating tasks like image classification and segmentation.
However, a new paradigm has since emerged: transformer-based models, originally
developed for natural language processing, have begun to match and in many cases surpass CNN-based approaches
across vision tasks. This marks a new era in Vision AI, in which the transformer's ability
to capture long-range dependencies and global context through self-attention is reshaping how vision
systems are designed. Transformer models have achieved state-of-the-art performance in image classification
(assigning a label to an entire image) and segmentation (partitioning an image into labeled
regions), often with simpler pipelines and stronger results than their CNN predecessors.
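As a rough illustration of what "global context" means here, the sketch below splits a tiny image into patch tokens, in the spirit of Vision Transformer-style models, and applies one head of scaled dot-product self-attention. It is a minimal sketch under stated assumptions: the query, key and value projections are taken to be the identity (real models learn them), the 4x4 image is made up, and `image_to_patches` and `self_attention` are hypothetical helper names, not any library's API. The point is only that the attention matrix is N×N over the N patches, so in a single layer every patch's output can depend on every other patch, whereas a 3x3 convolution sees only its immediate neighbourhood.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def image_to_patches(image: Matrix, patch: int) -> Matrix:
    """Split an H x W grayscale image into non-overlapping patch x patch tiles,
    flattening each tile (row-major) into one token vector."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention with identity Q/K/V
    projections (a simplifying assumption for illustration)."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    # weights[i][j]: how much patch i attends to patch j, for ALL pairs (i, j)
    weights = [softmax([scale * sum(qi * ki for qi, ki in zip(q, k))
                        for k in tokens])
               for q in tokens]
    # Each output token is a weighted mix of every patch's value vector
    outputs = [[sum(wt * v[j] for wt, v in zip(row, tokens)) for j in range(d)]
               for row in weights]
    return weights, outputs


if __name__ == "__main__":
    image = [[float((r * 4 + c) % 5) for c in range(4)] for r in range(4)]
    tokens = image_to_patches(image, patch=2)  # 4 patches, each a 4-dim vector
    weights, outputs = self_attention(tokens)
    print(len(weights), len(weights[0]))  # 4 4: every patch scores every patch
    print(weights[0][3] > 0)  # True: top-left patch attends to bottom-right
```

Because the attention weights are computed for every pair of patches, information from the opposite corner of the image reaches each token in one step; a CNN needs many stacked layers before its receptive field grows that large.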