Transformer-Based Models in Image Segmentation and Classification: A New Era in Vision AI

Atako, Nelson Rachael

Received: 10-Sep-2024
Accepted: 29-Apr-2025
Published: 01-Jul-2025

Keywords:

Transformer, Vision Transformer (ViT), Image Classification, Image Segmentation, Convolutional Neural Networks (CNN), Attention Mechanism, Hybrid Models

Atako, Nelson Rachael

Dept. of Public Administration, University of Nigeria, Nsukka (Policy Analysis PhD in view)

Abstract

Over the past decade, deep learning has revolutionized computer vision, with
convolutional neural networks (CNNs) dominating tasks like image classification and segmentation.
However, a new paradigm has emerged: transformer-based models, originally
developed for natural language processing, have begun to surpass CNN-based approaches
across vision tasks. This marks a new era in Vision AI, in which transformers’ ability
to capture long-range dependencies and global context is reshaping how we design vision
systems. Transformer models have achieved state-of-the-art performance in image classification
(assigning labels to entire images) and segmentation (partitioning images into labeled
regions), often with simpler pipelines and stronger results than their CNN predecessors.

How to Cite

Atako, Nelson Rachael. (2025). Transformer-Based Models in Image Segmentation and Classification: A New Era in Vision AI. Doupe Journal of Top Trending Technologies, 1(2), 36–45. Retrieved from https://www.doupe.in/index.php/ttt/article/view/15

Issue

Vol. 1 No. 2 (2025): Recent Advances and Emerging Trends in Computer Vision

Section

Articles
