Bridge Crack Detection and Classification Using CrackDet-ViT: A Vision Transformer and CNN-Based Segmentation Framework

Authors

  • Jibin Jacob Mani

Keywords:

Structural Health, CNN, ViT, Instance Segmentation, Crack Detection, Deep Learning

Abstract

Bridge crack detection is a critical task in structural health monitoring. Traditional manual inspection is slow and prone to false positives and missed detections, while existing automated models still struggle with complex backgrounds and multi-scale cracks. A more accurate crack detection method is therefore needed. In this paper, we propose CrackDet-ViT, a bridge crack detection and segmentation model that integrates RegNet, a Vision Transformer (ViT), and Mask R-CNN. The model uses RegNet to extract local features, ViT to capture global context, and Mask R-CNN to detect cracks and segment them at the pixel level, which improves both detection accuracy and segmentation quality. Experimental results show that CrackDet-ViT achieves a mean Average Precision (mAP) of 87.5% on the SDNET2018 dataset and 84.7% on the Kaggle Crack Detection Challenge dataset, outperforming existing models. Overall, CrackDet-ViT performs well and remains robust, making it suitable for bridge crack detection in complex environments.
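
The abstract reports results as mean Average Precision (mAP) but does not state the evaluation protocol. As a rough illustration of how such a score is typically built up, the following minimal sketch computes box IoU and a single-class Average Precision in plain Python. The IoU threshold of 0.5, the greedy matching rule, the all-point interpolation, and the function names `iou` and `average_precision` are assumptions made for this example and are not taken from the paper. mAP would then be the mean of such AP values over classes (and, in some protocols, over several IoU thresholds).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[float, Box]],  # (confidence score, predicted box)
    gts: List[Box],                  # ground-truth crack boxes
    iou_thr: float = 0.5,            # assumed threshold, not from the paper
) -> float:
    """Single-class AP using all-point interpolation (an assumed protocol)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags = []
    # Visit predictions from most to least confident.
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily pick the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp_flags.append(True)   # true positive
        else:
            tp_flags.append(False)  # false positive
    # Precision and recall after each prediction.
    precisions, recalls = [], []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Replace each precision with the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the resulting precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, with two ground-truth boxes `(0, 0, 10, 10)` and `(20, 20, 30, 30)`, and three predictions scored 0.9 (exact hit on the first box), 0.8 (a miss at `(50, 50, 60, 60)`), and 0.7 (exact hit on the second box), `average_precision` returns 5/6, or about 0.833: the false positive in the middle lowers the precision over the second half of the recall range to 2/3.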

Published

2026-02-10

Issue

Vol. 4 No. 1 (2026)

Section

Articles

How to Cite

Mani, J. J. (2026). Bridge Crack Detection and Classification Using CrackDet-ViT: A Vision Transformer and CNN-Based Segmentation Framework. Journal of Intelligence Technology and Innovation, 4(1), 69-87. https://itip-submit.com/index.php/JITI/article/view/230