Vision Transformer for 15 Variations of Koi Fish Identification
DOI: https://doi.org/10.37859/coscitech.v5i1.6711
Abstract
This research aims to classify various types of koi fish using the Vision Transformer (ViT). A previous study [1] used a Support Vector Machine (SVM) as a classifier to identify 15 types of koi fish, with training and testing datasets of 1,200 and 300 images, respectively. That work was continued by [2], which implemented a Convolutional Neural Network (CNN) to identify the same 15 types of koi fish on a dataset of the same size and achieved a classification accuracy of 84%. Although the accuracy obtained with the CNN is quite high, there is still room for improvement. To overcome the limited classification accuracy of the previous studies and to further explore newer algorithms and techniques, this study proposes a ViT architecture to improve accuracy in koi fish classification. ViT is a deep learning model adapted from the Transformer architecture, which relies on a self-attention mechanism. Because its data representation power exceeds that of other deep learning algorithms, including CNNs, researchers have applied the Transformer to computer vision tasks; one result of this application is ViT. This study retains the classes and dataset sizes of the two previous studies, while the koi fish images used here were collected from the internet and validated. Implementing ViT as the classifier for koi classification in this research yielded an average accuracy of 89% across all classes of the test data.
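The abstract describes the core of ViT: an image is split into fixed-size patches, each patch is linearly embedded into a token, a classification token is prepended, and self-attention mixes information across all tokens. The following numpy sketch illustrates only that forward pass with random (untrained) weights; the patch size of 16 and the 224x224 input follow the original ViT paper [12], while the embedding dimension of 64 and the single attention head are simplifications chosen here for brevity, not values from this study.

```python
import numpy as np

def to_patches(img, p):
    """Split an (H, W, C) image into flattened p x p patches."""
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the token sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))   # stand-in for one 224x224 RGB koi image
patch = 16                        # 16x16 patches, as in the original ViT
d = 64                            # embedding dimension (illustrative only)

tokens = to_patches(img, patch)                       # (196, 768)
W_embed = rng.standard_normal((tokens.shape[1], d)) * 0.02
x = tokens @ W_embed                                  # linear patch embedding
cls = np.zeros((1, d))                                # [CLS] token (learned in practice)
x = np.vstack([cls, x])                               # (197, d) token sequence

Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (197, 64): one vector per token; the [CLS] row feeds the classifier head
```

In a trained ViT the attention block is stacked in multiple encoder layers with residual connections and MLPs, and the final [CLS] representation is passed to a linear head over the 15 koi classes; this sketch shows only the token-mixing step that distinguishes ViT from a CNN.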
References
[2] M. S. Cueto, J. M. B. Diangkinay, K. W. B. Melencion, T. P. Senerado, H. L. P. Taytay, and E. R. E. Tolentino, “Classification of different types of koi fish using convolutional neural network,” Proc. - 5th Int. Conf. Intell. Comput. Control Syst. ICICCS 2021, pp. 1135–1142, 2021, doi: 10.1109/ICICCS51141.2021.9432358.
[3] A. A. Khan et al., “An Efficient Text-Independent Speaker Identification Using Feature Fusion and Transformer Model,” Comput. Mater. Contin., vol. 75, no. 2, pp. 4085–4100, 2023, doi: 10.32604/cmc.2023.036797.
[4] K. Han et al., “A survey on vision Transformer,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 1, pp. 87–110, 2022.
[5] J. A. Figo, N. Yudistira, and A. W. Widodo, “Deteksi Covid-19 dari Citra X-ray menggunakan Vision Transformer,” J. Pengemb. Teknol. Inf. dan Ilmu Komput. (e-ISSN 2548-964X), 2020.
[6] X. Chen, C.-J. Hsieh, and B. Gong, “When vision Transformers outperform resnets without pre-training or strong data augmentations,” arXiv Prepr. arXiv2106.01548, 2021.
[7] C. F. Chen, Q. Fan, and R. Panda, “CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification,” Proc. IEEE Int. Conf. Comput. Vis., pp. 347–356, 2021, doi: 10.1109/ICCV48922.2021.00041.
[8] M. Usman, T. Zia, and A. Tariq, “Analyzing transfer learning of vision Transformers for interpreting chest radiography,” J. Digit. Imaging, vol. 35, no. 6, pp. 1445–1462, 2022.
[9] Y. Wu, S. Qi, Y. Sun, S. Xia, Y. Yao, and W. Qian, “A vision Transformer for emphysema classification using CT images,” Phys. Med. Biol., vol. 66, no. 24, p. 245016, 2021.
[10] G. G. Tahyudin, E. Rachmawati, and M. D. Sulistiyo, “Klasifikasi Gender Berdasarkan Citra Wajah Menggunakan Vision Transformer,” eProceedings Eng., vol. 10, no. 2, 2023.
[11] R. Ghali, M. A. Akhloufi, M. Jmal, W. Souidene Mseddi, and R. Attia, “Wildfire segmentation using deep vision Transformers,” Remote Sens., vol. 13, no. 17, p. 3527, 2021.
[12] A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv Prepr. arXiv2010.11929, 2020.
[13] R. Firdaus, J. Satria, and B. Baidarus, “Klasifikasi Jenis Kelamin Berdasarkan Gambar Mata Menggunakan Algoritma Convolutional Neural Network (CNN),” J. CoSciTech (Computer Sci. Inf. Technol.), vol. 3, no. 3, pp. 267–273, 2022.
[14] H. Mukhtar, E. Aryanto, and Y. S. Sy, “Deep Learning untuk mendeteksi gangguan lambung melalui citra iris mata,” J. CoSciTech (Computer Sci. Inf. Technol.), vol. 4, no. 3, pp. 580–589, 2023.