BanglaDocAtlas: A multi-class annotated dataset for complex Bangla document layout analysis
Hossain, Md Safayat; Ferdous, Jannatul; Uddin, Md Raihan; Hossain, K.M.A.; Ahmed, M.I.; Rahman, M.A.; Sushmit, Asif Shahriyar; Sadeque, Farig; Shatabda, Swakkhar; Asaduzzaman, Abu
Issue Date
2025-09-15
Type
Conference paper
Keywords
Edge-cloud, Heterogeneous systems, Execution time, Energy consumption, Throughput, ML models
Citation
M. S. Hossain et al., "BanglaDocAtlas: A Multi-Class Annotated Dataset for Complex Bangla Document Layout Analysis," 2025 IEEE High Performance Extreme Computing Conference (HPEC), Wakefield, MA, USA, 2025, pp. 1-7, doi: 10.1109/HPEC67600.2025.11196300.
Abstract
Optical Character Recognition (OCR) technology is a vital tool for digitizing printed content, enabling efficient data extraction and improving document accessibility. Traditional OCR techniques rely on pre-stored templates for fonts or structured documents. Recent advances in Machine Learning (ML), particularly Convolutional Neural Network (CNN) and transformer-based architectures, have substantially improved OCR capabilities. However, these models often fall short because their training datasets lack diversity in document types, layouts, and content, particularly for complex Bangla documents. In this paper, we address the scarcity of diverse training data by introducing BanglaDocAtlas, a versatile, multi-class annotated dataset designed specifically to advance Bangla document layout analysis. The dataset includes eight distinct classes: paragraph, text, image, title, caption, table, advertisement, and page number, enabling comprehensive OCR applications. State-of-the-art You Only Look Once (YOLO) segmentation models and the Real-Time DEtection TRansformer (RT-DETR) detection model are trained and evaluated on BanglaDocAtlas. The results show that YOLOv9 achieves the highest precision, 0.87 for bounding boxes and 0.79 for masks, while RT-DETR achieves the highest recall, 0.86 for bounding boxes.
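The precision and recall figures reported for bounding boxes are standard detection metrics. As a rough illustration only, the sketch below computes box-level precision and recall over the eight layout classes named in the abstract, using class-aware greedy matching at an assumed IoU threshold of 0.5. The `Box` type, the class index order, the matching rule, and the threshold are all assumptions for illustration; this is not the authors' evaluation protocol.

```python
from dataclasses import dataclass

# The eight layout classes listed in the abstract; the index order is an assumption.
CLASSES = ["paragraph", "text", "image", "title", "caption",
           "table", "advertisement", "page number"]


@dataclass
class Box:
    """Hypothetical axis-aligned box with a class label and confidence."""
    cls: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = 1.0  # ground-truth boxes keep the default

    def __post_init__(self):
        # Reject labels outside the eight dataset classes.
        if self.cls not in CLASSES:
            raise ValueError(f"unknown class: {self.cls!r}")


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = ((a.x2 - a.x1) * (a.y2 - a.y1)
             + (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_thr=0.5):
    """Greedy class-aware matching: predictions are visited in descending
    score order, and each claims the unmatched ground-truth box of the same
    class with the highest IoU, provided that IoU is at least iou_thr."""
    matched = set()
    tp = 0
    for p in sorted(preds, key=lambda b: b.score, reverse=True):
        best_j, best_iou = None, iou_thr
        for j, g in enumerate(gts):
            if j in matched or g.cls != p.cls:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # missed ground-truth boxes
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall
```

For example, with two ground-truth boxes (a title and a paragraph) and three predictions that match both plus one spurious caption, precision is 2/3 and recall is 1.0. Precision penalizes the extra caption box, while recall is unaffected because no ground-truth box was missed.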
Description
Click on the DOI link to access this conference paper at the publisher's website (may not be free).
Publisher
IEEE
Series
2025 IEEE High Performance Extreme Computing Conference (HPEC)
ISSN
2643-1971
2377-6943
