Table Detection with Active Learning
By: Somraj Gautam, Nachiketa Purohit, Gaurav Harit
Potential Business Impact:
Teaches computers to learn from fewer examples.
Efficient data annotation remains a critical challenge in machine learning, particularly for object detection tasks requiring extensive labeled data. Active learning (AL) has emerged as a promising solution to minimize annotation costs by selecting the most informative samples. While traditional AL approaches primarily rely on uncertainty-based selection, recent advances suggest that incorporating diversity-based strategies can enhance sampling efficiency in object detection tasks. Our approach ensures the selection of representative examples that improve model generalization. We evaluate our method on two benchmark datasets (TableBank-LaTeX, TableBank-Word) using state-of-the-art table detection architectures, CascadeTabNet and YOLOv9. Our results demonstrate that AL-based example selection significantly outperforms random sampling, reducing annotation effort under a limited budget while maintaining performance comparable to fully supervised models. Within the same annotation budget, our method achieves higher mAP scores than random sampling.
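The abstract does not spell out the selection rule, so the following is only a minimal sketch of one common way to combine uncertainty with diversity, not the authors' method. It assumes each unlabeled page comes with a list of detection confidences and a feature vector from the detector; the names `uncertainty_from_confidences`, `select_batch`, and `pool_factor` are hypothetical. Pages are first shortlisted by uncertainty, then a greedy farthest-point pass picks a spread-out subset of the shortlist.

```python
import math
from typing import List, Sequence


def uncertainty_from_confidences(confidences: Sequence[float]) -> float:
    """Image-level uncertainty: 1 minus the least confident detection.

    Assumption: a page with no detections counts as maximally uncertain.
    """
    if not confidences:
        return 1.0
    return 1.0 - min(confidences)


def select_batch(
    uncertainty: Sequence[float],
    features: Sequence[Sequence[float]],
    budget: int,
    pool_factor: int = 3,
) -> List[int]:
    """Return indices of pages to send for annotation.

    Step 1 keeps the `budget * pool_factor` most uncertain pages.
    Step 2 greedily adds the shortlisted page farthest (Euclidean distance
    in feature space) from everything already selected.
    """
    n = len(uncertainty)
    if n != len(features):
        raise ValueError("uncertainty and features must have the same length")
    if pool_factor < 1:
        raise ValueError("pool_factor must be at least 1")
    budget = min(budget, n)
    if budget <= 0:
        return []

    # Step 1: uncertainty shortlist, most uncertain first.
    ranked = sorted(range(n), key=lambda i: uncertainty[i], reverse=True)
    candidates = ranked[: budget * pool_factor]

    # Step 2: farthest-point selection, seeded with the most uncertain page.
    selected = [candidates[0]]
    min_dist = {
        i: math.dist(features[i], features[selected[0]]) for i in candidates[1:]
    }
    while len(selected) < budget:
        nxt = max(min_dist, key=min_dist.get)
        selected.append(nxt)
        del min_dist[nxt]
        # Update each remaining candidate's distance to the selected set.
        for i in min_dist:
            min_dist[i] = min(min_dist[i], math.dist(features[i], features[nxt]))
    return selected
```

In an active learning loop, the returned indices would be labeled, added to the training set, and the detector retrained before the next round. A larger `pool_factor` gives more weight to diversity, while `pool_factor=1` reduces to plain uncertainty sampling.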