20 Feb 2021 • Siwen Luo, Mengting Wu, Yiwen Gong, Wanying Zhou, Josiah Poon
The main contributions of this paper are the Financial Documents dataset with table-area annotations, a superior detection model, and a rule-based layout segmentation technique for extracting tabular data from PDF files.
Optical Character Recognition (OCR)