Business objective
- The client struggled to collect tabular financial data from PDF documents and images drawn from sources such as financial reports and regulatory filings, because manual extraction was time-consuming and error-prone. They approached Cognition to develop a solution that automatically extracts table data from these documents and uploads it to the client’s database without errors.
- The client wanted to extract financial data from PDF documents and images for more than 1,000 organizations every month and export it to their database.
Our solution
Cognition developed a technology-driven solution to efficiently extract financial data from tables in PDF / image documents and upload it into the client’s database.
- The Cognition team used PDF parsing tools and optical character recognition (OCR) techniques based on AI and machine learning to extract table data as precisely as possible from the companies’ documents, which arrived in formats such as scanned images and PDF files.
- The team then carried out data cleansing and validation to ensure that the extracted data was accurate, comprehensive, and consistent. This included the following:
- Verifying the data collected by the tool for various financial metrics.
- Addressing missing data and correcting inaccuracies or errors in the extracted data.
- Identifying and deleting duplicate records to reduce redundancy.
- Ensuring consistent formats for dates, numbers, and text fields across all extracted data, using AI tools.
- Finally, the Cognition team worked with the client to create a script that calls the client’s database API and automatically uploads the finalized data to the database.
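The cleansing and upload steps above can be illustrated with a minimal Python sketch. It is not the production tooling: the field names (`company`, `metric`, `period_end`, `value`), the accepted date formats (including the assumption that slash-separated dates are day-first), and the endpoint URL passed to `build_upload_request` are all hypothetical, and rule-based checks stand in for the AI tools mentioned above. The request is only constructed, never sent, since the client’s real API and authentication are not described here.

```python
import json
import urllib.request
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Assumed input formats; slash-separated dates are treated as day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%B %d, %Y")


def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_amount(text):
    """Parse figures such as '1,234.50' or '(1,234.50)' into a Decimal."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    # Accounting convention: parentheses mark a negative figure.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return -value if negative else value


def clean_rows(rows):
    """Normalize rows, drop duplicates, and set aside rows that need review."""
    seen = set()
    clean, needs_review = [], []
    for row in rows:
        record = {
            "company": row.get("company", "").strip(),
            "metric": row.get("metric", "").strip().lower(),
            "period_end": normalize_date(row.get("period_end", "")),
            "value": normalize_amount(row.get("value", "")),
        }
        # Missing or unparseable fields go to manual review instead of being guessed.
        if (not record["company"] or not record["metric"]
                or record["period_end"] is None or record["value"] is None):
            needs_review.append(row)
            continue
        key = tuple(record.values())
        if key in seen:  # duplicate after normalization
            continue
        seen.add(key)
        clean.append(record)
    return clean, needs_review


def build_upload_request(records, url):
    """Build (but do not send) a JSON POST request for a hypothetical endpoint."""
    body = json.dumps({"records": records}, default=str).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

Routing incomplete rows to a review list, rather than filling them automatically, keeps the "without errors" goal in view: only records that pass every check are included in the upload payload.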
Outcome
- Efficiency gain of more than 60% when compared to manual data extraction
- Turnaround time reduced from a month to ~12 business days