Outsourced Document Processing: Top 10 Things to Learn

Outsourced Document Processing involves transferring an organization’s document-intensive functions to a third party or specialist. This process encompasses converting paper-based or electronic documents containing handwritten, typed, or scanned text and images into a digital format. The resulting digitized images and captured data are typically uploaded into an Electronic Document Management System (EDMS). Service providers often utilize experienced data entry professionals and various document processing tools, such as Intelligent Character Recognition (ICR) and Optical Character Recognition (OCR), for efficient conversion.

Document-intensive functions play a vital role in various business processes, ranging from procurement to accounting and corporate strategy. Common document processing services that are outsourced include document preparation, document scanning, document indexing and linking, data capture, data encryption, records conversion and retention, records management, document process or workflow engineering, help desk, disaster recovery, integration with legacy systems, and cloud-based secure document access.

A study by InfoTrends highlighted that companies opt for outsourcing document processing primarily to reduce costs, enhance process efficiency, drive innovation, and add value. By leveraging the technology, domain expertise, and customer-centric approach of service providers, organizations can achieve a higher level of efficiency at a lower cost.

1. Market Size and Forecast

The global document outsourcing market was valued at US$5.25 billion in 2022 and is projected to grow at a compound annual growth rate (CAGR) of 6.3 percent, reaching an estimated US$11.74 billion by 2030.
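The arithmetic behind a CAGR projection can be sketched as follows. This is a minimal illustration, not the forecasting method of the underlying market report. The function names are hypothetical, and the eight-year window (2022 to 2030) is an assumption, since the report's actual base year and forecast period are not stated here.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value forward at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumption: 2022 is the base year and 2030 the end year (8 periods).
projected_2030 = project_value(5.25, 0.063, 8)
rate_needed = implied_cagr(5.25, 11.74, 8)
print(f"Value after 8 years at 6.3%: {projected_2030:.2f} Bn")
print(f"Rate implied by 5.25 -> 11.74 over 8 years: {rate_needed:.1%}")
```

The two helpers are inverses of each other, so they can be used to check whether a quoted start value, end value, and growth rate are mutually consistent for a given forecast window.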

2. Evolution of the Outsourced Document Processing Industry

Canon Group and InfoTrends charted the history of outsourced document processing in their white paper, The Evolution of Document Process Outsourcing. In the early 1990s, when BPO contracts were in their infancy, document outsourcing providers saw opportunities to offer document-intensive services as standalone agreements or in collaboration with major BPO providers.

The industry has since moved from traditional document processing, in which a third party undertakes individual tasks such as scanning and printing, to end-to-end document process outsourcing (DPO), in which the provider performs every step within the outsourced process. InfoTrends defines DPO as the outsourcing of a document-intensive process with multiple tasks to support a single business function. For example, an organization hires a DPO provider to handle end-to-end document processing for its entire Finance and Accounting (F&A) division.

A document process entails a clear input and output, comprising a well-defined sequence of activities with a specific recipient for the final output. This particular business process can also be given a distinct name. For instance, the input may be sales leads, and the output may encompass both physical and electronic promotional materials. The activities involved could encompass web and graphic design, data management, distribution, and tracking. In this scenario, the client serves as the recipient, and the business process is termed “new customer acquisition.”

3. Mergers and Third Platform Impact

Industry mergers and the revolutionary “3rd platform,” encompassing key technologies like mobility, big data, social business, and cloud computing, are catalyzing transformative changes across all Business Process Outsourcing (BPO) segments, including Document Process Outsourcing (DPO). In the past, major BPO providers collaborated with DPO specialists to deliver standalone services. However, in response to customer demands for comprehensive solutions, BPO and DPO services are now seamlessly integrated.

The integration of bundled services offers companies several advantages, such as cost reduction in managing multiple vendors, optimized utilization of vendor resources through multi-location outsourcing, and the cultivation of closer and more collaborative relationships. This, in turn, fosters enhanced value and increased innovation. DPO providers are capitalizing on cloud services, social media platforms, mobile devices, and big data analytics to aid companies in their journey towards becoming fully digital and market-facing entities. These technologies serve as catalysts for driving transformation and ushering businesses into the digital age, where they can thrive with increased efficiency and engagement with their target markets.

4. Outsourced Document Processing Trends

Outsourced document processing in the 21st century is more than just a cost reduction activity. InfoTrends reports that companies are increasingly turning to DPO to drive innovation and process improvement. According to a study conducted by Forbes Insights, more companies are using metrics that evaluate more than just bottom-line savings. While cost reduction remains the number one concern, companies are also looking for other ways to improve workflows and drive long-term value.

Value-added benefits such as higher productivity, improved customer service, and new revenue streams are what companies look for when they partner with third-party DPO specialists. In turn, DPO providers are expanding their offerings to meet demand. In one example, a DPO vendor helped a rental car company cut costs and improve efficiency by reorganizing and managing the client’s internal document process.

The DPO vendor took over end-to-end document processing, from document origination to digital conversion of vehicle contracts and maintenance records. It also handled document scanning and process management and control. As a result of outsourcing document processing, the client saved $700,000 annually on rental contract imaging and indexing and reduced its turnaround time by 12 to 24 hours.

5. Intelligent Character Recognition (ICR)

Intelligent Character Recognition (ICR) refers to technologies that recognize and analyze handwritten characters from scanned images. It is a more specific type of optical character recognition (OCR) that is ideal for digitizing documents containing different styles of handwriting. In document processing, ICR software is used to identify any character contained in a digitized image. The ICR software returns this information in a way that is recognizable to the machine and end user.


The document is scanned, saved as a high-resolution digital image (usually TIFF), and passed to the ICR software, which analyzes the image and translates the data into machine-readable characters. Modern ICR software includes a self-learning neural network that automatically updates the recognition database, so the software’s recognition ability can be trained rather than fixed. Accuracy levels vary when digitizing hand-printed text with ICR software. To achieve high recognition rates and accuracy, multiple engines are often used.

According to Top Image Systems’ Principles of Character Recognition, ICR software may use semantic, statistical, or hybrid character recognition.


The statistical approach involves looking for spatial distribution patterns of pixel values in a scanned image of a handwritten character or digit. For example, given the handwritten digits “1” and “8,” the ICR software would compute the ratio of black pixels to white pixels in each image (as represented by histograms of the two digits) and use the difference to tell them apart.
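The idea can be sketched with two tiny bitmaps. This is a minimal illustration under stated assumptions, not how any particular ICR product works: the 5x5 bitmaps, the function names, and the choice of a per-row histogram are all hypothetical.

```python
# Hypothetical 5x5 bitmaps: 1 = black (ink) pixel, 0 = white pixel.
DIGIT_ONE = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]
DIGIT_EIGHT = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]


def black_to_white_ratio(bitmap):
    """Ratio of black pixels to white pixels in the bitmap."""
    black = sum(sum(row) for row in bitmap)
    white = sum(len(row) for row in bitmap) - black
    # An all-black image has no white pixels; report an infinite ratio.
    return black / white if white else float("inf")


def row_histogram(bitmap):
    """Number of black pixels in each row (a simple spatial profile)."""
    return [sum(row) for row in bitmap]


for name, bitmap in (("1", DIGIT_ONE), ("8", DIGIT_EIGHT)):
    print(name, round(black_to_white_ratio(bitmap), 2), row_histogram(bitmap))
```

The “8” contains noticeably more ink than the “1,” and its row histogram alternates in a regular pattern, so even these two coarse statistics separate the digits.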


Scanned images of handwritten characters form lines, contours, and spaces. The semantic approach to character recognition identifies the contours and lines formed by pixels of scanned images and looks for patterns or relationships for each character. For example, there are various but not unlimited ways to write the letter “a.” All the possible depictions can be compiled in a database and used by the software for analysis and comparison. However, the semantic approach may fail to recognize a character when the scanned image of the character is broken and the contour cannot be traced properly.


Hybrid intelligent character recognition combines the statistical and semantic approaches to overcome the limitations of each. Because each character recognition engine performs best on a particular kind of image or document, ICR software can combine several engines and weigh their results to produce the best possible reading. When digitizing handwritten numbers, for example, engines designed to read numbers are given more weight in the “vote.” When digitizing handwritten text, engines designed to read letters take precedence.
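A weighted vote among engines might look like the sketch below. The engine names, the weights, and the `weighted_vote` function are hypothetical and chosen only to illustrate the principle; real products are likely to use more elaborate confidence scores.

```python
from collections import defaultdict


def weighted_vote(readings, weights):
    """Pick the character with the highest total weight across engines.

    readings: engine name -> character that engine recognized
    weights:  engine name -> how much that engine's vote counts
    """
    totals = defaultdict(float)
    for engine, char in readings.items():
        totals[char] += weights.get(engine, 1.0)
    return max(totals, key=totals.get)


# Hypothetical weights: the digit engine is trusted most on numeric fields,
# the letter engine on text fields.
NUMERIC_FIELD_WEIGHTS = {"digit_engine": 3.0, "letter_engine": 1.0, "general_engine": 1.5}
TEXT_FIELD_WEIGHTS = {"digit_engine": 1.0, "letter_engine": 3.0, "general_engine": 1.5}

# The same ambiguous stroke, read as the digit "1" or the letter "l".
readings = {"digit_engine": "1", "letter_engine": "l", "general_engine": "l"}

print(weighted_vote(readings, NUMERIC_FIELD_WEIGHTS))
print(weighted_vote(readings, TEXT_FIELD_WEIGHTS))
```

The same three readings resolve differently depending on the field type, which is the point of giving specialized engines more say in their own domain.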

6. ICR in Forms Processing

Forms processing used to be performed by data entry workers who read the documents and manually keyed the data into a computer. Automated Forms Processing was born in 1993 and involved a three-step process: capturing the image of the original document and preparing the image for ICR, capturing the information, and processing the results for automatic validation. Today, companies use scanners and ICR/OCR forms processing software to automate this process and achieve about 98 percent accuracy, similar to the accuracy achieved during manual data entry.

Forms processing software often uses a combination of ICR, optical character recognition (OCR), and optical mark recognition (OMR) systems to digitize handwritten text, machine-printed type, marks on check boxes, bar codes, and signatures from scanned images. ICR technology automates data entry functions when digitizing hand-filled forms, surveys and applications. The software interface may include scanning, recognition, verification, and management tools for processing large volumes of data.

Organizations that collect data on paper-based forms may use forms processing software to automate data entry. Data entry professionals may then validate and proofread the data to improve accuracy. Data entry automation with ICR software is recommended for companies that handle 100 or more forms per month.
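The hand-off between automatic validation and human proofreading can be sketched as a set of per-field rules, with any field that fails its rule routed to a person. The field names, formats, and function names below are hypothetical and stand in for whatever a real form defines.

```python
import re
from datetime import datetime


def is_valid_date(text):
    """True if text parses as a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_valid_amount(text):
    """True if text looks like a non-negative amount such as 1250 or 1250.50."""
    return re.fullmatch(r"\d+(\.\d{2})?", text) is not None


# Hypothetical field names and rules for an illustrative form.
RULES = {"date": is_valid_date, "amount": is_valid_amount}


def fields_needing_review(captured):
    """Return the names of captured fields that fail their validation rule."""
    return [name for name, rule in RULES.items() if not rule(captured.get(name, ""))]


# February 30 does not exist, so the date is flagged for a human to check.
print(fields_needing_review({"date": "2023-02-30", "amount": "1250.50"}))
# A letter "S" misread in place of a digit makes the amount fail.
print(fields_needing_review({"date": "2023-02-28", "amount": "12S0.50"}))
```

Rules like these catch only values that are impossible, not values that are plausible but wrong, which is why human proofreading remains part of the process.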

7. Optical Character Recognition (OCR)

Optical character recognition (OCR) is the process of converting scanned text images into a form that machines can easily recognize, edit, search, store and display. Optical character recognition systems are commonly used in document processing to transform data from paper and digital records into machine-encoded text. Besides reducing their paper trail, businesses use OCR to reduce data entry errors, consolidate data entry, create human readable text, and encode large volumes of data.

The OCR input (original document) can come from several sources, including handwritten text (prescriptions, snail mail correspondence), printed material (passport, invoices, business cards), and scanned images with printed text and handwriting. The output is stored and delivered in a specific file format, such as PDF. The PDF file can be further optimized for the web to ensure fast download and access.


The Internet has changed the way users access printed material and library resources. Many prefer documents in electronic format that can be easily accessed with an internet connection. Digitized text is also easier to use in document processing activities like text-to-speech, data and text mining, and machine translation.

For businesses with massive legacies of paper records, optical character recognition offered a path toward a better-organized, paperless office. In its early days, OCR relied on high-speed scanners and advanced OCR software that automatically converted thousands of pages into user-friendly electronic text.

Companies in the late 1980s that invested in expensive scanning equipment and OCR software achieved only 98 percent accuracy, meaning that there were 10 or more errors on an average page. Today, many OCR software makers claim 99 percent accuracy, but only on good-quality, clean images, such as scans of pages printed from Microsoft Word documents, and not on historical newspapers and similar material.
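To see what an accuracy percentage means in practice, it helps to convert it into errors per page. The sketch below assumes accuracy is measured per character and that a page holds about 2,000 characters; both are assumptions for illustration, and the function name is hypothetical.

```python
def expected_errors(characters_per_page: int, accuracy_percent: int) -> int:
    """Expected number of misread characters on one page (rounded)."""
    return round(characters_per_page * (100 - accuracy_percent) / 100)


# Assumption: about 2,000 characters on a page.
for accuracy in (98, 99):
    print(f"{accuracy}% accuracy: about {expected_errors(2000, accuracy)} errors per page")
```

Under the same per-character assumption, 98 percent accuracy yields about 10 errors on a sparse page of 500 characters, and proportionally more on denser pages.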

OCR accuracy still varies with the quality and complexity of the source material. Historical newspapers and similar material may have deteriorated or may use unusual fonts and layouts, which makes them harder to recognize than clean, well-formatted text.

OCR technology continues to evolve, and the accuracy rates may continue to improve over time with advancements in machine learning and artificial intelligence. 

8. OCR Process

Digitizing printed matter typically begins with the scanning of original documents. The data is then saved in various digital formats and made accessible online. Data entry professionals and archivists work with two types of print documents (text-based and graphic-based) and use different techniques to digitize them.

Graphic-Based Materials 

Graphic-based printed materials include drawings, manuscripts, slides, posters, historical photos and documents with illustrations or images. Many old and historical documents (newspapers, magazines, books) are considered graphic-based because they often have unusual fonts, stains, and colored backgrounds. The documents are scanned in color at very high resolution to create reproductions that are as faithful to the original as possible. The reproductions are then saved in image formats that suit the method of presentation. For example, archival images may be saved as TIFF (Tagged Image File Format) master files and converted into JPEG (Joint Photographic Experts Group) access files for online viewing. 

Text-Based Materials 

Text-based materials include journal articles, reports, meeting minutes, dissertations, research papers, and modern books, magazines, and newspapers. Text-based materials are scanned and converted into machine-encoded text.

9. Optical Character Recognition Software

OCR software works with a bitmapped image of the document to separate each character or glyph. The software analyzes each glyph and matches it to one that is in the character set of a recognition language. After the blocks of characters are separated into words, each word is checked against a dictionary. The word that fits best is the output. When there is no good word-character match to be found, the software returns recognized characters and marks. Modern OCR engines use multiple algorithms and average the result to obtain a single reading.
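The dictionary check can be approximated with Python's standard `difflib` module. This sketch swaps in `difflib`'s general-purpose string similarity as a stand-in and is not the matching method of any particular OCR engine. The mini-dictionary, the `best_word` function, and the 0.6 cutoff are assumptions.

```python
import difflib

# Hypothetical mini-dictionary for illustration.
DICTIONARY = ["document", "invoice", "payment", "contract"]


def best_word(recognized, dictionary=DICTIONARY, cutoff=0.6):
    """Return the closest dictionary word, or the raw reading if none is close."""
    matches = difflib.get_close_matches(recognized, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized


# "rn" misread in place of "m" is a classic OCR confusion.
print(best_word("docurnent"))
# Nothing in the dictionary is close, so the raw characters are returned.
print(best_word("xqzv"))
```

Falling back to the raw characters mirrors the behavior described above for readings that have no good dictionary match.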

Unlike humans, however, OCR software has yet to assemble words into the context of sentences or paragraphs, which increases the chances of errors. Recognition performance is also affected by the quality of the scanned image and the type of software used. OCR works best with high-quality, black and white images without blurs or smudges and with a dedicated software package. To further increase accuracy, human proofreading is a must.

OCR Core Algorithms

To recognize characters, OCR uses two basic algorithms: matrix matching and feature extraction. Matrix matching involves comparing what the OCR scanner “sees” as a character to a library of character templates. When a match is found, the software assigns the image the corresponding ASCII character. Matrix matching is recommended for documents with a limited set of type styles and with little or no variation within each style.
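Matrix matching can be sketched as a pixel-by-pixel comparison against stored templates, choosing the template that disagrees with the image in the fewest places. The 3x3 templates and function names below are hypothetical; production systems use far larger bitmaps and normalize size and position first.

```python
# Hypothetical 3x3 templates: 1 = ink, 0 = background.
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}


def mismatches(image, template):
    """Count pixels where the image and the template disagree."""
    return sum(
        pixel != expected
        for image_row, template_row in zip(image, template)
        for pixel, expected in zip(image_row, template_row)
    )


def match_character(image, templates=TEMPLATES):
    """Return the template character with the fewest mismatched pixels."""
    return min(templates, key=lambda char: mismatches(image, templates[char]))


# A slightly noisy "T": one stray ink pixel in the bottom-left corner.
scanned = [[1, 1, 1],
           [0, 1, 0],
           [1, 1, 0]]
print(match_character(scanned))
```

Because the comparison is literal, a character printed in an unfamiliar typeface can differ from every template in many pixels, which is why the method suits documents with few, consistent type styles.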

Feature extraction (also called intelligent character recognition or topological feature analysis) is a more advanced process. The software looks for features such as closed shapes, stroke edges, background color, diagonal lines, and open areas, and compares these features with an abstract representation of each character. Feature extraction works best when the characters are less predictable.
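One of the features named above, closed shapes, can be computed by counting background regions that are completely enclosed by ink. The sketch below is a minimal illustration of that single feature, with hypothetical bitmaps and function names; a real engine would combine many features.

```python
def count_holes(bitmap):
    """Count enclosed background regions (closed shapes) in a binary bitmap.

    Background pixels (0) connected to the image border are "outside";
    every other 4-connected background region is a hole.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(start_r, start_c):
        """Mark the background region that contains the start pixel."""
        stack = [(start_r, start_c)]
        seen[start_r][start_c] = True
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and bitmap[nr][nc] == 0 and not seen[nr][nc]:
                    seen[nr][nc] = True
                    stack.append((nr, nc))

    # First mark all background reachable from the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and bitmap[r][c] == 0 and not seen[r][c]:
                flood(r, c)

    # Any background region still unmarked is enclosed by ink.
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] == 0 and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes


# Hypothetical 5x5 bitmaps: 1 = ink, 0 = background.
ONE = [[0, 0, 1, 0, 0], [0, 1, 1, 0, 0], [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0], [0, 1, 1, 1, 0]]
ZERO = [[0, 1, 1, 1, 0], [1, 0, 0, 0, 1], [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1], [0, 1, 1, 1, 0]]
EIGHT = [[0, 1, 1, 1, 0], [1, 0, 0, 0, 1], [0, 1, 1, 1, 0],
         [1, 0, 0, 0, 1], [0, 1, 1, 1, 0]]

for name, bitmap in (("1", ONE), ("0", ZERO), ("8", EIGHT)):
    print(name, count_holes(bitmap))
```

Unlike a template comparison, the hole count stays the same when a character is drawn taller, wider, or slanted, which is why feature-based methods tolerate less predictable characters.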

10. Data Entry

Data entry is the process of inputting text, numbers, or facts into an electronic spreadsheet or database. It can be carried out either manually by an individual (manual data entry) or electronically by a machine (automated data entry). Data entry falls within the broader realm of data processing or information processing. Data processing involves the collection and manipulation of data, whereas information processing refers to the various stages or transformations that information undergoes. Data processing functions encompass tasks such as data capture, entry, validation, sorting, summarization, aggregation, analysis, reporting, and classification. In business, the term “data entry” is frequently used in the context of forms processing and commercial data processing.


Humans have been processing data for centuries. The history of mechanized data entry can be traced back to the 1700s, when punched cards were first used to control machinery and record and process data. From the 1900s to the 1950s, most organizations used punched cards for data entry, storage, and processing, thus increasing demand for workers to run keypunch machines. In the 1970s, typewriters and keypunch machines were gradually replaced by computers with video display terminals or screens that allowed the typist to see the data before it was printed.

Today, data entry is performed by data entry clerks or typists with the help of computers, scanners and forms processing software. Data usually comes from paper documents or scanned images, which are transferred into the database using a keyboard, recorder or scanner. Modern organizations accumulate vast amounts of data, including operational/transactional data (cost, sales, payroll), non-operational data (forecast, industry figures) and metadata (information about the data).

Manual Data Entry

Manual data entry is performed by humans entering data from a paper document or scanned image into a physical record (such as a record or tally sheet) or into an electronic database. The average typing speed of data entry professionals is between 50 and 80 words per minute, which is slow compared to the speed of automated scanning equipment. Other issues with manual data entry are the high labor and overhead costs associated with employing data entry professionals and the possibility of typographical errors.

Automated Data Entry

Automated data entry is a collaborative process between human operators and machines or computers, leveraging technology to streamline the workload. Computers offer customizable interfaces and built-in templates that efficiently map the document, while employing various character recognition software to analyze scanned images of paper documents. Optical character recognition (OCR) is harnessed to read machine-printed characters, while intelligent character recognition is employed for interpreting handwritten characters. In addition, check boxes, bar codes, and magnetic ink are read using optical mark recognition (OMR), bar code recognition (BCR), and magnetic ink character recognition (MICR) software, respectively. This synergistic approach between human operators and advanced technology enhances data entry accuracy and efficiency.
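The division of labor described above amounts to a lookup from content type to recognition technology, with anything unrecognized left to a human operator. The sketch below is illustrative only; the content-type labels, the form layout, and the function name are hypothetical.

```python
# Content type -> recognition technology, following the paragraph above.
RECOGNITION_BY_CONTENT = {
    "machine_print": "OCR",
    "handwriting": "ICR",
    "check_box": "OMR",
    "bar_code": "BCR",
    "magnetic_ink": "MICR",
}


def route_field(content_type):
    """Return the recognition technology for a field, or flag it for a human."""
    return RECOGNITION_BY_CONTENT.get(content_type, "manual review")


# Hypothetical form template: field name -> kind of content it holds.
form_template = {
    "account_number": "magnetic_ink",
    "customer_name": "handwriting",
    "newsletter_opt_in": "check_box",
    "signature": "signature",
}

for field, content_type in form_template.items():
    print(field, "->", route_field(content_type))
```

A template like this is what lets the software map each region of a scanned document to the right engine before any recognition takes place.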

Data Entry Professionals

A data entry professional records, updates, maintains, and retrieves data held in computer systems. Data entry workers may use special keyboards to speed up the work and reduce the risk of repetitive strain injury. Companies typically require data entry workers to be proficient in touch typing and have basic knowledge of databases, word processing software, and spreadsheets. The job description varies depending on the industry, but the general nature of work requires little or no technical knowledge. Organizations may choose to hire data entry professionals in-house or outsource data entry to a third party.

About Us: Sourcefit is a widely recognized US-managed business process outsourcing company based in Manila, Philippines. We proudly serve over 100 clients with a workforce of more than 1,300 employees. Our global centers can serve multiple markets, and our staff is highly proficient in English, Spanish, French, and Portuguese. Whether you need a few or many employees, we can help you achieve your business goals and build high-quality offshore teams.
