Outsourced Document Processing: Top 10 Things to Learn

Outsourced Document Processing is the transfer of responsibility for an organization’s document-intensive functions to a third party or specialist. Document processing is the conversion of paper-based or electronic documents containing handwritten, typed or scanned text and images into a digital format. 

The digitized images and captured data are typically uploaded into an Electronic Document Management System (EDMS). The provider may utilize the services of experienced data entry professionals and various document processing tools for conversion, such as Intelligent Character Recognition (ICR) and Optical Character Recognition (OCR).

Document-intensive functions are at the heart of most business processes, from procurement to accounting to corporate strategy. Common document processing services that are outsourced include document preparation, document scanning, document indexing and linking, data capture, data encryption, records conversion and retention, records management, document process or workflow engineering, help desk, disaster recovery, integration with legacy systems, and cloud-based secure document access.

An InfoTrends study reported that companies outsource document processing mainly to reduce cost, improve process efficiency, drive innovation, and add value. Service providers have the technology, domain expertise, and deep understanding of customer needs to provide a high level of efficiency at a lower cost.

  1. Market Size and Forecast

    According to the Global Document Outsourcing Market 2014-2018 study by TechNavio covering the Americas, APAC, and EMEA regions, the global document outsourcing market will grow at a compound annual growth rate (CAGR) of 5.25 percent from 2013 to 2018. The increasing number of companies looking for end-to-end document process outsourcing is driving growth. However, TechNavio reported that while offerings from service providers are expanding, a lack of flexibility in service level agreements (SLAs) may hinder that growth.
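    To make the CAGR figure concrete, the following is a minimal sketch of how a compound annual growth rate translates into total growth over the forecast period. The function name is hypothetical, and the sketch assumes five compounding years between 2013 and 2018.

```python
def compound_growth_factor(annual_rate: float, years: int) -> float:
    """Return the total growth multiple after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


# 5.25 percent per year, compounded over the five years from 2013 to 2018 (assumed).
factor = compound_growth_factor(0.0525, 5)
print(f"Market size multiple after five years: {factor:.3f}")
```

    Under these assumptions the multiple is about 1.29, meaning the market would be roughly 29 percent larger at the end of the period than at the start.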

  2. Evolution of the Outsourced Document Processing Industry

    Canon Group and InfoTrends charted the history of outsourced document processing in their 2011 white paper The Evolution of Document Process Outsourcing. In the early 1990s, when business process outsourcing (BPO) contracts were in their infancy, document outsourcing providers saw opportunities to offer document-intensive services as standalone agreements or in collaboration with major BPO providers. InfoTrends predicted that the United States document outsourcing market would reach almost $1 billion in value by 2014. Meanwhile, Western Europe’s document outsourcing market was expected to reach €2.7 billion in 2014.

    Today, outsourced document processing continues to grow, driven by demand from companies looking for end-to-end document processing solutions. Document process outsourcing (DPO) is increasingly seen not just as a cost-reduction tool but as a strategic solution that improves business process efficiency and drives value. DPO continues to evolve, with a growing number of service providers expanding their offerings beyond traditional document processing services.

    From traditional document processing where individual services or tasks like scanning and printing are undertaken by a third party, DPO service providers today perform all steps within the outsourced process (end-to-end DPO). InfoTrends now defines DPO as the outsourcing of a document-intensive process with multiple tasks to support a single business function. For example, an organization hires a DPO provider to handle end-to-end document processing for its entire Finance and Accounting (F&A) division.

    According to InfoTrends, a document process has a specific input and output, and the process consists of a well-defined sequence of activities and a well-defined output recipient. The specific business process can also be named. For example, the input is sales leads, and the output is physical and electronic promotional materials. The activities may include web and graphic design, data management, distribution, and tracking. The recipient in this case is the client, and the business process is new customer acquisition.

  3. Mergers and Third Platform Impact

    Industry mergers and the so-called “3rd platform” (four key technologies that include mobility, big data, social business and cloud) are driving transformations in all BPO segments, including DPO. A few years ago, major BPO providers partnered with DPO specialists to provide standalone services. Today, BPO and DPO services are becoming closely integrated due to customer demand for comprehensive services.

    Bundled services allow companies to reduce the cost of managing multiple vendors, better utilize vendor resources like multi-location outsourcing, and build closer, more collaborative relationships. This in turn leads to better value and greater innovation. DPO providers are leveraging cloud services, social media, mobile devices and big data to help companies become fully digital, market-facing businesses.

  4. Outsourced Document Processing Trends

    Outsourced document processing in the 21st century is more than just a cost reduction activity. InfoTrends reports that companies are increasingly turning to DPO to drive innovation and process improvement. According to a study conducted by Forbes Insights, more companies are using metrics that evaluate more than just bottom-line savings. While cost reduction remains the number one concern, companies are also looking for other ways to improve workflows and drive long-term value.

    Value-added benefits like higher productivity, improved customer service and new revenue streams are some examples of what companies are looking for when they partner with third-party DPO specialists. In turn, DPO providers are expanding their offerings to meet demand. In one example, a DPO vendor helped a rental car company cut costs and improve efficiency by reorganizing and managing the client’s internal document process.

    The DPO vendor took over end-to-end document processing, from document origination to digital conversion of vehicle contracts and maintenance records. It also handled document scanning and process management and control. As a result of outsourcing document processing, the client saved $700,000 annually on rental contract imaging and indexing and reduced its turnaround time by 12-24 hours.

  5. Intelligent Character Recognition (ICR)

    Intelligent Character Recognition (ICR) refers to technologies that recognize and analyze handwritten characters from scanned images. It is a more specific type of optical character recognition (OCR) that is ideal for digitizing documents containing different styles of handwriting. In document processing, ICR software is used to identify any character contained in a digitized image. The ICR software returns this information in a way that is recognizable to the machine and end user.


    The document is scanned and saved as a high-resolution digital image (usually TIFF) and fed to the computer. The ICR software analyzes the image and translates the data into machine-readable characters. Modern ICR software consists of a neural network (self-learning) system that automatically updates the recognition database. In this sense, the software’s level of intelligence can be programmed. Accuracy levels vary when digitizing hand-printed text with ICR software. To achieve high recognition rates and accuracy, multiple engines are often used. 

    According to Top Image Systems’ Principles of Character Recognition, ICR software may use semantic, statistical, or hybrid character recognition.


    The statistical approach involves looking for spatial distribution patterns of pixel values in a scanned image of a handwritten character or digit. For example, given the handwritten digits “1” and “8,” the ICR software would identify the ratio of black pixels to white pixels in each image (as represented by histograms of the two digits) and differentiate between them.
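    The idea can be illustrated with a toy sketch. The 5x5 bitmaps, the function names, and the use of a single black-pixel ratio as the only statistic are all simplifying assumptions made for illustration; a real engine would work on full histograms of much larger images.

```python
# Toy reference bitmaps: "#" is a black pixel, "." is a white pixel (hypothetical data).
REFERENCES = {
    "1": ["..#..", ".##..", "..#..", "..#..", ".###."],
    "8": [".###.", "#...#", ".###.", "#...#", ".###."],
}


def black_ratio(bitmap):
    """Return the share of black pixels in the bitmap."""
    total = sum(len(row) for row in bitmap)
    black = sum(row.count("#") for row in bitmap)
    return black / total


def classify(bitmap, references=REFERENCES):
    """Pick the reference digit whose black-pixel ratio is closest to the sample's."""
    sample = black_ratio(bitmap)
    return min(references, key=lambda digit: abs(black_ratio(references[digit]) - sample))


# Imperfect handwritten samples (hypothetical).
sample_one = ["..#..", "..#..", "..#..", "..#..", "..#.."]
sample_eight = [".###.", "#...#", ".###.", "#...#", "..##."]

print(classify(sample_one), classify(sample_eight))
```

    A single ratio cannot separate characters that use a similar amount of ink, which is one reason the statistical approach is usually combined with others.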


    Scanned images of handwritten characters form lines, contours, and spaces. The semantic approach to character recognition identifies the contours and lines formed by pixels of scanned images and looks for patterns or relationships for each character. For example, there are various but not unlimited ways to write the letter “a.” All the possible depictions can be compiled in a database and used by the software for analysis and comparison. However, the semantic approach may fail to recognize a character when the scanned image of the character is broken and the contour cannot be traced properly.


    Hybrid intelligent character recognition is a combination of the statistical and semantic approaches, designed to overcome the limitations of each method. Each character recognition engine performs best with a specific type of image or document; ICR software can thus combine several engines to produce the best possible result. When digitizing handwritten numbers, for example, ICR engines designed to read numbers have higher “voting” rights. When digitizing handwritten text, ICR engines designed to read letters take precedence.
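    The “voting” idea can be sketched as a weighted vote across engines. The engine names, the weights, and the function below are hypothetical and only illustrate the principle; commercial ICR products do not necessarily combine their engines this way.

```python
from collections import defaultdict


def weighted_vote(readings, weights):
    """Return the character with the highest total weight across engines.

    readings maps an engine name to the character it recognized;
    weights maps an engine name to its voting weight for this field type.
    """
    scores = defaultdict(float)
    for engine, char in readings.items():
        scores[char] += weights.get(engine, 1.0)
    return max(scores, key=scores.get)


# Hypothetical weights: the digit engine dominates in numeric fields,
# the letter engine dominates in text fields.
NUMERIC_FIELD_WEIGHTS = {"digit_engine": 3.0, "letter_engine": 1.0, "general_engine": 1.0}
TEXT_FIELD_WEIGHTS = {"digit_engine": 1.0, "letter_engine": 3.0, "general_engine": 1.0}

# The same ambiguous glyph read by three engines: is it a zero or the letter O?
readings = {"digit_engine": "0", "letter_engine": "O", "general_engine": "O"}

print(weighted_vote(readings, NUMERIC_FIELD_WEIGHTS))
print(weighted_vote(readings, TEXT_FIELD_WEIGHTS))
```

    With these weights the glyph is read as a zero in a numeric field and as the letter O in a text field, even though the raw engine outputs are identical.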

  6. ICR in Forms Processing

    Forms processing used to be performed by data entry workers who read the documents and manually keyed data into a computer. Automated Forms Processing was born in 1993 and involved a three-step process: capturing the image of the original document and preparing the image for ICR, capturing the information, and processing the results for automatic validation. Today, companies use scanners and ICR/OCR form processing software to automate this process and achieve about 98 percent accuracy, similar to the accuracy achieved during manual data entry.

    Forms processing software often uses a combination of ICR, optical character recognition (OCR), and optical mark recognition (OMR) systems to digitize handwritten text, machine-printed type, marks on check boxes, bar codes, and signatures from scanned images. ICR technology automates data entry functions when digitizing hand-filled forms, surveys and applications. The software interface may include scanning, recognition, verification, and management tools for processing large volumes of data.

    Organizations that collect data on paper-based forms may use forms processing software to automate data entry. Data entry professionals may then validate and proofread the data to improve accuracy. Data entry automation with ICR software is recommended for companies that handle 100 or more forms per month.
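    One way to combine automated capture with human validation is to send only uncertain fields to a data entry professional. The sketch below assumes that the forms processing software reports a confidence score between 0 and 1 for each captured field; that score, the threshold, and the function name are assumptions for illustration and are not described in the sources cited here.

```python
def split_for_review(fields, threshold=0.90):
    """Separate captured fields into auto-accepted values and values needing human review.

    fields maps a field name to a (value, confidence) pair.
    """
    accepted, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        target = accepted if confidence >= threshold else needs_review
        target[name] = value
    return accepted, needs_review


# Hypothetical capture result for one hand-filled form.
captured = {
    "name": ("Jane Doe", 0.99),
    "date": ("2O14-01-05", 0.62),  # the letter O was misread in place of a zero
}

accepted, needs_review = split_for_review(captured)
print("Accepted:", accepted)
print("Needs review:", needs_review)
```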

  7. Optical Character Recognition (OCR)

    Optical character recognition (OCR) is the process of converting scanned text images into a form that machines can easily recognize, edit, search, store and display. Optical character recognition systems are commonly used in document processing to transform data from paper and digital records into machine-encoded text. Besides reducing their paper trail, businesses use OCR to reduce data entry errors, consolidate data entry, create human readable text, and encode large volumes of data.

    The OCR input (original document) can come from several sources, including handwritten text (prescriptions, snail mail correspondence), printed material (passports, invoices, business cards), and scanned images with printed text and handwriting. The output is stored and delivered in a specific file format, such as PDF. The PDF file can be further optimized for the web to ensure fast download and access.


    The Internet has changed the way users access printed material and library resources. Many prefer documents in electronic format that can be easily accessed with an internet connection. Digitized text is also easier to use in document processing activities like text-to-speech, data and text mining, and machine translation.

    For businesses that have massive legacies of paper records, optical character recognition was a strategy towards a better organized, paperless office. In its infancy, OCR involved the use of high-speed scanners and advanced OCR software that automatically converted thousands of pages into user-friendly electronic text.

    According to the article Optical Character Recognition published by MacUser magazine in August 2012, companies in the late 1980s that had invested in expensive scanning equipment and OCR achieved only 98 percent accuracy, meaning that there were 10 or more errors on an average page. Today, many OCR software makers claim 99 percent accuracy, but only when used on good-quality, clean images like Microsoft Word documents and not on historical newspapers and material.
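    The link between an accuracy percentage and errors per page is simple arithmetic. The sketch below assumes a hypothetical page of 500 characters and treats accuracy as the share of characters read correctly; denser pages produce proportionally more errors, which is consistent with “10 or more.”

```python
def expected_errors(chars_per_page: int, accuracy: float) -> float:
    """Return the expected number of misread characters on one page."""
    return chars_per_page * (1.0 - accuracy)


# 500 characters per page is an assumed figure, not one taken from the article.
for accuracy in (0.98, 0.99):
    errors = expected_errors(500, accuracy)
    print(f"{accuracy:.0%} accuracy: about {round(errors)} errors per page")
```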

  8. OCR Process

    Digitizing printed matter typically begins with the scanning of original documents. The data is then saved in various digital formats and made accessible online. Data entry professionals and archivists work with two types of print documents (text-based and graphic-based) and use different techniques to digitize them.

    Graphic-Based Materials

    Graphic-based printed materials include drawings, manuscripts, slides, posters, historical photos and documents with illustrations or images. Many old and historical documents (newspapers, magazines, books) are considered graphic-based because they often have unusual fonts, stains, and colored backgrounds. The documents are scanned in color at very high resolution to create reproductions that are as faithful to the original as possible. The reproductions are then saved in image formats that suit the method of presentation. For example, archival images may be saved as TIFF (Tagged Image File Format) master files and converted into JPEG (Joint Photographic Experts Group) access files for online viewing.

    Text-Based Materials

    Text-based materials include journal articles, reports, meeting minutes, dissertations, research papers, and modern books, magazines, and newspapers. Text-based materials are scanned and converted into machine-encoded text.

  9. Optical Character Recognition Software

    OCR software works with a bitmapped image of the document to separate each character or glyph. The software analyzes each glyph and matches it to one that is in the character set of a recognition language. After the blocks of characters are separated into words, each word is checked against a dictionary. The word that fits best is the output. When there is no good word-character match to be found, the software returns recognized characters and marks. Modern OCR engines use multiple algorithms and average the result to obtain a single reading.
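    The dictionary check can be sketched with Python’s standard difflib module. The word list, the similarity cutoff, and the function name are hypothetical, and fuzzy string matching is used here as a stand-in for whatever matching method a given OCR product actually applies.

```python
import difflib

# Tiny stand-in for a recognition-language dictionary (hypothetical).
DICTIONARY = ["invoice", "contract", "payment", "record", "vehicle"]


def best_word(recognized, dictionary=DICTIONARY, cutoff=0.8):
    """Return (word, matched): the closest dictionary word, or the raw text if none is close enough."""
    matches = difflib.get_close_matches(recognized.lower(), dictionary, n=1, cutoff=cutoff)
    if matches:
        return matches[0], True
    # No good match: return the recognized characters and flag them as unmatched.
    return recognized, False


print(best_word("invo1ce"))  # the digit 1 was misread in place of the letter i
print(best_word("xqzt"))
```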

    Unlike humans, however, OCR software has yet to assemble words into the context of sentences or paragraphs, which increases the chance of errors. Recognition performance is also affected by the quality of the scanned image and the type of software used. OCR works best with high-quality, black and white images without blurs or smudges and with a dedicated software package. To further increase accuracy, human proofreading is a must.

    OCR Core Algorithms

    To recognize characters, OCR uses two basic algorithms: matrix matching and feature extraction. Matrix matching involves comparing what the OCR scanner “sees” as a character to a library of character templates. When a match is found, the software assigns the image the corresponding ASCII character. Matrix matching is recommended for documents with a limited set of type styles and with little or no variation within each style.
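    A minimal sketch of matrix matching follows. The 3x5 templates, the pixel-agreement score, and the 0.8 acceptance threshold are illustrative assumptions; production engines use far larger templates and more robust similarity measures.

```python
# Hypothetical template library: "#" is a black pixel, "." is a white pixel.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(image, template):
    """Return the fraction of pixels on which the image and the template agree."""
    agree = sum(
        pixel == expected
        for image_row, template_row in zip(image, template)
        for pixel, expected in zip(image_row, template_row)
    )
    total = sum(len(row) for row in template)
    return agree / total


def recognize(image, templates=TEMPLATES, threshold=0.8):
    """Return the best-matching character, or None if no template is close enough."""
    best_char = max(templates, key=lambda char: match_score(image, templates[char]))
    if match_score(image, templates[best_char]) >= threshold:
        return best_char
    return None


# A "T" with one pixel missing from its stem.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(recognize(noisy_t))
```

    Because the score depends on pixels lining up exactly, a different typeface or a shifted character quickly lowers it, which is why the technique suits documents with few, consistent type styles.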

    Feature extraction (also called intelligent character recognition or topological feature analysis) is a more advanced process. The software looks for features like closed shapes, stroke edge, the background color, diagonal lines, open areas, etc. and compares these features with an abstract representation of that character. Feature extraction works best when the characters are less predictable.
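    As a small illustration of one such feature, the sketch below counts closed shapes (enclosed white regions) in a character bitmap using a flood fill. The bitmaps and the function name are hypothetical, and a real engine would combine many features rather than rely on this one.

```python
def count_closed_shapes(bitmap):
    """Count white regions that are fully enclosed by black pixels."""
    # Pad with a white border so all outside background forms one connected region.
    width = len(bitmap[0]) + 2
    grid = ["." * width] + ["." + row + "." for row in bitmap] + ["." * width]
    height = len(grid)
    seen = set()

    def flood(start):
        # Mark every white pixel reachable from start through 4-connected steps.
        stack = [start]
        seen.add(start)
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < height and 0 <= nc < width
                        and grid[nr][nc] == "." and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    stack.append((nr, nc))

    flood((0, 0))  # everything reached from the corner is outside the character
    closed = 0
    for r in range(height):
        for c in range(width):
            if grid[r][c] == "." and (r, c) not in seen:
                closed += 1  # a white region the outside fill never reached
                flood((r, c))
    return closed


ONE = ["..#..", ".##..", "..#..", "..#..", ".###."]
ZERO = [".###.", "#...#", "#...#", "#...#", ".###."]
EIGHT = [".###.", "#...#", ".###.", "#...#", ".###."]

print(count_closed_shapes(ONE), count_closed_shapes(ZERO), count_closed_shapes(EIGHT))
```

    Unlike a pixel-by-pixel template comparison, this feature stays the same when the character is drawn larger, thinner, or slightly slanted, as long as its loops remain closed.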

  10. Data Entry

    Data entry is the process of entering text, numbers or facts into an electronic spreadsheet or database. Data can be keyed into a computer manually by an individual (manual data entry) or entered electronically by a machine (automated data entry). Data entry falls under the broader umbrella of data processing or information processing. Data processing is the gathering and manipulation of data, while information processing refers to stages or changes that information undergoes. Data processing functions include capture, entry, validation, sorting, summarization, aggregation, analysis, reporting, and classification. In business, the term data entry is often used in the context of forms processing and commercial data processing.


    Humans have been processing data for centuries. The history of machine-based data entry can be traced back to the 1700s, when punched cards were first used to control machinery and, later, to record and process data. From the 1900s to the 1950s, most organizations used punched cards for data entry, storage and processing, thus increasing demand for workers to run keypunch machines. In the 1970s, typewriters and keypunch machines were gradually replaced by computers with video display terminals or screens that allowed the typist to see the data before it was printed.

    Today, data entry is performed by data entry clerks or typists with the help of computers, scanners and forms processing software. Data usually comes from paper documents or scanned images, which are transferred into the database using a keyboard, recorder or scanner. Modern organizations accumulate vast amounts of data, including operational/transactional data (cost, sales, payroll), non-operational data (forecast, industry figures) and metadata (information about the data).

    Manual Data Entry

    Manual data entry is performed by humans entering data from a paper document or scanned image into a physical database (such as a record/tally sheet) or into an electronic database. According to Teresia R. Ostrach’s Typing Speed: How Fast is Average?, the average typing speed of data entry professionals is between 50 and 80 words per minute, which is relatively sluggish compared to the speed of automated scanning equipment. Other issues with manual data entry are the high labor and overhead costs associated with employing data entry professionals and the possibility of typographical errors.

    Automated Data Entry

    Automated data entry is performed by human operators but with much of the work done by machines or computers. Computers feature customizable interfaces, built-in templates that map the document, and various character recognition software packages that analyze scanned images of paper documents. Optical character recognition (OCR) is used to read machine-printed characters, while intelligent character recognition (ICR) is used for handwritten characters. Check boxes, bar codes and magnetic ink are read using optical mark recognition (OMR), bar code recognition (BCR) and magnetic ink character recognition (MICR) software, respectively.
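    The mapping from content type to recognition technology can be written as a simple lookup. The content-type labels, the form layout, and the fallback to manual entry below are hypothetical; they only illustrate how a template that maps a document might route each field.

```python
# Recognition technology for each kind of content, as described above.
RECOGNITION_BY_CONTENT = {
    "machine_print": "OCR",
    "handwriting": "ICR",
    "check_box": "OMR",
    "bar_code": "BCR",
    "magnetic_ink": "MICR",
}


def route_fields(form_fields):
    """Map each field name to the technology that should read it.

    Content types with no matching technology fall back to manual entry (an assumption).
    """
    return {
        name: RECOGNITION_BY_CONTENT.get(content_type, "MANUAL")
        for name, content_type in form_fields.items()
    }


# Hypothetical template for a hand-filled application form.
application_form = {
    "applicant_name": "handwriting",
    "form_number": "machine_print",
    "newsletter_opt_in": "check_box",
    "tracking_code": "bar_code",
    "site_sketch": "drawing",
}

print(route_fields(application_form))
```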

    Data Entry Professionals

    A data entry professional records, updates, maintains, and retrieves data held in computer systems. Data entry workers may use special keyboards to speed up the work and reduce the risk of repetitive strain injury. Companies typically require data entry workers to be proficient in touch typing and have basic knowledge of databases, word processing software, and spreadsheets. The job description varies depending on the industry, but the general nature of the work requires little or no technical knowledge. Organizations may choose to hire data entry professionals in-house or outsource data entry to a third party.

    Data entry tasks can be integrated with customer service tasks. These include entering customer information into a database during new account openings, lead generation, and new customer acquisition. A data entry/customer service professional may also be assigned to update electronic medical records, process invoices and confirm or verify client records through phone, SMS, chat or email.


