Overview
Perimeter Group understood that the RCDSO wished to complete a back-file imaging project to convert Ontario Dentist Registration Applications and supporting documentation to searchable, digital PDF files for upload to a new SharePoint-based DM solution. The archive consisted of hundreds of Member files containing varying volumes of documents, ranging from new Member files of 25 to 50 pages up to Member files of 250 to 500 pages or more, depending on how old and active the file was.
For the initial project, RCDSO selected 1,000 newer Member files from 2017 to 2019 for conversion, with an anticipated 80% of the files (800) containing only the Registration Application and its supporting documents. The remaining 20% (200 files) contained PCRA documents and other documents which needed to be separated from the Registration file and uploaded to SharePoint into their own categories for security and access control within the DM solution. RCDSO conducted an internal review of these Member files in order to identify the various additional document sets.
Perimeter Group and RCDSO personnel held an orientation meeting in January 2020, at which Perimeter personnel attended on-site to review space requirements, power availability, and the tables and chairs available for indexing personnel; review the files for Phase 1; bring the necessary banker’s boxes on-site; and hold discussions with the RCDSO team to gain a complete understanding of the project requirements. In February, Perimeter personnel were on-site to begin the boxing and indexing of the Phase 1 files and then removed the completed boxes off-site.
Engagement and Process
The following overview was provided for completion of the project off-site at Perimeter’s facilities.
Perimeter reviewed the initial new Member files from 2017 to 2019 selected for the pilot project and fully understood the processes required to deliver high-quality, searchable digital data from the scanned files. The initial Member files selected contained either just the Registration Application and supporting documents (correspondence, certifications, confirmation from universities, etc.) or the Registration Application and supporting documents as well as PCRA documents, QA, Secret QA and others. For the Member files with only the Registration Application and supporting documents, our methodology for completing the pilot project was as follows.
File Review
RCDSO completed a review of the files to segregate the Member files that contained only the Registration Application and supporting documents into one group called Phase 1. The Phase 1 files were the files that Perimeter dealt with during the initial pick up. The files that contained additional documents, such as the PCRA documents, were reviewed by RCDSO personnel and grouped, with coloured separation sheets placed in each file to denote where the PCRA or other documents started and ended. Perimeter dealt with these files together during a second pick up once the review had been completed.
On-site work then commenced to complete packing and indexing of the Phase 1 Member files. Perimeter supplied the necessary banker’s boxes, which were built on-site to hold the estimated 800 Phase 1 files (approximately 23 to 25 banker’s boxes). Perimeter personnel, under the supervision of the assigned RCDSO personnel, removed the files from the shelves and placed them in the banker’s boxes. Each box was given a box ID, and a tracking label was affixed to each box identifying the Perimeter staff member assigned to the box for on-site indexing. This label, supported by internal tracking sheets, also tracked who completed the document preparation and scanning as the boxes moved through the prep and scanning process.
The Perimeter staff member assigned to each box had custody of the box on-site and was responsible for indexing all the files contained in the box to Excel. The index included the box ID number, the Unique Member Number and, in the case of Phase 1 files, the description, i.e. Registration Application.
Once indexed on-site, a printed copy of the file list for each box was taped to the inside of the lid and initialed by the Perimeter staff member. RCDSO personnel reviewed the indexed list of files against the physical files in the box to ensure all files were accounted for and signed off on the tracking sheet in the lid. Perimeter personnel then sealed the boxes closed for transport to our secure facility.
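The review of the indexed list against the physical files in a box can be illustrated with a short sketch. This is not Perimeter's actual tooling (the index was kept in Excel and the check was done by hand); the `IndexRow` record and the `reconcile_box` function are hypothetical names, and the member numbers are made-up sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRow:
    """One line of the box index (field names are illustrative assumptions)."""
    box_id: str          # e.g. "Box1"
    member_number: str   # Unique Member Number
    description: str     # e.g. "Registration Application"


def reconcile_box(index_rows, physical_member_numbers, box_id):
    """Compare the indexed files for one box against the files found in it."""
    indexed = {r.member_number for r in index_rows if r.box_id == box_id}
    physical = set(physical_member_numbers)
    return {
        # Listed in the index but not found in the box
        "missing_from_box": sorted(indexed - physical),
        # Found in the box but not listed in the index
        "not_in_index": sorted(physical - indexed),
    }


rows = [
    IndexRow("Box1", "123456", "Registration Application"),
    IndexRow("Box1", "123457", "Registration Application"),
    IndexRow("Box2", "123458", "Registration Application"),
]
result = reconcile_box(rows, ["123456", "123459"], "Box1")
print(result)
```

A box would be signed off only when both lists come back empty.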
Logistics
Perimeter provided a vehicle for transport of the packed boxes to our secure facility in February 2020. Perimeter created a sign-off sheet from the index database and checked off all boxes as they were removed from the basement. Perimeter staff physically moved the boxes from the basement of the RCDSO offices up the stairs and into the truck. Once the truck was loaded, Perimeter personnel drove the vehicle directly to our secure facility and unloaded the boxes into the storage room located in our office building. At no time were the boxes left unattended, nor did unauthorized persons have access to the truck. No third party was used in the packing and moving of the boxes.
Storage of the Boxes
The boxes were stored in our secure location within our office building. This storage facility had monitored security cameras and alarms, and entrance was controlled by key and limited to authorized persons through the main office. Boxes were moved from the storage facility to our main office by authorized persons assigned to the project for document preparation and scanning.
Document Preparation
From the Excel index, Perimeter created and printed document separator sheets which contained all the information required in the electronic file name. An example is as follows: Box1_123456_Registration Application
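As an illustration only, a name in the form of the example above could be assembled from one index row as sketched below. The function name `separator_name` and the validation rules are assumptions made for this sketch, not a description of the software Perimeter used.

```python
def separator_name(box_id, member_number, description):
    """Join the three index fields with underscores, as in the example
    "Box1_123456_Registration Application"."""
    fields = [str(box_id).strip(), str(member_number).strip(), description.strip()]
    if any(not field for field in fields):
        raise ValueError("box ID, member number and description are all required")
    # Assumption: the first two fields must not contain "_" so the name can
    # later be split back into its parts unambiguously.
    if "_" in fields[0] or "_" in fields[1]:
        raise ValueError("box ID and member number must not contain underscores")
    return "_".join(fields)


print(separator_name("Box1", "123456", "Registration Application"))
# prints: Box1_123456_Registration Application
```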
This was the preliminary file naming structure for the purposes of tracking and document capture. These file names could be modified depending on the requirements of the DM solution, but for the purpose of control during the scanning phase of the project this was the recommended file naming structure. The printed separator sheet also contained the information necessary to complete the electronic separation of the individual Member files into stand-alone Adobe PDF files. The separator sheet was the lead sheet for each individual Member file, followed by all documentation in the file, starting with the blue check sheet as the first page of the file.
The document preparation person assigned to a specific box recorded the box number on their tracking document and initialed the box label to confirm they had accepted responsibility for the box. To take advantage of the high-volume scanning capability of our equipment, and to be able to price the scanning very competitively through economies of scale, Perimeter prep personnel pulled all the staples, clips, bindings, etc. associated with the documents and created “parts” of 500 to 750 pages per part. The corresponding file folders from which the documents were removed were placed at the back of the box in the same order as they came off the shelf.
Perimeter personnel dealt with any sticky notes on documents either by placing them on the same page, if this could be done without covering any information on the page, or by placing them on a blank page immediately following the document they were taken from. Similarly, any odd sizes or types of documents that were in the file, such as oversize envelopes and courier pouches, were either prepared for scanning on the high-volume scanners or tagged for scanning on a different scanning system, after which the images were placed in the correct location in the electronic file.
We had been advised during our meeting that the scanned physical files were eventually to be securely shredded, so there was no requirement to rebuild the pages into their original file folders at completion. In our January meeting we discussed the requirement to create “bookmarks” for each file section within the document. We presented solutions for creating the bookmarks during this meeting, and the bookmarking was accomplished with a unique process Perimeter used involving XV-File Name-XV pages. Once document preparation for a box had been completed, the box was moved to the scanning area to await capture.
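The grouping of prepared files into “parts” can be sketched as a simple greedy batching step. This is an assumption-laden illustration: the source gives only the 500 to 750 page range, so the rule that a Member file is never split across two parts, the function name `build_parts` and the sample page counts are all hypothetical.

```python
def build_parts(files, max_pages=750):
    """Group (member_number, page_count) pairs, in shelf order, into parts.

    A new part is started whenever adding the next file would push the
    current part past max_pages. A single file larger than max_pages
    still gets a part of its own.
    """
    parts, current, current_pages = [], [], 0
    for member_number, pages in files:
        if current and current_pages + pages > max_pages:
            parts.append(current)
            current, current_pages = [], 0
        current.append(member_number)
        current_pages += pages
    if current:
        parts.append(current)
    return parts


sample = [("A", 300), ("B", 300), ("C", 200), ("D", 500), ("E", 100)]
parts = build_parts(sample)
print(parts)
# prints: [['A', 'B'], ['C', 'D'], ['E']]
```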
Document Capture – High Volume Scanning
Perimeter used our new Kodak i4850 high-volume production scanners to capture the images. We recommended that the files be captured at 300 dpi, 24-bit colour in order to capture the highest quality image possible and to ensure that any coloured highlights, markings, etc. on the documents were captured as they originally appeared. Capturing at 300 dpi colour also provided for much higher accuracy in the Adobe PDF OCR. With more data available, the OCR engines could capture the characters accurately and ensure a high “hit” ratio when searching the scanned documents. As the OCR layer in a PDF document sits behind the original captured image, there was no way to edit characters or make changes, so it was vital to have the highest quality data captured from the outset.
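To give a sense of how much more data a 300 dpi, 24-bit colour capture holds, the sketch below computes the uncompressed size of one page side. The letter-size page (8.5 x 11 inches) and the 200 dpi black-and-white comparison are assumptions chosen for illustration and do not come from the project; the delivered PDF files would be compressed and therefore much smaller.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size in bytes for one page side, before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed letter-size page, 8.5 x 11 inches
colour_300 = uncompressed_bytes(8.5, 11, 300, 24)   # 300 dpi, 24-bit colour
bitonal_200 = uncompressed_bytes(8.5, 11, 200, 1)   # 200 dpi, 1-bit black and white

print(colour_300)                  # prints: 25245000
print(bitonal_200)                 # prints: 467500
print(colour_300 // bitonal_200)   # prints: 54
```

Under these assumptions, the colour capture carries 54 times as much raw image data per page side as the black-and-white one.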
The 300 dpi colour capture, we believed, best met the requirements of the Canadian General Standards Board (CGSB) 72.34 standard (2017) and its section 6.4.2.2, Digitization.
Perimeter followed the standards and guidelines laid out in 72.34. Although we were not directly involved in the original committee that developed the standard, we had indirect input through our long association with Vigi Gurushanta, the original Chair of the Standard, Principal of IMerge Consulting Inc. and former Head of Document Imaging for Royal Bank. Vigi gave a presentation to our clients on July 23, 2008 on the creation of this Standard. A copy of the PPT is available on request.
The “parts” were placed into the hopper of the scanner and the scan process was started. Pages were captured in duplex (front and back at the same time) and the scanners ran multi-feed detection while scanning. If two pages tried to go through the scanner at the same time, the scanner stopped, and the operator reviewed the error and either rescanned the affected pages or confirmed acceptance, depending on the error. The scanners also had metal detection to stop the scan in the event that staples, pieces of staples or other metal were detected. Perimeter ran the Kodak Professional Capture software with the scanners, which rotated the documents to their correct orientation (portrait or landscape) and deskewed (straightened) the images.
Once a part was captured, the scanner operator removed any blank images that had been captured from the back sides of documents where there was no information, and manually corrected any images that had not been rotated the correct way. Any additional image adjustment needed to sharpen or brighten images was completed at this time. The parts were saved to the local hard drive during the scanning process. Once the box had been completely scanned and all the images reviewed and edited, the scanner operator signed off on the box on both the label and the tracking document to confirm its completion.
Back-end Processing
Once a box had been scanned, it went to back-end processing for document separation, OCR and final formatting for delivery. The scanned “parts” were processed through Adobe Acrobat Professional software to separate the bulk scanned files into individual Member files using the index information. The OCR engine was then run on the separated data to OCR all machine text contained in the scanned images. Once completed, the files were compared back to the original index listing at the box level to ensure that the number of electronic files matched the number of physical files listed in the index. The box ID was removed from the electronic file names for delivery to RCDSO (Perimeter maintained a backup with the box IDs in the event that internal tracking of a Member file back to the physical box was required).
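The box-level count check and the removal of the box ID from the file names can be sketched as follows. This is a hedged illustration, assuming tracking names of the form Box1_123456_Registration Application; the function names `delivery_name` and `count_mismatches` and the `.pdf` suffix on the sample files are hypothetical.

```python
from collections import Counter


def delivery_name(tracking_name):
    """Drop the leading box ID, e.g. "Box1_123456_X.pdf" -> "123456_X.pdf"."""
    box_id, separator, rest = tracking_name.partition("_")
    if not separator or not rest:
        raise ValueError(f"no box ID prefix found in {tracking_name!r}")
    return rest


def count_mismatches(index_names, pdf_names):
    """Return {box_id: (expected, actual)} for boxes whose counts differ."""
    expected = Counter(name.split("_", 1)[0] for name in index_names)
    actual = Counter(name.split("_", 1)[0] for name in pdf_names)
    boxes = sorted(set(expected) | set(actual))
    return {b: (expected[b], actual[b]) for b in boxes if expected[b] != actual[b]}


index_names = [
    "Box1_123456_Registration Application",
    "Box1_123457_Registration Application",
    "Box2_123458_Registration Application",
]
pdf_names = [
    "Box1_123456_Registration Application.pdf",
    "Box2_123458_Registration Application.pdf",
]

print(count_mismatches(index_names, pdf_names))
# prints: {'Box1': (2, 1)}
print(delivery_name(pdf_names[0]))
# prints: 123456_Registration Application.pdf
```

Keeping the tracking names in a backup, as described above, means the box ID can always be recovered even though it is stripped from the delivered files.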
Data Delivery
During our meeting in January, it was discussed that RCDSO would like Perimeter to upload the scanned files directly to the new SharePoint solution. Perimeter required the appropriate access, security and process methods to complete the uploads, and it was discussed that IT would attend the meeting in order to approve this process. Alternatively, for the pilot project we offered to deliver the data to RCDSO on encrypted USB external hard drives using VeraCrypt encryption software, a leading industry solution and the software we were using at the time to deliver Barrick Gold data. All transfers of drives to RCDSO were completed in person by direct delivery by authorized Perimeter personnel. Drives were contained in a locked delivery pouch during transportation.
Phase 2 Member Files
The above process was used for the Phase 2 Member files, with the exception that any additional documents in the files that were flagged for separation by RCDSO personnel were indexed into the Excel database, and separator sheets were created in order to make a separate PDF file for each subsection in these files. For Phase 2 there were therefore multiple PDF files for each Member, depending on the number of subsections selected.
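The Phase 2 result, several PDF files per Member, can be illustrated by grouping index rows by Member number. The sketch assumes delivery names of the form member number, underscore, description (the tracking name with the box ID removed) with a `.pdf` suffix; the function name `pdfs_per_member` and the sample rows are hypothetical.

```python
from collections import defaultdict


def pdfs_per_member(index_rows):
    """Map each member number to the PDF names expected for that member.

    index_rows is a list of (member_number, description) pairs, one per
    separator sheet, in the order they were indexed.
    """
    grouped = defaultdict(list)
    for member_number, description in index_rows:
        grouped[member_number].append(f"{member_number}_{description}.pdf")
    return dict(grouped)


phase2_rows = [
    ("123456", "Registration Application"),
    ("123456", "PCRA"),
    ("123457", "Registration Application"),
]
expected_files = pdfs_per_member(phase2_rows)
for member, names in expected_files.items():
    print(member, names)
```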
Digitization Standards
Perimeter followed the CGSB 72.34 standard for the capture and processing of scanned images. Perimeter confirmed that the images captured at 300 dpi, 24-bit colour were exact reproductions of the paper records and met the requirements of the standard. As the deliverable was Adobe PDF, the image of the scanned document was an exact replica of the document, with the hidden OCR layer lying behind the image so that no alterations to the scanned image took place. The captured images could be stored and viewed on a PC, laptop, tablet, phone or compatible device as authorized, and reproduced in their original form by printing. By following the standard and the Government of Canada – CRA standard, Perimeter ensured that the scanned images were a legally acceptable version of the paper documents.