"We have been working with [ITC Software] for more than 2 years now. We
started with contract basis work and finally came to ODC which is
completely dedicated for our tasks. High team stability, good
proficiency and efficient administration support are the factors which
bring high value to our collaboration."
The client is a U.S. company that provides integrated information solutions to business and professional customers worldwide.
Challenge
Our client offered a variety of electronic information guides that had originally been published as paper books. To expand its range of services, the company planned to produce seamless product packages that would be sold as DVD sets. Our task was to collect and classify all the available information and to make it searchable and accessible via a simple-to-use viewer application.
We faced a number of challenges as we started work on the project:
the documents to be included in the product packages had been produced by several teams of writers who applied the company standards for documents rather loosely; as a result, documents were created using different applications and were often formatted differently;
the tags that were supposed to facilitate document parsing were often used incorrectly;
many of the illustrations were embedded in various kinds of documents, which made image extraction complicated and time-consuming;
the documents to be used in the product packages were continuously edited and expanded by the authors, so we had to do several iterations before we could finally compile the packages;
the viewer to be supplied with the DVD sets had to include a search engine and be able to load, display, and scroll large documents with many illustrations.
Solution
We started with error correction: we eliminated the discrepancies in tag usage and style formatting and brought the documents for the product packages into line with a unified standard. This made parsing the documents and saving them in XML format fast and easy.
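As an illustration only, the tag clean-up step could look like the Python sketch below. The alias table and the tag names in it are hypothetical, since the actual correction rules are not described here, and the project itself was built in C#. The sketch only handles simple tags without attributes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical alias table: variant tag names found in source documents,
# mapped to the single name required by the unified standard.
TAG_ALIASES = {"hd": "title", "heading": "title", "p": "para", "par": "para"}

# Matches simple opening or closing tags without attributes, e.g. <HD> or </Par>.
TAG_PATTERN = re.compile(r"<(/?)\s*([A-Za-z][\w-]*)\s*>")


def normalize_tags(text: str) -> str:
    """Lower-case tag names and replace known variants with the standard name."""
    def fix(match):
        slash, name = match.group(1), match.group(2).lower()
        return f"<{slash}{TAG_ALIASES.get(name, name)}>"
    return TAG_PATTERN.sub(fix, text)


def to_xml(text: str) -> ET.Element:
    """Wrap the normalized text in a root element and parse it as XML."""
    return ET.fromstring(f"<document>{normalize_tags(text)}</document>")


raw = "<HD>Safety Guide</hd><Par>Wear gloves.</PAR>"
doc = to_xml(raw)
print(ET.tostring(doc, encoding="unicode"))
```

Once every variant maps to one standard name, a strict XML parser accepts the document, which is what makes the later parsing step straightforward.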
To improve the image extraction process, we employed Optical Character Recognition (OCR) technology. The processed images were linked to the text and saved in GIF (preview) and PDF (full view) formats.
The parsed documents were stored in an MS SQL database, from which they could be easily retrieved as needed. Using the database, we generated various XML structures for the same content, depending on the client's requirements.
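The idea of producing several XML structures from one stored copy of the content can be sketched as follows. This is a minimal Python example under stated assumptions: an in-memory SQLite database stands in for MS SQL, and the `sections` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite database used here as a stand-in for MS SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sections (guide TEXT, position INTEGER, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO sections VALUES (?, ?, ?, ?)",
    [
        ("Safety", 1, "Gloves", "Wear gloves."),
        ("Safety", 2, "Goggles", "Wear goggles."),
        ("Tools", 1, "Hammers", "Pick the right weight."),
    ],
)

QUERY = "SELECT guide, title, body FROM sections ORDER BY guide, position"


def flat_xml(conn) -> str:
    """One <section> element per row, with the guide name as an attribute."""
    root = ET.Element("sections")
    for guide, title, body in conn.execute(QUERY):
        node = ET.SubElement(root, "section", guide=guide, title=title)
        node.text = body
    return ET.tostring(root, encoding="unicode")


def nested_xml(conn) -> str:
    """The same rows, grouped under one <guide> element per guide."""
    root = ET.Element("library")
    guides = {}
    for guide, title, body in conn.execute(QUERY):
        if guide not in guides:
            guides[guide] = ET.SubElement(root, "guide", name=guide)
        node = ET.SubElement(guides[guide], "section", title=title)
        node.text = body
    return ET.tostring(root, encoding="unicode")


print(flat_xml(conn))
print(nested_xml(conn))
```

Both functions read the same rows, so a correction made once in the database shows up in every generated structure.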
The viewer we created is a desktop application with a friendly graphical user interface that lets users browse through the entire product package, view the desired documents, and search through them. We also developed several versions of the viewer application to fit the different contents of the product packages offered by the client.
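How the viewer's search engine worked internally is not described here. One common approach to searching a fixed document set is an inverted index, sketched below in Python purely as an illustration; the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return a sorted list of ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample documents.
documents = {
    "guide-1": "Wear gloves when handling solvents.",
    "guide-2": "Store solvents away from heat.",
    "guide-3": "Wear goggles near heat sources.",
}
index = build_index(documents)
print(search(index, "solvents"))
print(search(index, "wear heat"))
```

Because the contents of a DVD set do not change after release, an index like this could be built once at packaging time and shipped alongside the documents.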
Technologies Utilized
MS SQL, Access, XML, XSLT
Microsoft Visual Studio .NET (C#)
InterLok(R) v5.3, XpdfViewer 3.0.31.0, MDAC v. 2.8, PDFNet for .NET
Optical Character Recognition
Results
We successfully created a collection of information in XML format that can be used for multiple purposes: publishing it in various external systems, displaying it on the client's Web sites, and producing a series of integrated information guides on DVD. The result is a set of convenient reference tools for finding and viewing the desired information on screen and for printing it out.
The project we delivered gives professionals in various fields all the necessary information at their fingertips. Thanks to the success of the project, our client's position as a leading information technology publisher has been significantly reinforced.