Chapter 6 - Computer Concepts and Legal Applications

Full Text Compared to Images

A classic study revealed that full text searching failed to retrieve a significant share of the documents relevant to a case. The study, commonly referred to as the Blair and Maron Report, is actually titled “An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System” and was written by David Blair and Professor M.E. Maron (Communications of the ACM, March 1985, 28:3). The paper was based on the massive Bay Area Rapid Transit (BART) accident case, in which a computerized train failed to stop at the end of a line and crashed through the wall and into the parking lot. The resulting lawsuits reached 250 million dollars. The document collection amounted to roughly 350,000 pages. One of the law firms “full texted” all the documents, reasoning that with the right search it could find anything. The startling conclusion was that the searches retrieved only about 20% of the relevant documents in the on-line collection!
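The 20% figure is a recall measurement: the share of all relevant documents that a search actually returns. Its companion measure, precision, is the share of returned documents that are relevant. The short Python sketch below computes both for a single search; the document IDs and counts are hypothetical and are not data from the Blair and Maron study.

```python
def recall_precision(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall, precision) for one search.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: 10 relevant documents exist; the search returns 3
# documents, 2 of which are relevant.
relevant_docs = {f"DOC-{n:03d}" for n in range(1, 11)}
retrieved_docs = {"DOC-001", "DOC-002", "DOC-500"}
r, p = recall_precision(retrieved_docs, relevant_docs)
print(f"recall={r:.0%} precision={p:.0%}")  # recall=20% precision=67%
```

A search can look accurate (most of what it returns is relevant) while still missing most of the relevant material, which is why recall is the number that matters when you need to find every responsive document.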

Images, such as TIFF files, are an “electronic snapshot” of a paper document. It is important to note that images CANNOT be searched using full text software. The words on an imaged document are not stored as “full text” or ASCII characters; they are merely dots in a bitmap digital image. To locate a document image, the image must be linked to an index or database: the database is searched, and once the matching record is located, the attached image can be viewed. An image costs approximately 10 cents or less per page. Converting a document to full text requires OCR software and is inexpensive; however, if “cleanup” is desired, the cost generally rises to between $0.50 and $2.00 per page. An image generally takes approximately 50 KB of storage per page, whereas the full text of a 400-word, one-page letter occupies only about 3 KB, since only the words are digitized and not the whole page.
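The per-page figures above make it easy to estimate storage and conversion costs before committing a production to one format. (The 3 KB figure is plausible because a 400-word letter is roughly 2,400 characters at about six characters per word, counting spaces.) The Python sketch below simply multiplies those approximate figures by a page count. It assumes decimal units (1 MB = 1,000 KB), treats every page like the one-page letter described above, and leaves out the basic OCR step because no per-page price is given for it; actual sizes and prices will vary with the documents and the vendor.

```python
# Approximate per-page figures taken from the text above.
IMAGE_KB_PER_PAGE = 50                # one TIFF page image
TEXT_KB_PER_PAGE = 3                  # full text of a 400-word, one-page letter
IMAGE_COST_PER_PAGE = 0.10            # imaging, "10 cents or less"
CLEANUP_COST_PER_PAGE = (0.50, 2.00)  # OCR cleanup, low and high end


def estimate(pages: int) -> dict[str, float]:
    """Rough storage (MB, 1 MB = 1,000 KB) and cost (USD) for a number of pages."""
    low, high = CLEANUP_COST_PER_PAGE
    return {
        "image_storage_mb": pages * IMAGE_KB_PER_PAGE / 1000,
        "text_storage_mb": pages * TEXT_KB_PER_PAGE / 1000,
        "imaging_cost_usd": pages * IMAGE_COST_PER_PAGE,
        "cleanup_cost_low_usd": pages * low,
        "cleanup_cost_high_usd": pages * high,
    }


# Hypothetical 10,000-page production.
for name, value in estimate(10_000).items():
    print(f"{name}: {value:,.0f}")
```

For that hypothetical 10,000-page production, the estimate comes to about 500 MB as images versus about 30 MB as text, roughly $1,000 for imaging, and $5,000 to $20,000 if the OCR text is cleaned up.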

Full text conversion of your material is not the complete answer to controlling factual information in your cases. One of the main limitations of full text searching is language. The language used to describe any event varies widely: an event, person or concept can be described in a number of ways with different words. I can refer to a person as John Smith, husband, manager, friend, owner, debtor, and so on, and people use words loosely, without shared standards. Another example is the use of medical terms among the lawyers, physicians and others in a case. Was it a broken arm, fractured arm, or comminuted fracture? Dates also pose a problem: a search for May 15, 1990 will miss a document that lists the date as 5/15/90.
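One common remedy for the date problem is to normalize every date to a single form, both when text is indexed and when a query is entered, so that May 15, 1990 and 5/15/90 compare as the same value. The Python sketch below tries a short, hypothetical list of formats with the standard `datetime.strptime`. It assumes U.S. month/day order and English month names; two-digit years follow Python's own pivot rule (so 90 becomes 1990), and a real collection would need many more formats.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical list of formats to try; assumes U.S. month/day order and English month names.
DATE_FORMATS = ["%B %d, %Y", "%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d"]


def normalize_date(text: str) -> Optional[date]:
    """Return the calendar date written in `text`, or None if no listed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this format did not match; try the next one
    return None


query = "May 15, 1990"
in_document = "5/15/90"
print(query == in_document)                                  # False: the literal strings differ
print(normalize_date(query) == normalize_date(in_document))  # True: both are 1990-05-15
```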

Limitations of Full Text

Full text software still lacks the ability to sort and manage text in a structured manner: it references text but does not manage it. It lacks the structure and precision of a database.

Should You Convert Documents to Full Text?

In a typical lawsuit, you will obtain written discovery from opposing counsel. This can include answers to interrogatories, tax returns, corporate documents, and any other written documents produced pursuant to a Request for Production or other discovery mechanism. If you cannot receive the information in a digital format, you must decide whether to convert these written documents to ASCII full text so that you can search them with a full text software program. Besides the cost of conversion, which may be substantial, you have to consider whether full text conversion will meet your needs for controlling the document information in a case.

One school of thought holds that all of the documents in a case should be converted to full text, no matter the cost. Its proponents argue that this meets their needs for document control, that no database coding is needed, and that full text searches will locate the evidence they need in a case. However, as discussed above, there are severe limitations in locating information in a full text only environment. For example, will a search for "Mr. Kowoski" in 1,000 documents that have been converted to full text find all the relevant documents? Probably not, because Mr. Kowoski may be referred to as "manager," "Frank," "sales manager," or any number of references other than his last name.
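A partial workaround for the Kowoski problem is query expansion: the case team keeps a list of every name and role used for a person and searches for all of them at once. The Python sketch below uses a hypothetical four-document collection and a hypothetical alias list. Note the trade-off: a broad alias such as "manager" can also pull in documents about other managers, so recall goes up while precision may go down, and the search is still only as good as the alias list.

```python
import re

# Hypothetical full text collection (document ID -> OCR text).
DOCUMENTS = {
    "D1": "Mr. Kowoski approved the shipment on Monday.",
    "D2": "Frank said the invoices were already paid.",
    "D3": "The sales manager signed the purchase order.",
    "D4": "The warehouse was closed for inventory.",
}

# Hypothetical alias list compiled by the case team for one person.
ALIASES = {"Kowoski": ["Kowoski", "Frank", "sales manager", "manager"]}


def search(terms: list[str], documents: dict[str, str]) -> list[str]:
    """Return IDs of documents containing any of the terms (whole words, any case)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return sorted(doc_id for doc_id, text in documents.items() if pattern.search(text))


print(search(["Kowoski"], DOCUMENTS))         # ['D1']
print(search(ALIASES["Kowoski"], DOCUMENTS))  # ['D1', 'D2', 'D3']
```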

To counter the problem, some attorneys resort to abstracting or coding the complete document collection, depending on its size. This approach has its own inherent limitations: someone has to decide on the proper coding, certain codes may be missed, or the coder may be inexperienced or inattentive when valuable documents are being reviewed.

Many of today’s experts recommend a combination of full text and indexing. They suggest that you convert the important documents in your case to full text and code the other documents. If some of the abstracted documents become relevant later on, they can be OCR’ed and added to the full text portion of your collection. Also, with decreasing scanning costs to “image” a document and the immense storage capacity of CD-ROM, an image of each document can be attached to either the abstracted database record or the OCR version. The image itself can be converted to full text at any time.
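The combined approach can be pictured as one record per document that always carries objective coding and an attached image, and carries OCR text only for the documents judged important. The Python sketch below is a hypothetical data layout, not any particular litigation-support product: a search checks the coding of every document plus the full text where it exists (using a plain substring match), and a coded document can be promoted later by attaching its OCR text. The field names, document contents, and image paths are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    doc_id: str
    date: str                        # objective coding, ISO format
    author: str                      # objective coding
    summary: str                     # abstract written by the coder
    image_path: str                  # attached page image (hypothetical path)
    full_text: Optional[str] = None  # OCR text, only for important documents


def find(term: str, docs: list[Document]) -> list[Document]:
    """Return documents whose coding or (if present) full text contains the term."""
    term = term.lower()
    hits = []
    for doc in docs:
        searchable = " ".join([doc.author, doc.summary, doc.full_text or ""])
        if term in searchable.lower():
            hits.append(doc)
    return hits


def add_ocr_text(doc: Document, text: str) -> None:
    """Promote a coded-only document to full text once it becomes important."""
    doc.full_text = text


docs = [
    Document("D1", "1990-05-15", "F. Kowoski", "Memo on delivery schedule",
             "images/D1.tif", full_text="Deliveries to the north site are postponed."),
    Document("D2", "1990-06-02", "Purchasing", "Invoice for replacement parts",
             "images/D2.tif"),
]

print([d.image_path for d in find("postponed", docs)])  # ['images/D1.tif'] via full text
print([d.image_path for d in find("invoice", docs)])    # ['images/D2.tif'] via coding
print(find("net 30", docs))                             # [] (not in D2's coding)
add_ocr_text(docs[1], "Invoice 4471: replacement parts, terms net 30 days.")
print([d.doc_id for d in find("net 30", docs)])         # ['D2'] after promotion
```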

Full Text Compared to Database Abstracts

Some factors to consider:

  • Searches - In full text, searches are made against the complete text of the document, so information cannot be missed because of coding errors.
  • Searches - In databases, searches are made against the coding that is connected to the document. Abstracted coding may not have identified all the key words, concepts, issues, or persons connected to the case. In fact, entire documents may be missing from the database.
  • Subjective coding - Apart from objective coding (date, author, etc.), full text searching reduces subjectivity, since no one has to decide on the relevance and interpretation of documents and their relation to the case.
  • Breadth of material - Once you have a document in full text, you always have the complete document to search, whereas in an abstracted database you have only the coded material.
  • Use of Different Terms - In full text, the use of different terms for the same subject or event results in fewer “hits” on the relevant information. Database abstracting solves this problem by using consistent terms throughout the coding process.
  • Costs - Depending on the condition and type of the documents, converting paper to full text and cleaning it up can cost more than objective coding of a document plus an attached image.
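The “Use of Different Terms” factor and the database “Searches” factor cut in opposite directions, and a toy example shows both. In the Python sketch below, a hypothetical controlled vocabulary maps each variant a coder might see to one issue code, so a coded search for that code finds every listed variant, while a literal full text search for one phrase finds only the document that uses it. The fourth document uses a variant the vocabulary does not list, so it receives no code and the coded search misses it, which is the coding risk described above. All terms, codes, and documents are made up.

```python
# Hypothetical controlled vocabulary: each variant a coder might meet maps to one issue code.
ISSUE_CODES = {
    "broken arm": "ARM FRACTURE",
    "fractured arm": "ARM FRACTURE",
    "comminuted fracture": "ARM FRACTURE",
}

# Hypothetical documents, as written by different people.
RAW_TEXT = {
    "M1": "X-ray confirms a fractured arm.",
    "M2": "Patient reports a broken arm after the fall.",
    "M3": "Surgery scheduled for the comminuted fracture.",
    "M4": "Greenstick fracture of the left forearm.",  # variant not in the vocabulary
}


def code_document(text: str) -> set[str]:
    """Simulate a coder assigning issue codes by recognizing known variants."""
    lowered = text.lower()
    return {code for variant, code in ISSUE_CODES.items() if variant in lowered}


CODED = {doc_id: code_document(text) for doc_id, text in RAW_TEXT.items()}

# Literal full text search for one phrasing.
full_text_hits = sorted(d for d, text in RAW_TEXT.items() if "fractured arm" in text.lower())
# Database search on the consistent issue code.
coded_hits = sorted(d for d, codes in CODED.items() if "ARM FRACTURE" in codes)

print(full_text_hits)  # ['M1']
print(coded_hits)      # ['M1', 'M2', 'M3']  (M4 was never coded, so it is missed)
```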

Chapter 7 provides further legal application techniques for full text documents and techniques on how to effectively manage these types of documents.
