Extract Text with VISDOM™
Overview
When legal, contract risk, and procurement professionals review contracts and related documents that may bind their organization, they often try to reduce the organization's risk. They draw on techniques they have been taught and on their understanding of the organization's policies, applicable laws and regulations, and market trends, to name a few. Examples of these risk-reduction practices include recognizing terms that signal an area of caution in the contract language; scanning for identifiable information (e.g., numbers, monetary amounts, dates, credit card numbers, addresses, and bank account information); and reading for phrases such as "all liability" and "governing law," as well as for the deeper, comprehensive meaning of the language in the document.
As an organization grows, the volume of risk-reduction review increases and becomes taxing; small details may be missed if the professional is overworked, distracted, or rushed. It can be prudent for the individual and the organization to teach the software they use to recognize similar items, such as text, phrases, and identifiable data (as listed above). Such a tool may assist the professional when combined with the professional's personal review of the document. CobbleStone has evolved its text recognition engine with the introduction of VISDOM Artificial Intelligence. As software tools advance, it is still important to have legal documents reviewed by a legal professional such as an attorney. The tools discussed below are not intended to be used as legal advice, as only an attorney should provide legal advice.
Example of Text Recognition with VISDOM AI
As we cover the topic of text recognition with artificial intelligence, it is important to understand that artificial intelligence is not perfect. For it to work on text recognition, one must consider the inherent challenges of reading and reviewing documents. First, the documents may not be in a format the engine can review and recognize text from. Second, the text may be too small, bunched together, skewed, or degraded if the documents were scanned; it may be covered by an image or text overlay (even humans can have difficulty understanding text that is obscured); it may be in a foreign language; or pages may be missing. Once a document is introduced to the CobbleStone Software contract entry screen, the engine, if enabled in the Enterprise edition, reviews the document to see whether it is text-based. If it is not, the engine attempts to recognize text via optical character recognition (OCR), which has inherent limitations. Once complete, VISDOM attempts to run the document text through the rules it was taught to help extract text, phrases, and identifiable information. If configured, it will try to place the extracted data in data fields to help with contract data entry. This may assist the professional with entry and review of contract documents. Think of it as a machine review of the document text based on rules taught to it.
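The flow described above (use the document's own text if it has any, otherwise fall back to OCR, then apply the configured rules) can be sketched in a few lines of Python. This is only an illustration of the idea, not CobbleStone's implementation: the names extract_fields, run_ocr, and governing_law_rule are hypothetical, and the OCR step is passed in as a function because the real engine is not described here.

```python
from typing import Callable, Dict, Optional

# A rule takes the full document text and returns an extracted value, or None.
Rule = Callable[[str], Optional[str]]


def extract_fields(
    embedded_text: str,
    run_ocr: Callable[[], str],
    rules: Dict[str, Rule],
) -> Dict[str, str]:
    """Hypothetical pipeline: text layer first, OCR as a fallback, then rules."""
    # Step 1: prefer the document's own text; only run OCR when there is none.
    text = embedded_text if embedded_text.strip() else run_ocr()

    # Step 2: run every configured rule and keep the values that were found.
    extracted: Dict[str, str] = {}
    for field_name, rule in rules.items():
        value = rule(text)
        if value:
            extracted[field_name] = value
    return extracted


def governing_law_rule(text: str) -> Optional[str]:
    """Toy rule (an assumption for this sketch): text after a fixed marker phrase."""
    marker = "governed by the laws of "
    start = text.lower().find(marker)
    if start == -1:
        return None
    rest = text[start + len(marker):]
    return rest.split(".")[0].strip() or None
```

Any value produced this way is a suggestion for the data entry screen; the reviewer still confirms it against the document.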
Configuring Text Extract Rules for New Contracts
1. To configure the text extraction rules for newly added contract records, navigate to Manage/Setup - VISDOM Configuration.
2. The VISDOM: Processes page displays. To work with an existing VISDOM Process, click Edit Process for the desired line item. To add a new process, click Add New.
Adding a New VISDOM Process
There may be instances in which specific rules are needed for text extraction for a record type, or in which the existing rules simply do not meet the needs of the organization. In those cases, a new process may be desired.
1. Click Add New.
2. A pop-up window displays. Select the desired items for each of the three areas:
A. Area - For which area or module does the rule set apply? Select the area from a list of the major CobbleStone modules.
B. Action - Select Add (record) with file.
C. Type - Select the desired specific record type or All Types.
3. Click OK.
4. The pop-up closes and the list of VISDOM Processes displays with the new process included in the list.
Adding Fields to a Process
1. Click Edit Process for the desired line item.
2. Process data displays at the bottom of the page. Scroll to the Edit area.
3. Click Add Field.
4. A pop-up window displays. Select the desired field from the table for the area selected for the process.
5. Click OK.
6. The pop-up window closes and the field list displays. An algorithm must be defined for each field.
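To keep the pieces straight, it can help to picture a process as a small data structure: one Area, Action, and Type, a list of fields, and one or more algorithms per field. The Python sketch below is a mental model only. The class and attribute names are hypothetical and do not reflect how CobbleStone stores its configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldRule:
    """One field added to a process, with the algorithms defined for it."""
    name: str
    algorithms: List[str] = field(default_factory=list)


@dataclass
class VisdomProcess:
    """Hypothetical model of a process: Area, Action, Type, and its fields."""
    area: str
    action: str
    record_type: str
    fields: List[FieldRule] = field(default_factory=list)

    def fields_missing_algorithms(self) -> List[str]:
        # Every field needs at least one algorithm before it can extract anything.
        return [f.name for f in self.fields if not f.algorithms]
```

For example, a process with a Contract Title field that has an algorithm and a Vendor Name field that does not would report Vendor Name as still needing one.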
Adding and Defining an Algorithm
Each field must have at least one algorithm defined so VISDOM can take the definition and apply it as text extraction rules.
1. Click the field desired.
2. Click Add Algorithm.
3. A pop-up window displays. Select the algorithm desired.
Different field types have different options available.
Drop-down/Pick-list
Date
Number
Text
4. Click an algorithm to review the meaning of each.
5. Open the list again to select a different algorithm if necessary.
6. Click OK once the desired algorithm has been selected.
7. The pop-up window closes and the field list again displays. Select the field desired.
8. Click Edit for the algorithm to define for that field.
9. A pop-up window displays. Alter each setting as needed. Your business rules may vary.
Note: Different algorithms have different settings to be applied. The above example is for Jaro-Winkler Similarity.
10. Click OK.
11. Repeat the process for each field needed for the process.
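For readers unfamiliar with Jaro-Winkler Similarity, the sketch below shows the textbook formulation in Python: a score from 0 to 1 based on matching characters and transpositions, with a bonus for a shared prefix. It is offered as background only. How VISDOM implements or tunes the algorithm is not documented here, and the prefix_scale and max_prefix parameters are the conventional defaults (0.1 and 4), not VISDOM settings.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 for no matching characters."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order in the two strings.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)
```

With this formulation, jaro_winkler("MARTHA", "MARHTA") is about 0.961, so a similarity threshold near 0.9 would treat the two as a match while rejecting unrelated strings. The right threshold depends on your business rules.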
Field Options
Each field selected has a set of six (6) options:
A. Clean White Space - Set to True to remove white space within a phrase when checking for a match.
B. Remove Punctuation and Symbols - Set to True to ignore non-standard characters when checking for a match.
C. Remove Noise Words - Set to True to ignore words defined in the Noise Words Dictionary when checking for a match.
D. Correct Line Breaks - Set to True to review the text without line breaks/carriage returns when checking for a match.
E. Remove Common Company Name Words - Set to True to ignore words defined in the Common Company Name Words dictionary when checking for a match.
F. Remove DocuSign Stamp - Set to True to ignore DocuSign stamps when checking for a match.
Your business rules may vary for each of the settings.
Note: It may be impossible to configure for every scenario, as each contract document is different.
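The effect of these options is easier to see on a concrete string. The Python sketch below approximates five of the six options as text clean-up steps applied before matching. It is an illustration under stated assumptions: the word lists are made-up stand-ins for the Noise Words and Common Company Name Words dictionaries, "Clean White Space" is interpreted as collapsing repeated spaces, and DocuSign stamp removal is left out because the stamp format is not described here.

```python
import re

# Hypothetical stand-ins for the configurable dictionaries.
NOISE_WORDS = {"the", "a", "an", "of", "and"}
COMPANY_WORDS = {"inc", "llc", "ltd", "corp", "corporation", "company", "co"}


def normalize(
    text: str,
    *,
    clean_white_space: bool = False,
    remove_punctuation: bool = False,
    remove_noise_words: bool = False,
    correct_line_breaks: bool = False,
    remove_company_words: bool = False,
) -> str:
    """Apply the selected clean-up steps before comparing text for a match."""
    if correct_line_breaks:
        # Treat line breaks / carriage returns as ordinary spaces.
        text = re.sub(r"[\r\n]+", " ", text)
    if remove_punctuation:
        # Drop anything that is not a letter, digit, underscore, or whitespace.
        text = re.sub(r"[^\w\s]", "", text)

    drop = set()
    if remove_noise_words:
        drop |= NOISE_WORDS
    if remove_company_words:
        drop |= COMPANY_WORDS
    if drop:
        # Remove whole words (case-insensitive) that appear in the active lists.
        text = re.sub(
            r"\b\w+\b",
            lambda m: "" if m.group().lower() in drop else m.group(),
            text,
        )

    if clean_white_space:
        # Collapse runs of spaces and tabs, then trim the ends.
        text = re.sub(r"[ \t]+", " ", text).strip()
    return text
```

With all five switches on, a vendor name split across lines such as "Acme  Widgets," followed by "Inc." reduces to "Acme Widgets", which is then much easier to match against the vendor names already in the system.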
Testing Our Configuration
We can check our configuration by adding a new record with VISDOM.
1a. From the top navigation menu, select the module and click Add (record) with VISDOM AI. While the example below shows the Contracts menu, other record types can use VISDOM AI too.
1b. Alternatively, drag and drop the desired file onto the grey Drag and drop a file to create a new record box then select the module desired. This box displays at the top of the side menu.
2. VISDOM tries to recognize the text and extract it based on the configured rules. In addition, VISDOM tries to extract the Title of the contract and to identify data it has learned from the system, such as counterparty names (vendors and customers), locations (cities, states), employee names, and e-mail addresses.
3. After VISDOM's search and extraction are complete, we see the file and data entry fields.
In the example above, the Contract Title and Vendor/Client Name were both successfully extracted.
Note: It is recommended to review and edit the extracted text to ensure the text extracted is accurate and reflective of the data you want tracked.
Diving Deeper into the Text Extraction Engine
As seen below, several other helpful tabs appear above the image of the document.
Preview: The Preview tab contains an image of the document and each page.
Doc Text: The Doc Text tab shows the text extracted from the document. Because of quality and extraction limitations, this text should be verified; it is not guaranteed to be exact. The text is a result of OCR and scanning and may be affected by problems with the source text, page breaks, missing pages, overlays, headers, and other items related to document processing.
Decisions: The Decisions tab displays the results VISDOM AI found after running the document through the configured Auto Extraction rules. Confirmation of the actions should be performed by the legal reviewer.
Clauses: The Clauses table tab attempts to extract each clause/paragraph in a separate viewable box. This may be helpful to ease data entry by allowing the user to copy and paste extracted text into data fields.
Extract Sentences: The Extract Sentences table tab attempts to extract each sentence in a separate viewable box based on the period/full stop mark (English support). This may be helpful to ease data entry by allowing the user to copy and paste extracted text into data fields.
Auto Text Extract: The Auto Text Extract tab helps to establish other identifiable information such as dates, numbers, names, and locations, based on patterns and other data to which VISDOM has access (e.g., customer and employee names and related information). In this example, it found a numeric value that it assigned to the Contract Amount field.
Note: It is recommended to confirm all extraction results prior to saving the data.
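The Clauses and Extract Sentences tabs can be understood through a simple splitting sketch in Python. The sentence splitter follows the rule stated above (break after a period/full stop). The clause splitter assumes that clauses are separated by blank lines, which is an assumption made for this illustration and not a documented VISDOM rule. Both function names are hypothetical.

```python
import re
from typing import List


def split_clauses(text: str) -> List[str]:
    """Split on blank lines, treating each paragraph as one clause (assumption)."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def split_sentences(text: str) -> List[str]:
    """Split after each period that is followed by whitespace."""
    flat = " ".join(text.split())  # fold line breaks and repeated spaces
    return [s for s in re.split(r"(?<=\.)\s+", flat) if s]
```

A period-based split is deliberately naive: abbreviations such as "Inc." or "No." end a "sentence" early, which is one more reason the extracted boxes should be read before their text is pasted into data fields.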
Advanced configuration. VISDOM also supports configuration of regular expressions (regex) to locate patterns of text such as social security numbers, bank information, telephone numbers, and more. See a CobbleStone representative for more information about advanced VISDOM text pattern recognition.
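As a rough idea of what such patterns look like, the Python sketch below defines three illustrative regular expressions and runs them over a block of text. The patterns are simplified examples written for this article (US-style formats only) and are not the expressions VISDOM ships with. The names PATTERNS and find_patterns are hypothetical.

```python
import re
from typing import Dict, List

# Simplified, US-centric example patterns (assumptions for illustration only).
PATTERNS = {
    # 123-45-6789
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # (555) 123-4567, 555-123-4567, 555.123.4567
    "us_phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    # $500, $12,500.00
    "usd_amount": re.compile(r"\$\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
}


def find_patterns(text: str) -> Dict[str, List[str]]:
    """Return every match for each named pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
```

Patterns like these trade precision for coverage: a looser pattern finds more candidates but also more false positives, so matches should be treated as candidates for review.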
Note: The text engine is not exact and a legal professional should review all legal documents.