Lexalytics Salience Dev Wiki


Customer-Defined Lists (CDL)

Filed in: DataDir.CDLFiles · Modified on : Wed, 07 May 14

Salience Five recognizes a number of different entity types, including People, Places, Companies and Products. If you would like the software to recognize additional types of entities, or if certain entities are being ignored or picked up incorrectly, the data files under the entities folder can be used to expand and refine the recognition system. The files for each type of entity are contained in the folder of that name (e.g. Entities/Companies). The Entities/lists folder is used for defining new lists of entities. The following is a discussion of the data files you may wish to change.

Customer-Defined Lists

One of the most commonly used and most important features of the Salience Engine is its support for customer-defined lists. The default system recognizes entities for:

companies
people
places
products
email addresses
dates

Frequently, users need to recognize other sorts of entities, such as publishers or medical terms. In this case the user can build a custom dictionary, or .cdl file. CDL files are built by the user and placed in the /data/salience/entities/lists folder. Each line holds one entry, followed by a tab and the label to assign to it. The file format is as follows:

word1 word2<tab>label
word1 word2 word3 word4<tab>label

A .cdl file, publishers.cdl, is included in the system as an example. It lists some of the major publishers in the United States. The user may build multiple .cdl files within the lists directory; each .cdl file is hashed into the system when the Salience Engine session is created through the API. If you build a new list, any running programs that use Salience Engine will need to start a new session before the list takes effect. A .cdl file entry will generally contain between 1 and 4 words; the maximum length of a CDL entry is 12 words.
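Because a list is only read when a session is created, it can be useful to check a .cdl file before restarting anything. The following is a minimal sketch of such a check, not part of Salience itself: the function name check_cdl_line is hypothetical, and it assumes that "words" can be approximated by splitting on whitespace, which may differ from how the engine tokenizes entries.

```python
MAX_WORDS = 12  # maximum CDL entry length stated in the text above


def check_cdl_line(line):
    """Return a list of problems found in one CDL line (empty if none).

    Hypothetical helper: word counting by whitespace is an assumption,
    not the engine's documented tokenization.
    """
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        # The format requires a tab between the entry and its label.
        problems.append("expected a tab between the entry and its label")
        return problems
    phrase, label = fields[0], fields[1]
    words = phrase.split()
    if not words:
        problems.append("entry text is empty")
    elif len(words) > MAX_WORDS:
        problems.append(
            f"entry has {len(words)} words; the maximum is {MAX_WORDS}"
        )
    if not label.strip():
        problems.append("label is empty")
    return problems
```

Running this over every line of a new list and reporting any non-empty result is one way to catch a space typed in place of a tab before the list is loaded.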

Users may also mix several lists in a single .cdl file so that fewer files have to be managed. The following is an example of a .cdl file that will detect cars and planes:

Subaru	Car
Corvette	Car
Cessna 160	Plane
F22 Raptor	Plane
Volvo V70	Car
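To see which lists a mixed file actually defines, the entries can be grouped by their label column. This is an illustrative sketch only: group_by_label and SAMPLE are hypothetical names, and the code reads the file format described above rather than calling any Salience API.

```python
from collections import defaultdict

# The cars-and-planes example from above, with literal tab separators.
SAMPLE = (
    "Subaru\tCar\n"
    "Corvette\tCar\n"
    "Cessna 160\tPlane\n"
    "F22 Raptor\tPlane\n"
    "Volvo V70\tCar\n"
)


def group_by_label(cdl_text):
    """Map each label to the entries that carry it, in file order."""
    groups = defaultdict(list)
    for line in cdl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a valid entry<tab>label line
        groups[fields[1].strip()].append(fields[0].strip())
    return dict(groups)


if __name__ == "__main__":
    for label, entries in group_by_label(SAMPLE).items():
        print(label, entries)
```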

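CDL files can also support in-line normalization of customer-defined entities through an optional third column, separated from the label by another tab. In the following example, three different entries are all normalized to the single form "Ford F-series truck":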

Ford F150	Car	Ford F-series truck
Ford F250	Car	Ford F-series truck
Ford F350	Car	Ford F-series truck
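The three-column layout can be read with the same tab split as the two-column one. The sketch below is a hypothetical reader, not the engine's own loader. It also assumes that an entry without a third column is treated as its own normalized form, which the text above does not state.

```python
def parse_entry(line):
    """Split one CDL line into (entry, label, normalized form).

    Assumption: when the optional third column is missing or empty,
    the entry text itself is used as the normalized form.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected entry<tab>label[<tab>normalized form]")
    entry, label = fields[0], fields[1]
    normalized = fields[2] if len(fields) > 2 and fields[2] else entry
    return entry, label, normalized


if __name__ == "__main__":
    lines = [
        "Ford F150\tCar\tFord F-series truck",
        "Ford F250\tCar\tFord F-series truck",
        "Subaru\tCar",
    ]
    for line in lines:
        print(parse_entry(line))
```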
Copyright © 2014 Lexalytics, Inc.