Archive for December, 2010

Basic Salience API workflow

Wednesday, December 22nd, 2010

Because the Salience Engine is distributed as a library that exposes an application programming interface (API), the primary user of the Salience Engine is the API developer, responsible for integrating text analytics functionality into their application. At Lexalytics, we’ve tried to make this process as easy as possible with wrappers for multiple popular development environments and examples written against each of the wrappers. Also, over time we hope to post more worked examples to this blog.

Underlying all of this is the general sequence of method calls that developers commonly implement when using the Salience Engine. While this sequence can be seen in our examples, this article outlines it independently of any specific development language to explain the logic behind using the Salience API.

(more…)

Building a simple Python script to analyze French and English content

Monday, December 13th, 2010

This article assumes the reader is familiar with Linux, Python development, and the Salience API.

Last week, we created a simple Windows console application that used the Salience 4.4 distribution, including the LexUtilities library, to determine the language of a piece of content and analyze it with an appropriate Salience session (see “Building a simple application to analyze French and English content”). As mentioned in the previous article, Salience 4.4 introduced the French data directory and other core engine enhancements targeted at text analytics of French content. In the interest of equal attention to our Windows and Linux distributions, this week we’ll create a Python script that uses the LexUtilities library and the core Salience API to analyze English and French content. In the process, we’ll show how to build and install the Python modules for Salience 4.4 and LexUtilities on Linux.
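
The detect-then-route pattern described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the Salience or LexUtilities API: StubSession, LanguageRouter, toy_detect, and the data directory paths are all hypothetical names made up for this sketch. In a real script, the detector would be the LexUtilities language detection call and each session would be a Salience session initialized with the matching data directory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StubSession:
    """Hypothetical stand-in for a Salience session bound to one data directory."""
    data_directory: str
    processed: List[str] = field(default_factory=list)

    def analyze(self, text: str) -> str:
        # A real session would prepare the text and return analysis results here.
        self.processed.append(text)
        return f"analyzed with {self.data_directory}"


class LanguageRouter:
    """Routes each document to the session that matches its detected language."""

    def __init__(self, detect: Callable[[str], str], sessions: Dict[str, StubSession]):
        self._detect = detect
        self._sessions = sessions

    def analyze(self, text: str) -> str:
        language = self._detect(text)
        session = self._sessions.get(language)
        if session is None:
            # Fail loudly rather than sending content to the wrong session.
            raise ValueError(f"no session configured for language: {language!r}")
        return session.analyze(text)


def toy_detect(text: str) -> str:
    """Toy keyword-based detector for illustration only; stands in for LexUtilities."""
    french_markers = {"le", "la", "les", "est", "et", "une", "des"}
    words = set(text.lower().split())
    return "french" if words & french_markers else "english"


router = LanguageRouter(
    detect=toy_detect,
    sessions={
        "english": StubSession("data/english"),  # hypothetical path
        "french": StubSession("data/french"),    # hypothetical path
    },
)
```

Calling router.analyze(text) on each incoming document sends it to the English or French session. Keeping the detector and the sessions as constructor arguments means the routing logic stays the same when the stubs are swapped for the real libraries.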

(more…)

Building a simple application to analyze French and English content

Monday, December 6th, 2010

This article assumes the reader is familiar with C# development in Visual Studio 2008 and the Salience API.

With the release of version 4.4, Salience introduces the capability to analyze French content as well as English content. In addition to the core engine modifications that were needed to handle processing of French content, a new data directory has been developed specifically for French. The French data directory contains a part-of-speech tagger trained for French, a new entity extraction model, and other components developed specifically for French.

For best results, separate Salience sessions should be set up to process English and French content, each initialized with the appropriate data directory. This is similar to the usage of the data directory customized for Twitter content that was released in August 2010. Salience will not auto-detect the language of the incoming content, so it is up to the application to route content to the appropriate Salience session. With the Salience 4.4 release, Lexalytics has also provided a utility library called LexUtilities, which provides a language detection capability. The rest of this article walks through development of a solution written in C# that uses LexUtilities in conjunction with the Salience API to handle a mix of content in French and English.

(more…)