Archive for the ‘Product Development’ Category

To-may-to, to-mah-to. To Salience it’s all the same

Thursday, November 15th, 2012

Soon after we began releasing support for non-English languages, we started getting questions about which dialects of French, Spanish, or Portuguese we could analyze. It's a fair question, as each of these major languages differs noticeably from one part of the world to another. Even in English, you find subtle differences between “American English” and “UK English”. Luckily for Salience, many of these differences have little impact, and those that do can be easily addressed. Let’s look at where these differences appear.


Multi-language support in Salience

Friday, October 12th, 2012

At Lexalytics, we know it’s not only a global marketplace, but a multilingual global marketplace. That understanding has driven us to extend the capabilities of Salience beyond the analysis of English-only content. This article details the evolution and current state of our support for performing text analytics on English and non-English content.


Building a simple python script to analyze French and English content

Monday, December 13th, 2010

This article assumes the reader is familiar with Linux, Python development, and the Salience API.

Last week, we created a simple Windows console application that used the Salience 4.4 distribution, including the LexUtilities library, to determine the language of a piece of content and analyze it with an appropriate Salience session (see “Building a simple application to analyze French and English content”). As mentioned in that article, Salience 4.4 introduced the French data directory and other core engine enhancements targeted at text analytics of French content. In the interest of equal attention to our Windows and Linux distributions, this week we’ll create a Python script that uses the LexUtilities library and the core Salience API to analyze English and French content. Along the way, we’ll show how to build and install the Python modules for Salience 4.4 and LexUtilities on Linux.
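The overall flow of the script is the same detect-then-route pattern used in the Windows application. A minimal sketch of that pattern is below; note that `detect_language` and `SalienceSession` here are hypothetical stand-ins for the LexUtilities and Salience Python modules, not the actual Lexalytics API.

```python
# Sketch of the detect-then-route pattern: one Salience session per
# language, with a language detector deciding which session handles
# each piece of content. All names below are illustrative stand-ins.

class SalienceSession:
    """Stand-in for a Salience session initialized with one data directory."""
    def __init__(self, data_dir):
        self.data_dir = data_dir

    def analyze(self, text):
        # A real session would run entity extraction, sentiment, themes, etc.
        return {"data_dir": self.data_dir, "length": len(text)}

def detect_language(text):
    """Toy detector standing in for LexUtilities language detection."""
    french_markers = {"le", "la", "les", "est", "une", "et"}
    return "fr" if set(text.lower().split()) & french_markers else "en"

# One session per language, each pointed at its own data directory.
sessions = {
    "en": SalienceSession("data"),
    "fr": SalienceSession("data_fr"),
}

def analyze(text):
    # Route the content to the session matching its detected language.
    return sessions[detect_language(text)].analyze(text)

print(analyze("Le chat est sur la table")["data_dir"])  # data_fr
print(analyze("The cat sat on the mat")["data_dir"])    # data
```

The key design point is that sessions are created once, up front, with the appropriate data directory; per-document work is limited to detection and a dictionary lookup.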


Building a simple application to analyze French and English content

Monday, December 6th, 2010

This article assumes the reader is familiar with C# development in Visual Studio 2008 and the Salience API.

With the release of version 4.4, Salience introduces the capability to analyze French content as well as English content. In addition to the core engine modifications that were needed to handle processing of French content, a new data directory has been developed specifically for French. The French data directory contains a part-of-speech tagger trained for French, a new entity extraction model, and other components developed specifically for French.

For best results, separate Salience sessions should be set up to process English and French content, each initialized with the appropriate data directory. This is similar to the use of the data directory customized for Twitter content, released in August 2010. Salience does not auto-detect the language of incoming content, so it is up to the application to route content to the appropriate Salience session. With the Salience 4.4 release, Lexalytics has also provided a utility library called LexUtilities, which offers language detection. The rest of this article walks through the development of a C# solution that uses LexUtilities in conjunction with the Salience API to handle a mix of French and English content.