Archive for September, 2010

Using the Twitter data directory for Salience

Monday, September 27th, 2010

A couple of months ago, we put out a release for Salience that included a new twitter_data directory. As announced on our corporate blog, this directory of Salience Engine data files has been customized for better handling of Twitter content. The enhancements include recognition of mentions (e.g. @Lexalytics) and hashtags (e.g. #textanalytics), as well as sentiment associated with various emoticons and Twitter jargon.
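To make the kinds of features described above concrete, here is a minimal Python sketch of spotting mentions and hashtags and scoring emoticons. It is not Salience's API and does not reflect what is inside the twitter_data files: the regular expressions, the `scan_tweet` function, and the emoticon weights are all assumptions made up for illustration.

```python
import re

# Illustrative patterns only; these are assumptions, not Salience's actual rules.
# The lookbehind keeps e-mail addresses such as a@b.com from counting as mentions.
MENTION = re.compile(r"(?<!\w)@(\w+)")
HASHTAG = re.compile(r"(?<!\w)#(\w+)")

# Hypothetical emoticon weights, chosen purely for illustration.
EMOTICON_SENTIMENT = {":)": 1.0, ":(": -1.0, ":D": 1.5}


def scan_tweet(text):
    """Return the mentions, hashtags, and a naive emoticon score for one tweet."""
    mentions = MENTION.findall(text)
    hashtags = HASHTAG.findall(text)
    # Sum the weight of every emoticon occurrence found in the text.
    score = sum(weight * text.count(emoticon)
                for emoticon, weight in EMOTICON_SENTIMENT.items())
    return {"mentions": mentions, "hashtags": hashtags, "emoticon_score": score}
```

Calling `scan_tweet("Loving @Lexalytics at #textanalytics :)")` picks out the mention, the hashtag, and a positive emoticon score. A real data directory covers far more jargon and edge cases than a sketch like this can.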

Note: the Twitter data directory is not intended to replace the default data directory. Each contains data files that are tailored to particular types of content. The Twitter data directory will not produce results as good as those of the default data directory when used on news content. Conversely, the Twitter data directory was created to handle characteristics unique to Twitter content better than the default data files do.

Now that we know it’s there and what it is, how do our developers take advantage of the new capabilities? The remainder of this blog post introduces how to use the new Twitter data directory within your Salience environment.


Working with Salience theme output

Tuesday, September 21st, 2010

Recently, I’ve been writing blog posts to give a little more guidance on particular aspects of Salience Engine. This is in response to customer requests for more how-to information. Although our developer wiki provides basic information about the mechanics of what is in the Salience API, it’s lacking a bit of guidance on using the API. In this post, I’m going to take a look at the themes that get extracted by Salience, and offer some guidance on using theme output. If you’re a customer, the first thing you notice about themes as you start running content through is that if you’ve got a lot of content, you get a flood of themes. How do you trim that down to a reasonable number of high-quality themes? Let’s take a look…
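As a rough picture of what trimming a flood of themes can mean, here is a minimal Python sketch that keeps only the highest-scoring themes. It is not the Salience API: the `Theme` record, its `score` field, and the `top_themes` helper are hypothetical names, and filtering by a score threshold plus a cap is just one plausible approach, assumed here for illustration.

```python
from typing import NamedTuple


class Theme(NamedTuple):
    """Hypothetical theme record; the field names are assumptions."""
    text: str
    score: float


def top_themes(themes, min_score=0.0, limit=10):
    """Keep themes at or above min_score, highest score first, capped at limit."""
    kept = [theme for theme in themes if theme.score >= min_score]
    kept.sort(key=lambda theme: theme.score, reverse=True)
    return kept[:limit]
```

For example, `top_themes(themes, min_score=0.3, limit=2)` drops weak themes first and then returns at most two of the strongest. The right threshold and cap depend on how much content you are processing.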


Adjusting out-of-the-box Salience sentiment

Sunday, September 12th, 2010

A couple of weeks ago, I posted an article giving some guidance on scaling up use of Salience, for example in multi-threaded scenarios. This week, I’d like to take a look at another area that we get questions about from users. Out-of-the-box, Salience Engine provides document-level and entity-level sentiment. The components that go into determining sentiment for a piece of content are based on years of effort and tweaking by Lexalytics. That said, we also know they don’t fit every situation, particularly when you’re dealing with a focused vertical with specific jargon. So we rely on a key strength of the Salience Engine, which is its ability to be customized. In this article, I’ll go through some techniques for assessing and adjusting sentiment analysis in Salience Engine.