Analyzing collections with Salience Five

July 25th, 2011

This is a bit of a forward-looking blog post about new features that we’re debuting in Salience Five. At our Lexalytics User Group meeting in New York in April, we introduced the “collections” functionality that will be provided in Salience Five. Salience Five is in beta right now, so I decided to put together a worked example of collection functionality using some customer review data for Bally’s in Las Vegas gathered from a public website.

Read on to see how you’ll be able to use collections to analyze a group of documents as a cohesive set, extracting the commonly occurring themes (with rollup using our concept matrix) along with other pieces of actionable data we’re calling facets.


Salience .NET via IronPython

April 29th, 2011

To round out my overview of ways to get up and running quickly with scripting Salience on Windows, I’ll conclude with another way to take advantage of the .NET wrapper: IronPython.


Using Salience via PowerShell (part 2): Tabular data

April 23rd, 2011

A common request from customers looking to evaluate Salience Engine is to process a sample set of data. Often this takes the form of an Excel or CSV file in which one column contains the text to be processed. I’m going to show one way of tackling this problem, using PowerShell.
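The post itself works in PowerShell; as a rough illustration of the same row-by-row idea in Python, here is a minimal sketch that reads a CSV file with the standard `csv` module and runs a function over one text column. `analyze_text` is a hypothetical placeholder, not part of any Salience wrapper API; it only counts words so the sketch runs on its own. The column name `review` and the sample rows are made up for illustration.

```python
import csv
import io


def analyze_text(text):
    # Hypothetical placeholder for a call into a text analytics engine.
    # It just returns the word count so this sketch is self-contained.
    return len(text.split())


def process_column(csv_file, column):
    """Run analyze_text over one named column of a CSV file, row by row."""
    results = []
    reader = csv.DictReader(csv_file)  # first row is treated as the header
    for row in reader:
        text = row.get(column) or ""  # missing or empty cells become ""
        results.append((text, analyze_text(text)))
    return results


if __name__ == "__main__":
    # Made-up sample data standing in for an exported spreadsheet.
    sample = io.StringIO(
        "id,review\n"
        "1,Great pool and friendly staff\n"
        '2,"Room was noisy, but the view was nice"\n'
    )
    for text, score in process_column(sample, "review"):
        print(score, text)
```

Using `csv.DictReader` lets the text column be selected by its header name, and it handles quoted cells that contain commas, which are common in free-text review data.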


Ten things to know about Salience (part 2)

April 22nd, 2011

This is a follow-on to our list of ten things to know about Salience Engine. Together, these two articles are intended to guide developers through some of the main aspects of working with Salience Engine when they first start out.

In the first part, most of the topics focused on deployment strategies and approaches. In this second part, we’ll look at ways of tuning the results from Salience Engine. So let’s roll up our shirtsleeves and get back into it…


Ten things to know about Salience (part 1)

April 22nd, 2011

I had a meeting with a client recently, and one of their suggestions was a list of the top 10 things that an engineer should know when they start working with Salience Engine. Some of these may seem basic; however, it’s not safe to assume that things which seem obvious actually are. With all due respect to David Letterman and his Late Night Top Ten lists, here we go…


Using Salience via PowerShell (part 1)

April 15th, 2011

By way of introduction, my name is Matt King and I’m a Solution Architect in the Lexalytics Services group. I’m also the guy who brought you the interactive Salience Python script the other day. Most of my current work is on Linux (Python/Java/bash/etc.), and both my home and work laptops run OS X as the primary OS. I do have VMware Fusion with a Windows OS, but until a day or two ago that was a copy of XP Professional that I dutifully purchased back in 2008.

After upgrading to Windows 7, I was looking around for something to do. As I’d been hearing good things about PowerShell, I figured it was worth checking out. But what to do with it? I’d heard that one of the cooler things, besides the object-passing pipelines, is that it allows easy access to just about everything via .NET. And Salience comes with a .NET wrapper…

Getting started with Salience Engine in Python

April 6th, 2011

One of the key strengths of Salience Engine is that it is provided as a library, which customers can integrate into their own systems. To make integration easier, we provide wrappers for some of the most popular development environments, namely .NET, Java, PHP, and Python. The first hurdle for a developer is getting the wrapper of choice set up within their development environment so they can start coding against it. This blog article shows how to build and deploy the Python wrapper for Salience Engine on both Windows and Linux. Also provided is an interactive script, written by one of our professional services engineers, that you can use to get your feet wet with Salience Engine in a Python environment.


¿Sabe usted español? Você sabe português? (Do you speak Spanish? Do you speak Portuguese?)

March 10th, 2011

The Salience Engine developed by Lexalytics is capable of industry-leading entity extraction, sentiment analysis, theme extraction, summarization, and other text analysis. Prior to our latest release, however, this functionality was limited to English content. Our release of Salience 4.4 introduced support for the analysis of French text. This support was built from the ground up to give the engine a deep and native understanding of French. Our French support is already being deployed by MediaVantage (MédiaVantage et Lexalytics offrent une analyse automatique de ton en plusieurs langues, “MédiaVantage and Lexalytics offer automatic tone analysis in multiple languages”) for their clients who need to analyze both English and French content. As my grade school French teacher would say, “C’est formidable!” (“That’s wonderful!”)

For our next release later this year, we’re setting our sights on additional language support: this time, Spanish and Portuguese. To achieve this, we will follow the same recipe, gathering knowledge from native-language resources to put together the building blocks needed to provide true support for Spanish and Portuguese.

If you are fluent in Spanish or Portuguese, this is where you can help us. We are looking for resources to assist with the annotation of content for use in training our engine. NOTE: This is not translation!

Please review the following criteria carefully:

  1. Must be a native or highly fluent speaker of Spanish (Latin American or European) or Portuguese (Brazilian or European).
  2. We prefer individual contractors, not agencies.
  3. No placement agencies, please.
  4. We prefer US-based contractors, as that makes payment, taxes, etc. much easier.
  5. We would prefer resources in the Boston area, but this is not required.
  6. This is short-term contract work, not full-time employment.
  7. You must have a Windows PC that you can use to run our utilities that aid the annotation work.

If you meet these criteria, and are interested in helping us understand Spanish or Portuguese, please get in touch with us by emailing hiring@lexalytics.com, with “Spanish contract resource” or “Portuguese contract resource” as the subject. We look forward to hearing from you.

Gracias a todos. Muito obrigado. (Thank you all. Many thanks.)

Does @CharlieSheen really have @klout?

March 4th, 2011

It’s been impossible this week to get away from Charlie Sheen. He’s all over the airwaves on television and radio, and it seems like everyone is commenting on his interviews on Twitter, Facebook, and other sites. But one thing that caught my attention yesterday (thanks to @eric_andersen) was an article posted on Klout, “Charlie Sheen Needs a Klout Score”. The article discusses Charlie Sheen’s Klout score and justifies it, concluding with the statement:

At Klout we measure influence which we define as the ability to drive measurable actions across the social web. Charlie’s first tweet contained a link to a picture – that link has been clicked through 455,000 times at the time of this writing (6:39PM PST).

So continuing in the tradition of Friday blog articles that are less technical but still related to text analytics or software engineering in general, I wanted to think about whether this really does provide a measure of influence.


Elementary, my dear @IBMWatson!

February 11th, 2011

Or perhaps that should be “Jeopardy, my dear Watson”. By now, you’ve hopefully heard of the IBM project called Watson to develop a computer capable of competing on the quiz show Jeopardy. Scratch that, not just competing, but competing against two of the best players the show has ever had. And, if Watson works as designed, beating them.

I’ve been a fan of Jeopardy since I was a kid, so that angle of the story interested me from the start. But as I watched the NOVA segments on how the project team addressed the challenges of developing a machine capable of understanding human language, I was struck by how relevant they are to the challenges we face in building a text analytics engine. If you haven’t heard much about Watson, I highly recommend the video “Building Watson – A Brief Overview of the DeepQA Project”. Without a doubt, Watson goes far beyond the applications we are dealing with. But there is synergy (buzzword bingo score) in the fundamental building blocks and approach, and it’s very exciting to see where this can all head. Here are some of the core text analytics problems that Watson faces, and how they relate to us.
