Archive for December, 2011

Sizing your Salience Five deployment

Thursday, December 22nd, 2011

This is another extract from our customer files. Not something that comes up all the time, but often enough that it warranted a blog article with a good worked example.

In general, Salience Engine has been and continues to be very economical in terms of hardware requirements. Text analytics with Salience Engine is more CPU intensive than I/O or memory intensive, though the inclusion of the Concept Matrix™ in Salience Five has increased the memory footprint.

So let’s say you’re looking to process 2 million documents per day, where half are tweets and half are news articles of 4kb or less. What kind of hardware spec are you looking at? Read on to see how you could spec out handling this amount of content with Salience Five.

(more…)

Entity extraction in Salience Five

Wednesday, December 21st, 2011

I wanted to write up a detailed explanation of the methods of entity extraction available in Salience Five for a client, where they overlap and where they differ. And as I did, I thought, “That would make for a bloody useful blog post for the dev blog.” So here it is.

Prior to Salience 4.x, entity extraction was solely list-based. Salience 4.0 introduced model-based entity extraction, which allowed for novel entity extraction. In other words, “I didn’t think to add ‘John Smith’ to my list of people to extract, but Salience Engine found him in today’s news magically because it knows what names of people look like.” Very powerful stuff.

Salience Five continues to provide model-based and list-based entity extraction found in Salience 4.x, with some of the same cross-over between the two and modification to the terminology.

(more…)