Support Notes: Deploying the Salience assemblies on a Windows network drive

December 20th, 2012

This is a brief note from a professional services engagement currently in progress. Our professional services team is developing a custom Excel plug-in for a client that uses Salience to analyze worksheet content within the familiar Excel environment.

The Excel plug-in integrates with Salience through a Microsoft Office add-in written in C# and the Salience .Net wrapper. The client wanted the end-user desktops to only have the add-in installed, with Salience and its related data directory residing on a mapped network drive. Read on to see how we accomplished this.

Read the rest of this entry »

Feature highlight: Complex stems in Salience 5.1

November 16th, 2012

With every release of Salience, we work to enhance the engine’s ability to extract meaning from the ever-changing landscape of unstructured content. That might sound like a mouthful of marketing mumbo-jumbo, but this is a blog site for developers, so what do I mean by that? What I mean is adding new methods, new techniques, and tweaking existing techniques to handle all kinds of text gracefully and derive meaning from it. One such feature added in Salience 5.1 is a new option called Complex Stems. Let’s have a look at how and when you might use this option.

Read the rest of this entry »

To-may-to, to-mah-to. To Salience it’s all the same

November 15th, 2012

Soon after we started releasing support for non-English languages, we started getting questions about what dialects of French or Spanish or Portuguese we were able to analyze. It makes sense, as there are distinct differences within these major languages spoken in different parts of the world. Even in English, you find subtle differences between “American English” and “UK English”. Luckily, for Salience, many of the differences have little impact and those that do can be easily addressed. Let’s look at where these differences appear.

Read the rest of this entry »

Contacting Lexalytics support services

November 1st, 2012

In addition to reaching Lexalytics support services via email at support@lexalytics.com, you can now contact us through a new channel: the Lexalytics support portal at support.lexalytics.com.

This blog article will cover using the support portal to create, edit/update, close, and manage tickets.

Read the rest of this entry »

Support Notes: Windows runtime prerequisites

October 22nd, 2012

In many cases, the runtime support needed for Salience Engine deployments already exists on the target Windows systems. However, installation on a clean Windows system sometimes requires the appropriate Visual C++ runtime support. The Windows installers are built to download and invoke the Visual C++ runtime installers, but in some environments that download from Microsoft’s site fails for one reason or another. In these situations, we recommend downloading and installing the runtime support manually through the URLs provided in this article.

Read the rest of this entry »

Multi-language support in Salience

October 12th, 2012

At Lexalytics, we know it’s not only a global marketplace, but a multi-lingual global marketplace. It’s this understanding that has driven us to extend the capabilities of Salience beyond analysis of English-only content. This article details the evolution and current state of our support for performing text analytics on English and non-English content.

Read the rest of this entry »

Support Notes: Apache configuration support

October 2nd, 2012

At times, problems may be encountered when setting up Apache to host PHP pages that call Salience PHP wrapper methods: visiting the page returns an “HTTP/1.0 500 Internal Server Error”, with a corresponding “PHP Fatal error: Call to undefined function…” entry in /var/log/httpd/error_log.

Figure 1: Visiting PHP page with Salience PHP method calls locally hosted by Apache

Figure 2: 1st error—HTTP/1.0 500 Internal Server Error

Figure 3: 2nd error—"PHP Fatal error: Call to undefined function…" in error_log

These errors occur even after verifying that the dynamic extension for saliencefive.so has been added to the php.ini configuration file loaded by Apache.

Under this circumstance, starting, stopping, or restarting the Apache daemon produces the warning message, “httpd: Could not reliably determine the server’s fully qualified domain name, using 127.0.0.1 for ServerName”.

Figure 4: httpd start/stop/restart warning

This warning indicates that httpd (the Apache daemon) was unable to determine its own name. It may seem harmless, since the message says Apache will assign a name automatically. However, the server needs to know its own name under certain circumstances in order to generate self-referential redirects, and this is the root cause of the PHP page hosting problems.

To eliminate the errors, define a fully-qualified ServerName in the main httpd configuration file (typically /etc/httpd/conf/httpd.conf).

Examples:
ServerName localhost
ServerName 127.0.0.1
ServerName 10.201.2.99 (substitute the actual server IP address)

Figure 5: Adding ServerName entry to httpd.conf

Figure 6: Successfully reaching PHP page
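The edit above can also be scripted. Here’s a minimal Python sketch of the same fix; the `ensure_server_name` helper, the “localhost” value, and the demo on a throwaway temp file are all illustrative assumptions (the real file is typically /etc/httpd/conf/httpd.conf and requires root privileges to edit):

```python
# Sketch: append a ServerName directive to an httpd.conf that lacks one.
# The path and name here are assumptions; adapt them to your server.
import tempfile
from pathlib import Path

def ensure_server_name(conf_path, name="localhost"):
    """Add 'ServerName <name>' if no ServerName directive exists yet."""
    conf = Path(conf_path)
    text = conf.read_text()
    has_directive = any(
        line.strip().startswith("ServerName") for line in text.splitlines()
    )
    if not has_directive:
        conf.write_text(text.rstrip("\n") + f"\nServerName {name}\n")
    return conf.read_text()

# Demonstrate on a throwaway copy instead of the live configuration.
demo = Path(tempfile.mkstemp()[1])
demo.write_text("Listen 80\n")
print(ensure_server_name(demo))  # prints the file with "ServerName localhost" added
```

After adding the directive to the real configuration, restart the Apache daemon and the FQDN warning should no longer appear.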

Sizing your Salience Five deployment

December 22nd, 2011

This is another extract from our customer files. Not something that comes up all the time, but often enough that it warranted a blog article with a good worked example.

In general, Salience Engine has been and continues to be very economical in terms of hardware requirements. Text analytics with Salience Engine is more CPU intensive than I/O or memory intensive, though the inclusion of the Concept Matrix™ in Salience Five has increased the memory footprint.

So let’s say you’re looking to process 2 million documents per day, where half are tweets and half are news articles of 4kb or less. What kind of hardware spec are you looking at? Read on to see how you could spec out handling this amount of content with Salience Five.
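As a back-of-envelope illustration of the kind of arithmetic involved, here’s a short sketch; the per-document processing times below are illustrative assumptions, not measured Salience benchmarks:

```python
# Rough sizing for 2 million documents/day, half tweets and half
# news articles of 4kb or less. Per-document times are assumptions.
DOCS_PER_DAY = 2_000_000
tweets = articles = DOCS_PER_DAY // 2

SEC_PER_TWEET = 0.05    # assumed: short texts are quick to process
SEC_PER_ARTICLE = 0.5   # assumed: a ~4kb article takes roughly 10x longer

cpu_seconds = tweets * SEC_PER_TWEET + articles * SEC_PER_ARTICLE
cores = cpu_seconds / (24 * 60 * 60)  # cores needed at 100% utilization

print(f"{cpu_seconds:,.0f} CPU-seconds of work per day, about {cores:.1f} cores")
# → 550,000 CPU-seconds of work per day, about 6.4 cores
```

The real numbers depend on your hardware and the Salience features enabled; the full post walks through a properly worked example.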

Read the rest of this entry »

Entity extraction in Salience Five

December 21st, 2011

I wanted to write up a detailed explanation of the methods of entity extraction available in Salience Five for a client, where they overlap and where they differ. And as I did, I thought, “That would make for a bloody useful blog post for the dev blog.” So here it is.

Prior to Salience 4.x, entity extraction was solely list-based. Salience 4.0 introduced model-based entity extraction, which allowed for novel entity extraction. In other words, “I didn’t think to add ‘John Smith’ to my list of people to extract, but Salience Engine found him in today’s news magically because it knows what names of people look like.” Very powerful stuff.

Salience Five continues to provide the model-based and list-based entity extraction found in Salience 4.x, with some of the same cross-over between the two and some modifications to the terminology.

Read the rest of this entry »

Using Salience via PowerShell (part 3): Text Files

August 5th, 2011

Today’s assignment: Convert some docx files to txt and then time how long it takes to process them, getting document sentiment and entities. Use PowerShell.

So first, let’s convert the Word documents to text files:

# Word must be running via COM automation before Documents.Open will work
$word = New-Object -ComObject Word.Application

function Save-AsText($fn) {
  $doc = $word.Documents.Open($fn.ToString())
  $txtName = $fn.ToString().Replace('.docx', '.txt')
  $doc.SaveAs([ref] $txtName, [ref] 2)   # 2 = wdFormatText
  $doc.Close()
  echo $txtName
}

$c = Get-ChildItem -Recurse -Include *.docx
foreach ($fn in $c) {
    Save-AsText $fn
}
$word.Quit()

Now that we’ve got our text files, we can use Measure-Command and Measure-Object to do the measuring:

Add-Type -Path "C:\Program Files\Lexalytics\Salience\bin\SalienceEngineFour.NET.dll"
$se = New-Object Lexalytics.SalienceEngine(
             'C:\Program Files\Lexalytics\license.dat',
             "C:\Program Files\Lexalytics\data")
$timings = @()
$c = Get-ChildItem -recurse -include *.txt
$cnt = 0
$s = 0
foreach ($fn in $c) {
   $m = Measure-Command -OutVariable t {
     $rc = $se.PrepareTextFromFile($fn.toString())
     if ($rc -ne 0) {
       echo "Failed to prepare text with code $rc on $fn"
       continue
     }
     $cnt = ($se.GetEntities(0, 0, 0, 0, 50, 5) | Measure-Object).Count
     $s = $se.GetDocumentSentiment(0).fScore
   }
   $timings += $t[0].TotalMilliseconds
   Write-Host $fn $cnt $s $t[0].TotalMilliseconds
}

$timings | Measure-Object -minimum -maximum -average -sum

And you’ll end up with a summary like this:

Count    : 100
Average  : 511.2
Sum      : 51120
Maximum  : 999
Minimum  : 63

An average of 511 milliseconds per document for the 100 documents processed.