The salience/sentiment directory contains data files used by Salience Engine for sentiment analysis. Each file is described in more detail below:
| File | Description |
|---|---|
| context.dat | List of words used for the contextual model |
| contextual.bin | Binary file used to identify phrases which convey sentiment polarity |
| default.bin | The default model for model-based sentiment analysis |
| general.hsd | The default hand-scored dictionary (HSD) file for phrase-based sentiment analysis |
| general.lsf | The default binary database used for phrase-based sentiment analysis |
| hsd.ptn | The part-of-speech patterns that indicate a sentiment-bearing phrase |
| intensifiers.dat | A tab-delimited list of intensifying phrases with accompanying multipliers |
context.dat
This data file lists words along with information about whether they are "factual" words or emotive words. This helps the contextual model determine whether sentiment was intended.
contextual.bin
This binary model supports the Use Polarity Model option. When this option is enabled, phrases that contain sentiment-bearing terms but use them in a non-sentiment manner are excluded from sentiment analysis.
For example, the phrase "Good morning to everyone." contains a positive sentiment word ("good") but does not express positive sentiment; it is simply a greeting.
This binary file cannot be modified by users.
default.bin: Customizing model-based sentiment
Model-based sentiment analysis was introduced in Salience 4. The default data directory ships with a basic sentiment model, default.bin, trained on generic business content. Salience provides a command-line tool, SentimentModelBuilder, that lets users create sentiment models from their own content. Additional guidance on using SentimentModelBuilder will be posted to the Developer blog.
general.hsd: Customizing phrase-based sentiment
The most common method of adjusting sentiment analysis within Salience Engine is through HSD files. An HSD file provides a listing of sentiment-bearing phrases and human-judged sentiment weights for these phrases. Phrase-based sentiment analysis can also be adjusted through negators and intensifiers.
The default data directory provides an HSD file developed by Lexalytics for general content. Customization of phrase-based sentiment analysis begins with this file: copy it to the equivalent location in a user directory and use it as the basis for a custom HSD.
Users who want to customize sentiment analysis should review sentiment output, particularly the phrases that contribute to sentiment results, and then add or edit phrases in the HSD file with appropriate sentiment weights.
Phrases that should be explicitly excluded from sentiment analysis can be marked with a tilde (~). This differs from assigning a sentiment weight of zero: a zero-weighted phrase is still included in sentiment calculations, where it has a neutralizing effect. It also differs from the tilde (~) operator in other data files, such as pattern files, where it controls case sensitivity.
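The difference between exclusion and a zero weight can be illustrated with a minimal Python sketch. It assumes HSD lines of the form phrase, tab, weight; a tilde prefixed to the phrase to mark an exclusion; and a document score computed as the mean of matched phrase weights. These format and aggregation details are assumptions for illustration, and `load_hsd` and `document_score` are hypothetical helpers, not part of the Salience API.

```python
from statistics import mean

def load_hsd(lines):
    """Parse hypothetical 'phrase<TAB>weight' lines; a leading '~' marks an excluded phrase."""
    weights, excluded = {}, set()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("~"):
            # Assumed syntax: the tilde prefixes the phrase; any weight on the line is ignored.
            excluded.add(line[1:].split("\t")[0])
            continue
        phrase, weight = line.split("\t")
        weights[phrase] = float(weight)
    return weights, excluded

def document_score(matched_phrases, weights, excluded):
    """Mean of matched phrase weights (an assumed aggregation), skipping excluded phrases."""
    scores = [weights[p] for p in matched_phrases if p in weights and p not in excluded]
    return mean(scores) if scores else 0.0

matched = ["great", "good morning"]

zero_weighted = load_hsd(["great\t0.8", "good morning\t0.0"])
tilde_excluded = load_hsd(["great\t0.8", "~good morning"])

print(document_score(matched, *zero_weighted))   # 0.4: the zero pulls the mean toward neutral
print(document_score(matched, *tilde_excluded))  # 0.8: the phrase is ignored entirely
```

Under this averaging assumption, a zero-weighted phrase still counts toward the total and dilutes the score, while an excluded phrase leaves the score unchanged.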
Starting in Salience 6.1, queries can be used in any line of the HSD. If a line has a query operator (such as AND, OR, and *), the line will be treated as a query. Any text in the document matching a non-negated portion of the query will be assigned the provided sentiment value. For example:
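A minimal Python sketch of the rule for recognizing query lines, flagging an HSD line as a query when its phrase contains AND, OR, or *. The tab-delimited phrase/weight layout, case-sensitive operator matching, and the `parse_hsd_line` helper are assumptions; because the documentation says "such as", Salience may recognize additional operators that this sketch ignores.

```python
import re

# Operators named in the documentation; Salience may recognize others.
QUERY_OPERATORS = re.compile(r"\b(?:AND|OR)\b|\*")

def parse_hsd_line(line):
    """Split a hypothetical 'phrase<TAB>weight' line and flag whether it is a query."""
    phrase, weight = line.rstrip("\n").split("\t")
    is_query = QUERY_OPERATORS.search(phrase) is not None
    return phrase, float(weight), is_query

for entry in ["customer service\t0.4", "great AND price\t0.5", "excel*\t0.3", "ORANGE\t0.1"]:
    print(parse_hsd_line(entry))
```

Matching AND and OR only as whole words keeps a phrase such as "ORANGE" from being misread as a query.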
See also: User directories for customizing sentiment.
general.lsf
This binary file is the underlying database for phrase-based sentiment analysis. If a phrase appears both in the LSF and in an HSD file added through an API call, the HSD entry overrides the LSF entry. This file cannot be modified by users.
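The override order can be pictured as a layered lookup. This Python sketch uses `collections.ChainMap`, which searches its maps in order, to model HSD entries taking precedence over the LSF. The phrases and weights are hypothetical stand-ins; the real LSF is a binary database that cannot be read or edited this way.

```python
from collections import ChainMap

# Hypothetical stand-ins for the LSF database and a custom HSD file.
lsf_weights = {"good": 0.5, "terrible": -0.9, "reliable": 0.4}
custom_hsd = {"good": 0.2, "lag": -0.6}

# ChainMap checks custom_hsd first, so HSD entries override LSF entries.
effective = ChainMap(custom_hsd, lsf_weights)

print(effective["good"])      # 0.2: the HSD overrides the LSF
print(effective["terrible"])  # -0.9: falls back to the LSF
print(effective["lag"])       # -0.6: present only in the HSD
```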
hsd.ptn
This file provides the part-of-speech patterns that indicate a sentiment-bearing phrase. Modifying this file is not recommended because changes can affect sentiment analysis quality and performance. Please contact Lexalytics support if you have questions about adjusting the default sentiment phrase patterns.
intensifiers.dat
This file provides a tab-delimited list of intensifying phrases with accompanying multipliers. When an intensifier occurs before a sentiment-bearing phrase, its multiplier is applied to that phrase's sentiment weight.
For example, assume the following HSD entry:
And the following entry in intensifiers.dat:
An occurrence of the phrase "very good" would contribute a sentiment weight of 0.6 to document-level sentiment (or to entity, theme, or topic sentiment, where applicable).
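The multiplication described above can be sketched in Python as contributed weight = phrase weight × intensifier multiplier. The phrase, tab, multiplier layout follows the description of intensifiers.dat; the rule that the intensifier must directly precede a single-token sentiment phrase, the helper names `load_intensifiers` and `intensified_weight`, and the values used are assumptions for illustration.

```python
def load_intensifiers(lines):
    """Parse tab-delimited 'phrase<TAB>multiplier' lines, as described for intensifiers.dat."""
    table = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line:
            phrase, multiplier = line.split("\t")
            table[phrase] = float(multiplier)
    return table

def intensified_weight(tokens, index, weight, intensifiers):
    """Scale the weight of the phrase at tokens[index] if an intensifier directly precedes it.

    Direct adjacency and single-token intensifiers are simplifying assumptions.
    """
    if index > 0 and tokens[index - 1] in intensifiers:
        return weight * intensifiers[tokens[index - 1]]
    return weight

# Hypothetical intensifier entry and phrase weight for illustration.
table = load_intensifiers(["extremely\t1.5"])
print(intensified_weight(["extremely", "helpful"], 1, 0.5, table))  # 0.75
```

Multi-word intensifiers would need a lookup over several preceding tokens, which this sketch omits.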
Modifications or extensions to the list of intensifiers should be made in a user directory (e.g.