<Lexalytics root>/data/salience/sentiment


The salience/sentiment directory contains the data files used by the sentiment analysis that Salience Engine performs. Each file is described in more detail below:

context.dat List of words used for the contextual model
contextual.bin Binary file used to identify phrases which convey sentiment polarity
default.bin The default model for model-based sentiment analysis
general.hsd The default hand-scored dictionary (HSD) file for phrase-based sentiment analysis
general.lsf The default binary database used for phrase-based sentiment analysis
hsd.ptn The part-of-speech patterns that indicate a sentiment-bearing phrase
intensifiers.dat A tab-delimited list of intensifying phrases with accompanying multipliers

Customizing sentiment analysis in user/salience/sentiment

context.dat

This data file contains a list of words with information about whether they are "factual" words or emotive words. This helps the contextual model determine whether sentiment was intended.


contextual.bin

This binary model has been added to support the Use Polarity Model option. When this option is enabled, phrases that contain sentiment-bearing terms but are used in a non-sentiment manner are not included in sentiment analysis.

For example, the phrase "Good morning to everyone." contains a positive sentiment word (good) but does not actually convey positive sentiment; it is simply a greeting.

This binary file cannot be modified by users.


default.bin: Customizing model-based sentiment

Model-based sentiment analysis was introduced in Salience 4. The default data directory ships with a basic sentiment model, default.bin, trained on generic business content. Salience provides a command-line tool, the SentimentModelBuilder, that users can run to create sentiment models from their own content. Additional guidance on the use of the SentimentModelBuilder will be posted to the Developer blog.


general.hsd: Customizing phrase-based sentiment

The most common method of adjusting sentiment analysis within Salience Engine is through HSD files. An HSD file provides a listing of sentiment-bearing phrases and human-judged sentiment weights for these phrases. Phrase-based sentiment analysis can also be adjusted through negators and intensifiers.

The default data directory provides an HSD file developed by Lexalytics for use with general content. Customization of phrase-based sentiment analysis begins with this HSD file. It can be copied into the equivalent location in a user directory and used as the basis for a custom HSD.
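The copy step described above can be sketched in Python. This is a minimal illustration, not a Lexalytics utility: the function name copy_default_hsd and its parameters are hypothetical, and the directory layout is assumed from the paths shown on this page (<Lexalytics root>/data/salience/sentiment and [user directory]/salience/sentiment).

```python
import shutil
from pathlib import Path


def copy_default_hsd(lexalytics_root: Path, user_dir: Path,
                     name: str = "general.hsd") -> Path:
    """Copy the default HSD into the equivalent location in a user directory.

    Hypothetical helper: the layout below is assumed from the paths
    mentioned on this page, not taken from a Salience API.
    """
    source = lexalytics_root / "data" / "salience" / "sentiment" / name
    target_dir = user_dir / "salience" / "sentiment"
    # Create the user-side sentiment directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    shutil.copyfile(source, target)
    return target
```

The copied file can then be edited freely while the shipped general.hsd stays untouched.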

Users who want to customize sentiment analysis should review the sentiment output, particularly the phrases that contribute to sentiment results. Phrases can be added or edited in the HSD file with appropriate sentiment weights.

Phrases that should be explicitly excluded from sentiment analysis can be indicated with a tilde (~). This is different from applying a sentiment weight of zero, which still considers the phrase in sentiment calculations, but the zero weight has a neutralizing effect. This also differs from the use of the tilde (~) operator in other data files such as pattern files, where the operator is used to control case-sensitivity.

jerk chicken<tab>~
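To make the difference between a tilde and a zero weight concrete, here is a small Python sketch. It is not how Salience Engine computes sentiment: the functions parse_hsd and mean_weight are hypothetical, and a plain average of phrase weights is assumed purely for illustration.

```python
from typing import Dict, Iterable, Optional


def parse_hsd(text: str) -> Dict[str, Optional[float]]:
    """Parse phrase<tab>weight lines; a tilde is stored as None (excluded)."""
    entries: Dict[str, Optional[float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        phrase, _, value = line.partition("\t")
        value = value.strip()
        entries[phrase.strip().lower()] = None if value == "~" else float(value)
    return entries


def mean_weight(found_phrases: Iterable[str],
                entries: Dict[str, Optional[float]]) -> float:
    """Average the weights of found phrases (assumed scoring, for illustration)."""
    weights = []
    for phrase in found_phrases:
        weight = entries.get(phrase.lower())
        if weight is None:
            continue  # excluded with a tilde, or not in the HSD at all
        weights.append(weight)  # a zero weight still counts as a phrase
    return sum(weights) / len(weights) if weights else 0.0
```

Under this assumed averaging, a document containing "good" (0.4) and "jerk chicken" scores 0.2 when "jerk chicken" has a weight of zero, but 0.4 when it is excluded with a tilde, because the excluded phrase never enters the calculation.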

Starting in Salience 6.1, queries can be used in any line of the HSD. If a line has a query operator (such as AND, OR, and *), the line will be treated as a query. Any text in the document matching a non-negated portion of the query will be assigned the provided sentiment value. For example:

tinny WITH sound<tab>-0.7 would penalize documents containing tinny in the same sentence as sound.
best*<tab>0.5 would match anything starting with best (best, bested, bestest).
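The following Python sketch illustrates the two ideas in this section: recognizing a line as a query, and matching a trailing wildcard such as best*. It is an approximation under stated assumptions, not the Salience query parser. The operator set is limited to the operators mentioned on this page, operators are assumed to be written in upper case, and the names is_query and wildcard_to_regex are hypothetical.

```python
import re

# Operators mentioned on this page; the real query syntax may support more.
QUERY_OPERATORS = {"AND", "OR", "WITH"}


def is_query(phrase: str) -> bool:
    """Return True if the phrase part of an HSD line contains a query operator."""
    return "*" in phrase or any(tok in QUERY_OPERATORS for tok in phrase.split())


def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a term such as 'best*' into a case-insensitive whole-word pattern."""
    # Escape the term, then let each '*' stand for any run of word characters.
    body = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)
```

For example, wildcard_to_regex("best*").findall("The bestest sound, bested only once") finds both "bestest" and "bested", while is_query("jerk chicken") is False because the phrase contains no operator.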

IMPORTANT

- Multiple HSD files can be placed within user directories for customizing sentiment
- See Sentiment Options for API methods for setting the HSD file(s) to use
- See "Adjusting out-of-the-box Salience sentiment" for more information on assessing and adjusting phrase-based sentiment analysis
- The tilde (~) operator does not control case-sensitivity in HSD files (which are case-insensitive); it indicates that the specified phrase should be excluded (blacklisted) from sentiment calculations



general.lsf

This binary file is the underlying database used for phrase-based sentiment analysis. Phrases that exist within the LSF are overridden by any that also appear in HSD files that are added through API calls. This file cannot be modified by users.
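The override rule can be pictured as a dictionary merge. The Python sketch below is only a mental model: general.lsf is a binary file that users cannot read, so a plain dictionary stands in for it, the function name effective_weights is hypothetical, and the precedence among multiple HSD files (later files win) is an assumption rather than documented behavior.

```python
from typing import Dict


def effective_weights(lsf_weights: Dict[str, float],
                      *hsd_weight_maps: Dict[str, float]) -> Dict[str, float]:
    """Merge phrase weights so that HSD entries override LSF entries."""
    # Keys are lowercased because HSD files are case-insensitive.
    merged = {phrase.lower(): weight for phrase, weight in lsf_weights.items()}
    for hsd in hsd_weight_maps:
        # Assumption: when several HSD files are loaded, later ones win.
        merged.update({phrase.lower(): weight for phrase, weight in hsd.items()})
    return merged
```

With a stand-in LSF of {"good": 0.3, "bad": -0.4} and an HSD of {"Good": 0.4}, the merged result keeps "bad" at -0.4 and takes 0.4 for "good" from the HSD.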


hsd.ptn

This file provides the part-of-speech patterns that indicate a sentiment-bearing phrase. Modifications to this file are not recommended because of the effect on sentiment analysis quality and performance. Please contact Lexalytics support if you have questions about adjusting the default sentiment phrase patterns.


intensifiers.dat

This file provides a tab-delimited list of intensifying phrases with accompanying multipliers. When an intensifier occurs before a sentiment-bearing phrase, the multiplier is applied to the sentiment weight of the sentiment-bearing phrase.

For example, assume the following HSD entry:
good<tab>0.4

And the following entry in intensifiers.dat:
very<tab>1.5

An occurrence of the phrase "very good" would contribute a sentiment weight of 0.6 (0.4 × 1.5) to document-level sentiment (or to entity, theme, or topic sentiment where applicable).
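The arithmetic in this example can be reproduced with a short Python sketch. It assumes single-word phrases and pre-tokenized input, and the names parse_table and phrase_contributions are hypothetical; Salience Engine's own matching is more involved than this.

```python
from typing import Dict, List


def parse_table(text: str) -> Dict[str, float]:
    """Parse tab-delimited phrase<tab>number lines (HSD weights or multipliers)."""
    table: Dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        phrase, value = line.split("\t")
        table[phrase.strip().lower()] = float(value)
    return table


def phrase_contributions(tokens: List[str], hsd: Dict[str, float],
                         intensifiers: Dict[str, float]) -> List[float]:
    """Return the weight of each sentiment-bearing token, after intensifiers."""
    contributions = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word not in hsd:
            continue
        weight = hsd[word]
        # Apply the multiplier when an intensifier occurs directly before.
        if i > 0 and tokens[i - 1].lower() in intensifiers:
            weight *= intensifiers[tokens[i - 1].lower()]
        contributions.append(weight)
    return contributions
```

Calling phrase_contributions(["very", "good"], parse_table("good\t0.4"), parse_table("very\t1.5")) yields a single contribution of approximately 0.6. Because the weights are floating-point numbers, compare results with a tolerance (for example math.isclose) rather than with exact equality.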

Modifications or extensions to the list of intensifiers should be made in a user directory (e.g. [user directory]/salience/sentiment/intensifiers.dat).