Language support in Salience

The Salience text analytics engine is designed to support multiple languages with a single flexible codebase. During the development of support for each new language, Lexalytics creates components that are specific to the needs of the individual language. This page links to additional information about the specifics of each language that we support in Salience.

Starting in Salience 6.1.1, we offer "Second Tier" support for an expanded list of languages. Second Tier languages feature only document-level, model-based sentiment and queries.

Starting in Salience 6.1.1.1479, we offer "Basic Languages" support for additional languages. Basic Languages feature support for document-level sentiment, entities, and queries.

European language support

CJK language support

Second Tier Support

Basic Language Support

Functionality Support

All Salience functionality is developed for analysis of English content first, with subsequent updates to deploy these techniques to the other languages we support. The table below describes the functionality available in each of the current language packs.

                            EN     FR     ES     PT     IT     DE     NL     ZH     KO     JP
Document-level functionality
Core NLP (1)                Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Summaries                   Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Themes                      Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Sentiment (2)               Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Query Topics                Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Concept Topics              Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Categories (3)              Y      N      N      N      N      N      N      N      N      N
Intentions (3)              Y      N      N      N      N      N      N      N      N      N
Entity-level functionality
Named entities              Y      Y      Y      Y      Y      Y      Y      Y (4)  Y (4)  Y (4)
Relationships (5)           Y      -      -      -      -      -      -      -      -      -
Opinions (6)                Y      -      -      -      -      -      -      -      -      -
Entity sentiment            Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Entity themes               Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
User-defined entities       Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Collection-level functionality
Collection entities         Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Collection themes           Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Collection facets           Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Collection Query Topics     Y      Y      Y      Y      Y      Y      Y      Y      Y      Y
Collection Concept Topics   Y      Y      Y      Y      Y      Y      Y      Y      Y      Y

Notes

1) Core NLP consists of document tokenization, POS tagging, and chunking. Document details give access to core NLP results such as bigrams, trigrams, POS tags, and term frequencies.

2) All languages support phrase-based sentiment analysis, which is the recommended approach. Model-based sentiment is also supported, with a default sentiment model in most languages and a tool that lets customers generate sentiment models from their own content.

3) Categorization functionality based on Wikipedia was released in Salience 5.1.1; this feature is currently available only for English. Intention extraction was released in Salience 6; this feature is also currently available only for English.

4) The default threshold for entity extraction is 55. For improved recall in entity extraction from Chinese and Korean content, we recommend decreasing the default threshold to 35.
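As an illustration of the recommendation in note 4, a minimal sketch of choosing a threshold per language might look like the following. The two threshold values come from the note; the function name, the constant names, and the use of the table's two-letter language codes are assumptions made for this example and are not part of the Salience API. How the chosen value is then passed to the engine depends on the wrapper in use and is not shown.

```python
# Values from note 4; all names below are hypothetical, not Salience API names.
DEFAULT_ENTITY_THRESHOLD = 55   # engine default for entity extraction
LOWERED_ENTITY_THRESHOLD = 35   # recommended for improved recall
LOWER_THRESHOLD_LANGUAGES = {"ZH", "KO"}  # Chinese and Korean, per note 4


def entity_threshold_for(language_code: str) -> int:
    """Return the entity extraction threshold suggested for a language code."""
    code = language_code.strip().upper()
    if code in LOWER_THRESHOLD_LANGUAGES:
        return LOWERED_ENTITY_THRESHOLD
    return DEFAULT_ENTITY_THRESHOLD


if __name__ == "__main__":
    for code in ("EN", "ZH", "KO"):
        print(code, entity_threshold_for(code))
```

Keeping the values in one place makes it easier to revisit the trade-off later, since a lower threshold improves recall but may admit more false positives.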

5) Entity relationship extraction is a pattern-based feature that is functionally supported in each language, but the patterns have not been translated into non-English languages.

6) Entity opinion extraction is a pattern-based feature that is functionally supported in each language, but the patterns have not been translated into non-English languages.