Frequently Asked Questions (FAQ)
Filed in: Main.FAQ · Modified on : Wed, 29 Feb 12
- What are the minimum and maximum values possible for the sentiment scores?
- Document sentiment scores fall between -1 and 1. Entity and theme sentiment scores fall between -10 and 10; however, it is very important to note that this is not a fixed scale, just the range at which we clip the scores. There is no set scale for sentiment scores, as they are based on the sentiment-bearing phrases contained within an individual document. Particularly when looking at entity sentiment, we recommend against comparing absolute sentiment scores; instead, use them as an indicator and as an aggregate metric (e.g. Lexalytics was mentioned in 100 documents, 94% of which mentioned Lexalytics positively).
- What is a good neutral range for sentiment?
- When dealing with sentiment scores, bear in mind that neutral sentiment does not exist only at a sentiment score of 0. We consider neutral to be a range where there might be some measurable sentiment within a piece of content, but not enough that you could firmly state the document is positively or negatively toned. The same concept applies at the entity and theme level. The neutral range we generally recommend for document sentiment is -0.05 to 0.22; we find that overall document content is slanted to be slightly positive by default. At the entity and theme level, however, the distribution is more even, and the neutral range we suggest is -0.45 to 0.5. As always, YMMV (your mileage may vary), since the right range depends entirely on the scope of content you are dealing with, but these ranges give a starting point.
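The suggested neutral ranges can be applied as simple thresholds. The following is a minimal Python sketch, not part of the Salience API: the function name `label_sentiment` and the constant names are hypothetical, and treating the range endpoints as neutral is an assumption, since the text above does not say which side the boundaries belong to.

```python
# Starting-point neutral ranges quoted in the answer above.
# Tune these for your own content; they are not fixed values.
DOCUMENT_NEUTRAL = (-0.05, 0.22)
ENTITY_THEME_NEUTRAL = (-0.45, 0.5)


def label_sentiment(score, neutral_range=DOCUMENT_NEUTRAL):
    """Map a raw sentiment score to 'negative', 'neutral' or 'positive'.

    Hypothetical helper: scores on the range boundaries are treated as
    neutral, which is an assumption rather than documented behaviour.
    """
    low, high = neutral_range
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


# A document score of 0.1 falls inside the document neutral range.
print(label_sentiment(0.1))
# An entity score of 0.3 is still neutral under the wider entity/theme range.
print(label_sentiment(0.3, ENTITY_THEME_NEUTRAL))
```

Passing the range as a parameter keeps the document-level and entity/theme-level thresholds separate, so each can be adjusted independently.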
- What are the minimum and maximum values possible for the evidence?
- The evidence score will range from 1 to 7. An evidence score of 4 indicates there was enough evidence to give a high degree of confidence in the sentiment score for the associated entity.
- What are the minimum and maximum values possible for the theme scores?
- As with the sentiment scores, there is no set scale for theme scores. As the text is analyzed, themes are extracted and scored, and as additional occurrences of the same theme are encountered, the scores are increased. As such, a theme score will never be lower than 0 but does not necessarily have an upper bound. As with sentiment scores, we do not recommend comparing theme scores across documents. A theme which scores a 4 in one document may be much more significant in that document than a theme which scores a 4 in another document; it all depends on the scores for the other themes in the respective documents. On an individual document basis, a higher score indicates more usage of the theme, and thus that the theme is more relevant to the content. Across documents, themes and meta-themes are more useful for indicating commonly occurring themes throughout the document population.
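Because theme scores are only meaningful relative to the other themes in the same document, one way to use them is to rank themes within each document. A minimal Python sketch follows; the function `top_themes` and the sample data are hypothetical, and the sketch assumes the theme scores for one document have already been collected into a dictionary.

```python
def top_themes(theme_scores, n=3):
    """Return the n highest-scoring (theme, score) pairs for ONE document.

    theme_scores: dict mapping theme text to its score in that document.
    The ranking is only meaningful within a single document.
    """
    ranked = sorted(theme_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up scores for a single document, for illustration only.
doc_themes = {"broadcast media": 4.25, "quarterly results": 1.5, "new product": 6.2}
print(top_themes(doc_themes, n=2))
```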
- How can sentiment scores or theme scores be used for comparison of documents?
- In general, the advice we give is to use the numbers behind the scenes rather than presenting them for direct comparison. Showing that X% of the documents in your document set were positive, Y% were negative, and Z% were neutral allows a user to drill down into the segment of interest, without pitting one document’s sentiment score of 0.8 against another document’s sentiment score of 0.6. Both are positive; one may be slightly more positive than the other, but the fact that the delta is 0.2 has less relevance. With themes, you can use the scores to pluck off the top N themes for each document. But comparing a theme of “broadcast media” with a theme score of 4.25 in one document to “online media” with a theme score of 5.0 in another document is less relevant. The results for each individual document, particularly the precise numerical scores, are generally less important than the story that can be told across the entire data set by observing trends or segments. Another example would be looking at sentiment over time. Say you’re analyzing content from a news feed and aggregating your results. On one day, you may see 20 stories out of 100 which mention Lexalytics, and the entity-level sentiment scores for Lexalytics are all positive. The next day, there are 95 stories out of 100 which mention Lexalytics, again all of which have positive entity-level sentiment for Lexalytics. Something has happened. Looking at the themes across all documents on day one versus day two might give you the story, and allowing the user to drill down to the original content will certainly give them insight into what caused the change. Similarly, consider that on the following day, 50 stories out of 100 mention Lexalytics, but now entity-level sentiment indicates that 40 of those 50 stories have negative sentiment for Lexalytics. Again, it shows that some change has happened, and allows the end user to navigate through the data.
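The X% positive, Y% negative, Z% neutral summary described above can be sketched in a few lines of Python. This is an illustration only: the function `sentiment_breakdown` is hypothetical, the default range is the document-level starting point suggested earlier in this FAQ, and treating boundary scores as neutral is an assumption.

```python
from collections import Counter


def sentiment_breakdown(scores, neutral_range=(-0.05, 0.22)):
    """Return the percentage of positive, negative and neutral scores.

    scores: iterable of raw sentiment scores for a document set.
    neutral_range: (low, high); scores on the boundaries count as neutral
    (an assumption made for this sketch).
    """
    low, high = neutral_range
    scores = list(scores)
    labels = ("positive", "negative", "neutral")
    if not scores:
        # Avoid dividing by zero for an empty document set.
        return {label: 0.0 for label in labels}
    counts = Counter(
        "negative" if s < low else "positive" if s > high else "neutral"
        for s in scores
    )
    return {label: 100.0 * counts[label] / len(scores) for label in labels}


# Made-up document scores for one day of a feed, for illustration only.
day_one = [0.8, 0.6, -0.3, 0.1]
print(sentiment_breakdown(day_one))
```

Running the same function over each day's scores gives a per-day breakdown that can be compared over time, alongside the raw count of documents mentioning the entity.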
- How do I migrate from an earlier (4.x) version of Salience?
- We have created a set of Migration Guides to help you upgrade.