Wednesday, February 2, 2011

Sentiment Classification and Opinion Lexicons


Lexicons are a big part of my current research in opinion mining. Aside from their potential to help supervised learning methods, they can be applied in unsupervised techniques - an appealing idea for research whose goal is domain independence. An opinion lexicon is a database that associates terms with opinion information - normally in the form of a numeric score indicating a term's positive or negative bias.

My dissertation was an investigation of how lexicons perform on sentiment classification of film reviews - this work was later expanded and incorporated into a chapter in the book "Knowledge Discovery Practices and Applications in Data Mining - Trends and New Domains".
A shorter version of this research was presented at Dublin's IT&T 2009 and is available here.

The lexicon used here was SentiWordNet. Built from WordNet, SentiWordNet leverages WordNet's semantic relationships, such as synonyms and antonyms, together with term glosses to expand a set of seed words into a much larger lexicon. It can be tried online here (see also Esuli and Sebastiani's SentiWordNet paper).
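
To give a feel for the seed-expansion idea, here is a toy sketch in Python. It is not SentiWordNet's actual construction procedure (which also uses glosses and is considerably more involved): the hand-made SYNONYMS and ANTONYMS tables stand in for WordNet relations, and the function name expand_seeds is made up for this illustration. The only assumption is that synonyms inherit a seed's polarity and antonyms flip it.

```python
from collections import deque

# Hypothetical, hand-made relations standing in for WordNet links.
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def expand_seeds(seeds):
    """Spread +1/-1 polarity from seed words across the toy relations.

    Synonyms keep the polarity of the word they were reached from,
    antonyms get the opposite sign. The first label a word receives wins.
    """
    polarity = dict(seeds)
    queue = deque(polarity)  # breadth-first walk starting at the seeds
    while queue:
        term = queue.popleft()
        sign = polarity[term]
        neighbours = [(s, sign) for s in SYNONYMS.get(term, [])]
        neighbours += [(a, -sign) for a in ANTONYMS.get(term, [])]
        for other, other_sign in neighbours:
            if other not in polarity:
                polarity[other] = other_sign
                queue.append(other)
    return polarity


lexicon = expand_seeds({"good": 1})
```

Starting from the single seed "good", the walk labels "fine", "nice" and "pleasant" as positive and "bad", "poor" and "unpleasant" as negative.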

Using SentiWordNet for sentiment classification involves scanning a document for relevant terms and matching them against the lexicon according to part of speech. There are some interesting NLP challenges involved here: we first run the text through a part-of-speech tagger to find out whether each term is an adjective, verb, etc. Then negation detection is performed to identify parts of the text affected by a negating statement (e.g. "not good" as opposed to "good"). Finally, the document is scored based on the terms found and whether each one is negated. The overall approach is given below.
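
The three steps (tagging, negation detection, scoring) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the research: the input is assumed to be already tagged, TOY_LEXICON and its scores are invented stand-ins for SentiWordNet entries, the negation rule (flip everything after a negator until the next clause break) is a simple heuristic, and summing positive minus negative scores is just one possible scoring scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: (term, part of speech) -> (positive, negative).
# The values are made up; real SentiWordNet scores are attached to synsets.
TOY_LEXICON: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("love", "v"): (0.5, 0.0),
    ("boring", "a"): (0.0, 0.5),
}

NEGATORS = {"not", "never", "no", "n't"}
CLAUSE_BREAKS = {".", ",", ";", "!", "?", "but"}


def mark_negation(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, bool]]:
    """Flag every token that follows a negator, up to the next clause break."""
    marked = []
    negated = False
    for word, pos in tagged:
        lower = word.lower()
        if lower in CLAUSE_BREAKS:
            negated = False  # a clause break ends the negation scope
        marked.append((lower, pos, negated))
        if lower in NEGATORS:
            negated = True  # tokens after this one are negated
    return marked


def score_document(tagged: List[Tuple[str, str]]) -> float:
    """Sum (positive - negative) for lexicon terms, flipping negated ones."""
    total = 0.0
    for word, pos, negated in mark_negation(tagged):
        entry = TOY_LEXICON.get((word, pos))
        if entry is None:
            continue  # term/POS pair carries no opinion information
        polarity = entry[0] - entry[1]
        total += -polarity if negated else polarity
    return total


def classify(tagged: List[Tuple[str, str]]) -> str:
    return "positive" if score_document(tagged) >= 0 else "negative"


# Pre-tagged input; in practice the tags would come from a POS tagger.
review = [("The", "x"), ("plot", "n"), ("was", "v"), ("not", "r"),
          ("good", "a"), (",", ","), ("but", "c"), ("I", "n"),
          ("love", "v"), ("the", "x"), ("cast", "n"), (".", ".")]
label = classify(review)
```

In this example "good" falls inside the scope of "not" and contributes -0.75, while "love" comes after the clause break and contributes +0.5, so the review is classified as negative.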


Resources

3 comments:

  1. I'm in the process of preparing my PhD studies in the area of opinion mining; however, I don't have a specific topic in mind. Which area should I focus on? Opinion mining is a big topic. Do you have any ideas regarding the opinion mining area for my PhD studies?

  2. It seems that the page where you could get SentiWordNet has been hacked :( Is there some other way to contact the authors to get it? I would be very grateful for some advice...

  3. The main page for SentiWordNet is here:
    http://sentiwordnet.isti.cnr.it/

    There is also an online interface to SWN:
    http://sentiwordnet.isti.cnr.it/search.php?q=good
