ScholarGate

Text Classification and Sentiment Analysis

Assigning categories to texts (topics, languages, spam, or sentiment) using probabilistic and neural classifiers, among the most widely deployed families of NLP techniques.

Definition

Text classification is the supervised assignment of one or more predefined category labels to a span of text, with sentiment analysis as a leading application.

Scope

Covers supervised classification of documents and shorter texts: feature representations such as bag-of-words and embeddings, classic models like naive Bayes and logistic regression, neural classifiers, and the prominent application of sentiment and opinion analysis. It addresses evaluation, class imbalance, and feature design. Representation learning itself is covered in a sibling topic.

Core questions

  • How is text represented as features for a classifier?
  • When are naive Bayes, logistic regression, or neural models appropriate?
  • How does sentiment analysis cope with negation, sarcasm, and context?
  • How is classifier performance measured fairly under class imbalance?

Key concepts

  • bag-of-words
  • naive Bayes
  • logistic regression
  • feature engineering
  • sentiment analysis
  • subjectivity detection
  • class imbalance
  • precision and recall
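
Precision and recall matter most when classes are imbalanced, because plain accuracy can look high for a classifier that never predicts the rare class. The following is a minimal sketch using only the Python standard library; the 100-message spam test set and both sets of predictions are hypothetical, invented purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced test set: 5 spam messages (1) among 100.
y_true = [1] * 5 + [0] * 95

# Baseline that labels every message as not spam.
always_negative = [0] * 100

# Classifier that finds 4 of the 5 spam messages and raises 2 false alarms.
modest = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

for name, y_pred in [("always negative", always_negative), ("modest", modest)]:
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy setting the always-negative baseline reaches 0.95 accuracy while its recall and F1 on the spam class are 0, whereas the second classifier scores only slightly higher on accuracy (0.97) but has an F1 of about 0.73. This is why per-class precision, recall, and F1 are usually reported alongside or instead of accuracy on imbalanced data.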

Key theories

Bag-of-words classification
Representing a document as the counts of its words, ignoring word order, and classifying with models such as naive Bayes or logistic regression; a simple yet strong baseline.
Subjectivity-aware sentiment analysis
Improving sentiment classification by first separating subjective from objective content, as in Pang and Lee's minimum-cut approach.
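
To make bag-of-words classification concrete, the sketch below implements a small multinomial naive Bayes classifier over word counts, using only the Python standard library. It is an illustrative sketch, not a reference implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the class name `NaiveBayes`, and the four training messages are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of log likelihoods of each token.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, for illustration only.
train_texts = [
    "cheap pills buy now",
    "win money now",
    "meeting agenda for monday",
    "lunch on monday",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("buy cheap money"))   # spam
print(model.predict("monday meeting"))    # ham
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and smoothing keeps a single unseen word from driving a class probability to zero. A real system would add proper tokenization and a held-out evaluation set.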

History

Text classification was among the first NLP tasks to go fully statistical, with naive Bayes and later support-vector machines dominating in the 1990s and 2000s. Sentiment analysis, popularized by Pang and Lee in the early 2000s, became a major subfield; neural classifiers and pretrained models later raised accuracy on most benchmarks.

Debates

Simple features versus deep representations
Strong bag-of-words baselines often rival neural models on short, topical tasks, prompting debate over when the added complexity of deep representations is justified.

Key figures

  • Bo Pang
  • Lillian Lee
  • Christopher Manning

Seminal works

  • pang2004
  • manning1999

Frequently asked questions

Why is sentiment analysis harder than topic classification?
Sentiment depends on subtle cues like negation, comparison, and sarcasm, and the same words can express opposite polarities in different contexts, so surface word counts alone are often insufficient.
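
A small illustration of why word counts can fall short: the two sentences below express opposite opinions yet have identical bag-of-words representations. The sentences and the helper names `unigrams` and `bigrams` are made up for this sketch. Adding word-pair (bigram) features is one common partial remedy for negation, though it does not address sarcasm or wider context.

```python
from collections import Counter


def unigrams(text):
    """Bag-of-words: counts of individual lowercase tokens."""
    return Counter(text.lower().split())


def bigrams(text):
    """Counts of adjacent token pairs, which keep some local word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical example sentences with opposite polarity.
positive = "the plot was good not bad"
negative = "the plot was bad not good"

print(unigrams(positive) == unigrams(negative))  # True: same word counts
print(bigrams(positive) == bigrams(negative))    # False: pairs such as ('not', 'bad') differ
```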
