Classification Models

Classification models are supervised learning algorithms that predict discrete categories or labels. They're used when the output variable is categorical, such as "spam" or "not spam" in email filtering.

What is Classification?

Classification is a supervised learning approach where the algorithm learns from labeled training data to predict discrete class labels for new, unseen instances. The goal is to identify which category or class an observation belongs to based on a training set of data containing observations with known category memberships.

Unlike regression models that predict continuous values, classification models predict discrete values or categories. These categories can be binary (two classes) or multi-class (more than two classes).

Key Characteristics

Predicts discrete class labels or categories
Requires labeled training data
Can handle binary or multi-class problems
Evaluated using metrics like accuracy, precision, recall, and F1-score
Decision boundaries separate different classes in the feature space

Classification visualization showing decision boundaries between classes

Common Classification Algorithms

Logistic Regression

Understand the mathematics of logistic regression

A statistical model that uses a logistic function to model a binary dependent variable, commonly used for binary classification problems.

Explore Logistic Regression

Decision Trees

Understand decision trees and their applications

A versatile machine learning algorithm that creates a flowchart-like structure for making decisions based on feature values.

Explore Decision Trees

Support Vector Machines

Explore the mathematics behind SVMs

A powerful classification algorithm that finds the optimal hyperplane to separate different classes with maximum margin.

Explore Support Vector Machines

Random Forests

Learn how ensemble methods improve performance

An ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode of the classes of the individual trees.

Explore Random Forests

Common Applications

Email Spam Detection

Classifying emails as spam or not spam based on content, sender information, and other features.

Medical Diagnosis

Predicting whether a patient has a particular disease based on symptoms and test results.

Image Recognition

Identifying objects, people, or scenes in images by classifying them into predefined categories.

Evaluation Metrics

Classification models are evaluated using different metrics than regression models. Common evaluation metrics include:

Accuracy: The proportion of correct predictions among the total number of predictions.
Precision: The proportion of true positive predictions among all positive predictions.
Recall: The proportion of true positive predictions among all actual positive instances.
F1-Score: The harmonic mean of precision and recall, providing a balance between the two.
ROC Curve: A graphical plot that illustrates the diagnostic ability of a binary classifier system.

Learn more in the Glossary