Logistic Regression
Understanding logistic regression for binary and multi-class classification
Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It's used to predict the probability that an instance belongs to a particular class. If the probability is greater than a threshold (typically 0.5), the model predicts that class.
Key Concepts in Logistic Regression
- Logistic Function (Sigmoid): Transforms linear predictions to probabilities between 0 and 1
- Decision Boundary: The surface in feature space where the predicted probability equals the threshold, separating one class from the other
- Maximum Likelihood Estimation: The method used to find the best coefficients
- Regularization: Techniques to prevent overfitting (L1 and L2; see the sketch after this list)
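As a concrete look at the regularization options above, here is a minimal scikit-learn sketch; the synthetic dataset and the C values are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# L2 (ridge) penalty is scikit-learn's default; C is the inverse
# regularization strength, so smaller C means stronger regularization.
l2_model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

# L1 (lasso) penalty drives some coefficients exactly to zero;
# it requires a solver that supports it, e.g. liblinear or saga.
l1_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)

print("Non-zero L2 coefficients:", (l2_model.coef_ != 0).sum())
print("Non-zero L1 coefficients:", (l1_model.coef_ != 0).sum())
```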
How Logistic Regression Works
Logistic regression uses the logistic function to model the probability of a certain class:
P(y=1) = 1 / (1 + e^-(β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ))
Where:
- P(y=1) is the probability that the instance belongs to class 1
- β₀, β₁, ..., βₙ are the model parameters (coefficients)
- x₁, x₂, ..., xₙ are the feature values
The model makes a prediction based on whether the calculated probability is above or below a threshold (typically 0.5):
- If P(y=1) ≥ 0.5, predict class 1
- If P(y=1) < 0.5, predict class 0
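To make the formula and the thresholding rule concrete, here is a minimal NumPy sketch; the coefficients and feature values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, beta0, beta):
    """P(y=1) = sigmoid(beta0 + beta1*x1 + ... + betan*xn)."""
    return sigmoid(beta0 + X @ beta)

def predict(X, beta0, beta, threshold=0.5):
    """Predict class 1 when the probability meets the threshold."""
    return (predict_proba(X, beta0, beta) >= threshold).astype(int)

# Hypothetical coefficients and two feature vectors, for illustration only
beta0, beta = -1.0, np.array([0.8, -0.5])
X = np.array([[2.0, 1.0], [0.5, 3.0]])

print(predict_proba(X, beta0, beta))  # approx. [0.52, 0.11]
print(predict(X, beta0, beta))        # [1, 0]
```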
Binary Logistic Regression
Used when the target variable has two possible outcomes (e.g., spam/not spam, disease/no disease). This is the most common form of logistic regression.
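A minimal sketch of binary logistic regression using scikit-learn's bundled breast cancer dataset (a two-class problem); the train/test split and scaling are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two-class problem: malignant vs. benign tumors
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardizing the features helps the solver converge
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
# Probability that the first test instance belongs to class 1
print("P(y=1):", model.predict_proba(X_test[:1])[0, 1])
```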
Multinomial Logistic Regression
Used when the target variable has three or more unordered categories (e.g., predicting types of cuisine: Italian, Chinese, Mexican).
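A sketch on the three-class iris dataset; assuming a reasonably recent scikit-learn, the default lbfgs solver fits a multinomial (softmax) model for multi-class targets:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Three unordered classes: setosa, versicolor, virginica
X, y = load_iris(return_X_y=True)

# With the default lbfgs solver, multi-class problems are handled
# with a multinomial (softmax) formulation
model = LogisticRegression(max_iter=1000).fit(X, y)

# Probabilities across the three classes sum to 1 for each instance
print(model.predict_proba(X[:1]))  # e.g. approx. [[0.98, 0.02, 0.00]]
print(model.predict(X[:1]))        # [0]
```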
Ordinal Logistic Regression
Used when the target variable has three or more ordered categories (e.g., movie ratings from 1 to 5 stars).
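scikit-learn has no built-in ordinal logistic regression; one option, assuming statsmodels (0.12 or later) is available, is its OrderedModel with a logit link, sketched here on made-up ratings data:

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical data: one feature predicting ordered ratings 1..5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
ratings = np.clip(np.round(3 + 1.5 * X[:, 0] + rng.normal(size=200)), 1, 5)

# distr="logit" gives the proportional-odds (ordinal logistic) model
model = OrderedModel(ratings, X, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.params)  # slope plus threshold (cut-point) parameters
```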
Common Use Cases
- Email spam detection
- Disease diagnosis
- Credit risk assessment
- Customer churn prediction
- Marketing campaign response prediction
Advantages and Limitations
Advantages
- Simple to implement and interpret
- Efficient training process
- Resistant to overfitting in high-dimensional spaces when regularized (L1/L2)
- Outputs well-calibrated probabilities
- Works well for linearly separable classes
Limitations
- Assumes linear relationship between features and log-odds
- May underperform with complex non-linear relationships
- Sensitive to outliers
- Requires feature engineering for best results
- Coefficient estimates become unstable when features are highly correlated (multicollinearity)