Logistic Regression
Understanding logistic regression for binary and multi-class classification
Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It's used to predict the probability that an instance belongs to a particular class. If the probability is greater than a threshold (typically 0.5), the model predicts that class.
Key Concepts in Logistic Regression
- Logistic Function (Sigmoid): Transforms linear predictions to probabilities between 0 and 1
- Decision Boundary: The surface in feature space where the predicted probability equals the threshold, separating one class from the other
- Maximum Likelihood Estimation: The method used to find the best coefficients
- Regularization: Techniques to prevent overfitting (L1 and L2; see the sketch after this list)
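As a concrete look at the regularization options above, here is a minimal scikit-learn sketch; the synthetic dataset and the C values are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# L2 (ridge) penalty is scikit-learn's default; C is the inverse
# regularization strength, so smaller C means stronger regularization.
l2_model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

# L1 (lasso) penalty drives some coefficients exactly to zero;
# it requires a solver that supports it, e.g. liblinear or saga.
l1_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)

print("Non-zero L2 coefficients:", (l2_model.coef_ != 0).sum())
print("Non-zero L1 coefficients:", (l1_model.coef_ != 0).sum())
```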
How Logistic Regression Works
Logistic regression uses the logistic function to model the probability of a certain class:
P(y=1) = 1 / (1 + e^-(β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ))
Where:
- P(y=1) is the probability that the instance belongs to class 1
- β₀, β₁, ..., βₙ are the model parameters (coefficients)
- x₁, x₂, ..., xₙ are the feature values
The model makes a prediction based on whether the calculated probability is above or below a threshold (typically 0.5):
- If P(y=1) ≥ 0.5, predict class 1
- If P(y=1) < 0.5, predict class 0
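To make the formula and the thresholding rule concrete, here is a minimal NumPy sketch; the coefficients and feature values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, beta0, beta):
    """P(y=1) = sigmoid(beta0 + beta1*x1 + ... + betan*xn)."""
    return sigmoid(beta0 + X @ beta)

def predict(X, beta0, beta, threshold=0.5):
    """Predict class 1 when the probability meets the threshold."""
    return (predict_proba(X, beta0, beta) >= threshold).astype(int)

# Hypothetical coefficients and two feature vectors, for illustration only
beta0, beta = -1.0, np.array([0.8, -0.5])
X = np.array([[2.0, 1.0], [0.5, 3.0]])

print(predict_proba(X, beta0, beta))  # approx. [0.52, 0.11]
print(predict(X, beta0, beta))        # [1, 0]
```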
Binary Logistic Regression
Used when the target variable has two possible outcomes (e.g., spam/not spam, disease/no disease). This is the most common form of logistic regression.
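A minimal sketch of binary logistic regression using scikit-learn's bundled breast cancer dataset (a two-class problem); the train/test split and scaling are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two-class problem: malignant vs. benign tumors
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardizing the features helps the solver converge
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
# Probability that the first test instance belongs to class 1
print("P(y=1):", model.predict_proba(X_test[:1])[0, 1])
```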
Multinomial Logistic Regression
Used when the target variable has three or more unordered categories (e.g., predicting types of cuisine: Italian, Chinese, Mexican).
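A sketch on the three-class iris dataset; assuming a reasonably recent scikit-learn, the default lbfgs solver fits a multinomial (softmax) model for multi-class targets:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Three unordered classes: setosa, versicolor, virginica
X, y = load_iris(return_X_y=True)

# With the default lbfgs solver, multi-class problems are handled
# with a multinomial (softmax) formulation
model = LogisticRegression(max_iter=1000).fit(X, y)

# Probabilities across the three classes sum to 1 for each instance
print(model.predict_proba(X[:1]))  # e.g. approx. [[0.98, 0.02, 0.00]]
print(model.predict(X[:1]))        # [0]
```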
Ordinal Logistic Regression
Used when the target variable has three or more ordered categories (e.g., movie ratings from 1 to 5 stars).
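scikit-learn has no built-in ordinal logistic regression; one option, assuming statsmodels (0.12 or later) is available, is its OrderedModel with a logit link, sketched here on made-up ratings data:

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical data: one feature predicting ordered ratings 1..5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
ratings = np.clip(np.round(3 + 1.5 * X[:, 0] + rng.normal(size=200)), 1, 5)

# distr="logit" gives the proportional-odds (ordinal logistic) model
model = OrderedModel(ratings, X, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.params)  # slope plus threshold (cut-point) parameters
```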
Common Use Cases
- Email spam detection
- Disease diagnosis
- Credit risk assessment
- Customer churn prediction
- Marketing campaign response prediction
Advantages and Limitations
Advantages
- Simple to implement and interpret
- Efficient training process
- Resistant to overfitting in high-dimensional spaces when regularized (L1/L2)
- Outputs well-calibrated probabilities
- Works well for linearly separable classes
Limitations
- Assumes linear relationship between features and log-odds
- May underperform with complex non-linear relationships
- Sensitive to outliers
- Requires feature engineering for best results
- Coefficient estimates become unstable when features are highly correlated (multicollinearity)