Machine Learning Glossary
A comprehensive reference of machine learning terms, concepts, and techniques to help you understand the field.
Activation Function: A mathematical function that determines the output of a neural network node. Common examples include ReLU, Sigmoid, and Tanh.
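Example (a minimal sketch of the three activations named above, using NumPy; function names are illustrative):

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged, zeroes out negatives.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes inputs into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes inputs into the range (-1, 1).
    return np.tanh(x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), sigmoid(x), tanh(x))
```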
Backpropagation: An algorithm for training neural networks that calculates gradients of the loss function with respect to the weights, propagating from the output layer back to the input layer.
Batch Normalization: A technique to normalize the inputs of each layer to improve training stability and speed by reducing internal covariate shift.
Batch Size: The number of training examples utilized in one iteration of model training. It affects both the optimization process and the time required to train the model.
Bias: 1) A learnable constant term (as in a linear model or a neural network node) that shifts the output, allowing the model to fit the data better. 2) A systematic error that causes a model to favor certain outcomes.
Classification: A supervised learning task where the model predicts discrete class labels or categories for input data.
Clustering: An unsupervised learning technique that groups similar data points together based on certain features.
Confusion Matrix: A table used to describe the performance of a classification model, showing the counts of true positives, false positives, true negatives, and false negatives.
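Example (a sketch of tallying the four counts for a binary classifier with NumPy; the labels are made up):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # actual labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # model predictions

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

print([[tn, fp],
       [fn, tp]])  # the 2x2 confusion matrix
```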
Convolution: A mathematical operation that applies a filter to an input to create a feature map that summarizes the presence of detected features in the input.
Convolutional Neural Network (CNN): A type of neural network designed for processing grid-like data such as images, using convolutional layers to detect spatial patterns.
Cross-Validation: A resampling procedure used to evaluate machine learning models where the dataset is split into multiple subsets for training and validation.
Decision Tree: A tree-like model that makes decisions based on feature values, splitting the data into branches at decision nodes.
Deep Learning: A subset of machine learning using neural networks with many layers (deep neural networks) to model complex patterns in data.
Dropout: A regularization technique where randomly selected neurons are ignored during training to prevent overfitting.
Early Stopping: A form of regularization used to avoid overfitting by stopping training when performance on a validation set starts to degrade.
Embedding: A technique to represent discrete variables as continuous vectors in a lower-dimensional space, often used for text or categorical data.
Ensemble Learning: A technique that combines multiple machine learning models to improve performance and robustness.
Epoch: One complete pass through the entire training dataset during the training of a machine learning model.
Explainable AI (XAI): Artificial intelligence systems whose decisions can be readily understood by humans, in contrast to 'black box' models whose inner workings are opaque.
F1 Score: The harmonic mean of precision and recall, providing a single metric that balances both concerns. Particularly useful for imbalanced datasets.
Feature: An individual measurable property or characteristic of a phenomenon being observed, used as input to a machine learning model.
Feature Engineering: The process of selecting, transforming, or creating features from raw data to improve model performance.
Gini Impurity: A measure of how often a randomly chosen element from a set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in that set. Commonly used as a splitting criterion in decision trees.
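Example (a minimal sketch in plain Python):

```python
from collections import Counter

def gini_impurity(labels):
    # 1 minus the sum of squared class proportions; 0 means the set is pure.
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["a", "a", "b", "b"]))  # 0.5, maximally impure for two classes
print(gini_impurity(["a", "a", "a", "a"]))  # 0.0, pure
```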
Gradient Descent: An optimization algorithm that iteratively adjusts parameters to minimize a loss function by moving in the direction of steepest descent.
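Example (a toy sketch minimizing f(x) = (x - 3)^2, whose gradient is 2(x - 3)):

```python
x = 0.0             # starting point
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (x - 3)         # direction of steepest ascent
    x -= learning_rate * gradient  # step the opposite way
print(x)  # converges toward the minimum at x = 3
```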
Hyperparameter: A parameter whose value is set before the learning process begins, as opposed to parameters that are learned during training.
Hyperparameter Tuning: The process of finding the optimal hyperparameters for a machine learning algorithm to maximize its performance on a specific task.
Information Gain: A measure used in decision trees that quantifies how much 'information' a feature gives us about the class. It's based on the concept of entropy from information theory.
K-Means Clustering: An unsupervised learning algorithm that partitions data into K clusters, where each data point belongs to the cluster with the nearest mean.
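Example (a bare-bones NumPy sketch; a production implementation would also handle empty clusters and check for convergence):

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen points as the initial centroids.
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
print(kmeans(points, k=2))
```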
L1 Regularization (Lasso): A regularization technique that adds the absolute value of the magnitude of coefficients as a penalty term to the loss function, promoting sparsity in the model.
L2 Regularization (Ridge): A regularization technique that adds the squared magnitude of coefficients as a penalty term to the loss function, preventing any single feature from having too much influence.
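Example (a sketch showing how both penalties attach to a plain squared-error loss; the lambda-style arguments are illustrative hyperparameters):

```python
import numpy as np

def regularized_loss(y_true, y_pred, weights, l1=0.0, l2=0.0):
    mse = np.mean((y_true - y_pred) ** 2)       # base loss
    l1_penalty = l1 * np.sum(np.abs(weights))   # L1 term: encourages sparsity
    l2_penalty = l2 * np.sum(weights ** 2)      # L2 term: discourages large weights
    return mse + l1_penalty + l2_penalty
```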
Learning Rate: A hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.
Loss Function: A function that measures the difference between the model's predictions and the actual target values, used to guide the optimization process.
Long Short-Term Memory (LSTM): A type of recurrent neural network architecture designed to handle the vanishing gradient problem and better capture long-term dependencies in sequential data.
Mean Absolute Error (MAE): The average of the absolute differences between predicted and actual values, measuring error magnitude without regard to direction.
Mean Squared Error (MSE): The average of the squared differences between predicted and actual values, penalizing larger errors more heavily.
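Example (both metrics in a few lines of NumPy):

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])
print(mae(y_true, y_pred))  # 0.833...
print(mse(y_true, y_pred))  # 1.416...
```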
Neural Network: A computational model inspired by the human brain, consisting of interconnected nodes (neurons) organized in layers that process information.
Normalization: The process of scaling features to a standard range, typically between 0 and 1 or -1 and 1, to improve model training and performance.
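Example (min-max scaling to [0, 1], sketched with NumPy):

```python
import numpy as np

def min_max_scale(x):
    # Rescale values into [0, 1] using the feature's min and max.
    return (x - x.min()) / (x.max() - x.min())

x = np.array([10.0, 20.0, 15.0, 30.0])
print(min_max_scale(x))  # [0.   0.5  0.25 1.  ]
```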
One-Hot Encoding: A process that converts each categorical variable into a binary vector with a 1 in the position corresponding to the category and 0s elsewhere, making it usable by machine learning algorithms.
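Example (a minimal sketch; the category names are made up):

```python
import numpy as np

categories = ["red", "green", "blue"]
index = {c: i for i, c in enumerate(categories)}

def one_hot(value):
    vec = np.zeros(len(categories))
    vec[index[value]] = 1.0  # a single 1 at the category's position
    return vec

print(one_hot("green"))  # [0. 1. 0.]
```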
Overfitting: A modeling error where a model learns the training data too well, including noise and outliers, resulting in poor generalization to new data.
Pooling: A downsampling operation used in CNNs that reduces the dimensionality of feature maps, retaining the most important information while reducing computation.
Precision: A metric that measures the proportion of true positive predictions among all positive predictions made by a model.
R-squared (R²): A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.
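Example (computed from its definition, 1 minus the ratio of residual to total sum of squares):

```python
import numpy as np

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y_true, y_pred))  # 0.98; close to 1 when predictions track the data
```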
Random Forest: An ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes for classification or the mean prediction for regression.
Recall: A metric that measures the proportion of true positive predictions among all actual positive instances in the data.
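Example (precision, recall, and the F1 score defined earlier, all from confusion-matrix counts; the counts are made up):

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

print(precision_recall_f1(tp=8, fp=2, fn=4))  # (0.8, 0.666..., 0.727...)
```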
Recurrent Neural Network (RNN): A type of neural network designed for sequential data, with connections that form cycles to maintain memory of previous inputs.
Regression: A supervised learning task where the model predicts continuous numerical values rather than discrete categories.
Regularization: Techniques used to prevent overfitting by adding a penalty term to the loss function or modifying the model architecture.
Reinforcement Learning: A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.
Root Mean Squared Error (RMSE): The square root of the mean of the squared differences between predicted values and observed values, providing an error measure in the same units as the target variable.
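Example (just the square root of the MSE defined above):

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```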
Self-Attention: A mechanism used in transformer models that allows the model to weigh the importance of different words in a sequence when making predictions, regardless of their position.
Semi-Supervised Learning: A learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training.
Softmax: A function that converts a vector of real numbers into a probability distribution. It's often used as the activation function in the output layer of neural networks for multi-class classification.
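Example (a standard numerically stable sketch in NumPy):

```python
import numpy as np

def softmax(z):
    # Subtracting the max before exponentiating avoids overflow
    # and does not change the result.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # non-negative and sums to 1.0
```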
Stochastic Gradient Descent (SGD): A variant of gradient descent that uses a single training example or a small batch to compute the gradient and update the parameters, making it more efficient for large datasets.
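Example (a toy mini-batch SGD fit of a linear model with NumPy; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # synthetic inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy targets

w = np.zeros(3)
lr, batch_size = 0.1, 8
for _ in range(500):
    idx = rng.choice(len(X), batch_size, replace=False)  # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size  # gradient of MSE on the batch
    w -= lr * grad
print(w)  # approaches true_w
```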
Supervised Learning: A type of machine learning where the model is trained on labeled data, learning to map inputs to known outputs.
Support Vector Machine (SVM): A supervised learning model that finds the optimal hyperplane to separate different classes in the feature space.
Tokenization: The process of breaking down text into smaller units called tokens, which can be words, characters, or subwords, for processing in NLP tasks.
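Example (toy word- and character-level splits; real NLP pipelines typically use trained subword tokenizers such as BPE or WordPiece):

```python
text = "Tokenization splits text into units."

word_tokens = text.split()  # naive word-level tokenization
char_tokens = list(text)    # character-level tokenization

print(word_tokens)      # ['Tokenization', 'splits', 'text', 'into', 'units.']
print(char_tokens[:5])  # ['T', 'o', 'k', 'e', 'n']
```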
Transfer Learning: A technique where a model developed for one task is reused as the starting point for a model on a second task, often saving training time and improving performance.
Transformer: A deep learning model architecture that relies entirely on self-attention mechanisms without using recurrent neural networks, primarily used for NLP tasks.
Underfitting: A modeling error where a model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and new data.
Unsupervised Learning: A type of machine learning where the model is trained on unlabeled data, discovering patterns and relationships without explicit guidance.
Validation Set: A subset of the data used to tune hyperparameters and evaluate model performance during training, separate from the test set.
Vanishing Gradient Problem: A difficulty found in training neural networks with gradient-based methods and backpropagation, where the gradient becomes extremely small, effectively preventing the weights from changing value.
Variance: A measure of how much the predictions of a model change when trained on different subsets of the training data.
Weight: A parameter in a neural network or other machine learning model that determines the strength of connection between nodes or the importance of features.