Machine Learning Glossary

A comprehensive reference of machine learning terms, concepts, and techniques to help you understand the field.

Activation Function
Neural Networks

A mathematical function applied to a node's weighted input to produce its output, introducing the non-linearity that lets a network model complex relationships. Common examples include ReLU, Sigmoid, and Tanh.
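
As a minimal sketch, the three activations named above can be written directly in NumPy (the input array here is an arbitrary example):

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # example pre-activation values

relu = np.maximum(0, x)           # ReLU: max(0, x), zeroes out negatives
sigmoid = 1 / (1 + np.exp(-x))    # Sigmoid: squashes values into (0, 1)
tanh = np.tanh(x)                 # Tanh: squashes values into (-1, 1)
```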

Backpropagation
Neural Networks

An algorithm for training neural networks that calculates gradients of the loss function with respect to the weights, propagating from output to input layers.

Batch Normalization
Neural Networks

A technique to normalize the inputs of each layer to improve training stability and speed by reducing internal covariate shift.
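
A minimal sketch of the core normalization step, assuming training-time batch statistics (the running averages used at inference and the updates to the learned gamma/beta are omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature across the batch dimension (axis 0),
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.randn(32, 4) * 5 + 3   # 32 examples, 4 features
print(batch_norm(batch).mean(axis=0))    # each feature mean is now ~0
```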

Batch Size
Training

The number of training examples utilized in one iteration of model training. It affects both the optimization process and the time required to train the model.

Bias
General

1) A parameter in machine learning models that allows the model to fit the data better. 2) A systematic error that causes a model to favor certain outcomes.

Classification
Tasks

A supervised learning task where the model predicts discrete class labels or categories for input data.

Clustering
Tasks

An unsupervised learning technique that groups similar data points together based on certain features.

Confusion Matrix
Evaluation

A table used to describe the performance of a classification model, showing the counts of true positives, false positives, true negatives, and false negatives.
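
A minimal sketch of the four counts for a binary classifier, assuming labels encoded as 0/1 (the example arrays are arbitrary):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

print([[tn, fp],
       [fn, tp]])  # the 2x2 confusion matrix
```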

Convolution
Neural Networks

A mathematical operation that applies a filter to an input to create a feature map that summarizes the presence of detected features in the input.
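
A minimal sketch of a 2-D convolution with stride 1 and no padding; as in most deep learning libraries, the kernel is applied without flipping (strictly, cross-correlation). The example image and filter are arbitrary:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image and take a dot product at
    # each position ('valid' mode: output is smaller than the input).
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_filter = np.array([[1.0, -1.0]])  # responds to horizontal change
print(convolve2d(image, edge_filter))
```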

Convolutional Neural Network (CNN)
Neural Networks

A type of neural network designed for processing grid-like data such as images, using convolutional layers to detect spatial patterns.

Cross-Validation
Evaluation

A resampling procedure used to evaluate machine learning models where the dataset is split into multiple subsets for training and validation.
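
A minimal sketch of splitting a dataset into k folds by index; the training and evaluation calls themselves are left as a comment since they depend on the model:

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    # Shuffle indices once, then split them into k roughly equal folds.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    return np.array_split(indices, k)

folds = k_fold_indices(100, k=5)
for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Train on train_idx, evaluate on val_idx, then average the k scores.
    print(f"fold {i}: train={len(train_idx)}, val={len(val_idx)}")
```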

Decision Tree
Models

A tree-like model that makes decisions based on feature values, splitting the data into branches at decision nodes.

Deep Learning
General

A subset of machine learning using neural networks with many layers (deep neural networks) to model complex patterns in data.

Dropout
Neural Networks

A regularization technique where randomly selected neurons are ignored during training to prevent overfitting.
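
A minimal sketch of "inverted" dropout as applied during training, where surviving activations are rescaled by 1/(1-p) so that no adjustment is needed at inference:

```python
import numpy as np

def dropout(activations, p=0.5, seed=None):
    # Zero each activation with probability p; rescale survivors by
    # 1/(1-p) so the expected activation value is unchanged.
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones(10)
print(dropout(h, p=0.5, seed=0))  # roughly half the units zeroed out
```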

Early Stopping
Training

A form of regularization used to avoid overfitting by stopping training when performance on a validation set starts to degrade.
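
A minimal sketch of the stopping logic, using a hard-coded list of validation losses to stand in for a real training loop:

```python
# Simulated per-epoch validation losses standing in for real training.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.62, 0.63]

best_loss = float("inf")
patience, bad_epochs = 2, 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:           # improvement: reset the counter
        best_loss, bad_epochs = val_loss, 0
    else:                              # no improvement this epoch
        bad_epochs += 1
        if bad_epochs >= patience:     # stop after `patience` bad epochs
            print(f"early stop at epoch {epoch}, best loss {best_loss}")
            break
```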

Embedding
Feature Engineering

A technique to represent discrete variables as continuous vectors in a lower-dimensional space, often used for text or categorical data.

Ensemble Learning
Models

A technique that combines multiple machine learning models to improve performance and robustness.

Epoch
Training

One complete pass through the entire training dataset during the training of a machine learning model.

Explainable AI (XAI)
General

Artificial intelligence systems whose decisions and behavior can be readily understood by humans, in contrast to 'black box' models whose internal reasoning is opaque.

F1 Score
Evaluation

The harmonic mean of precision and recall, providing a single metric that balances both concerns. Particularly useful for imbalanced datasets.
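
A minimal sketch computing precision, recall, and F1 from raw 0/1 predictions (the example arrays are arbitrary):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)   # of predicted positives, how many are real
recall = tp / (tp + fn)      # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(precision, recall, f1)
```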

Feature
Feature Engineering

An individual measurable property or characteristic of a phenomenon being observed, used as input to a machine learning model.

Feature Engineering
Feature Engineering

The process of selecting, transforming, or creating features from raw data to improve model performance.

Gini Impurity
Models

A measure of how often a randomly chosen element from a set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in that set. Decision trees use it to evaluate candidate splits.
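
A minimal sketch of the formula, Gini = 1 - sum of squared class proportions:

```python
import numpy as np

def gini_impurity(labels):
    # 1 - sum(p_k^2) over class proportions p_k; 0 means a pure node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))   # 0.0  (pure node)
print(gini_impurity([0, 0, 1, 1]))   # 0.5  (maximally mixed, two classes)
```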

Gradient Descent
Optimization

An optimization algorithm that iteratively adjusts parameters to minimize a loss function by moving in the direction of steepest descent.
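
A minimal sketch minimizing a simple one-dimensional function, f(w) = (w - 3)^2, whose gradient can be written by hand:

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
w = 0.0
learning_rate = 0.1

for step in range(50):
    grad = 2 * (w - 3)          # direction of steepest ascent
    w -= learning_rate * grad   # step against the gradient

print(w)  # converges toward the minimum at w = 3
```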

Hyperparameter
Training

A parameter whose value is set before the learning process begins, as opposed to parameters that are learned during training.

Hyperparameter Tuning
Training

The process of finding the optimal hyperparameters for a machine learning algorithm to maximize its performance on a specific task.

Information Gain
Models

A measure used in decision trees that quantifies how much 'information' a feature gives us about the class. It's based on the concept of entropy from information theory.

K-Means Clustering
Models

An unsupervised learning algorithm that partitions data into K clusters, where each data point belongs to the cluster with the nearest mean.
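
A minimal sketch of the standard (Lloyd's) algorithm on synthetic data; it alternates assignment and mean-update steps and, for brevity, does not handle the empty-cluster edge case:

```python
import numpy as np

def k_means(X, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points as initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(20, 2)),        # cluster near the origin
               rng.normal(size=(20, 2)) + 5])   # cluster shifted to (5, 5)
labels, centers = k_means(X, k=2)
print(centers)  # roughly (0, 0) and (5, 5)
```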

L1 Regularization (Lasso)
Optimization

A regularization technique that adds the absolute value of the magnitude of coefficients as a penalty term to the loss function, promoting sparsity in the model.

L2 Regularization (Ridge)
Optimization

A regularization technique that adds the squared magnitude of coefficients as a penalty term to the loss function, preventing any single feature from having too much influence.
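
A minimal sketch showing how the L1 (previous entry) and L2 penalties attach to a base loss; the weights, base loss, and lambda here are arbitrary placeholder values:

```python
import numpy as np

w = np.array([0.5, -1.2, 0.0, 3.0])   # hypothetical model weights
base_loss = 0.42                      # e.g. an MSE value computed elsewhere
lam = 0.01                            # regularization strength

l1_loss = base_loss + lam * np.sum(np.abs(w))  # Lasso: promotes sparsity
l2_loss = base_loss + lam * np.sum(w ** 2)     # Ridge: shrinks all weights
```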

Learning Rate
Optimization

A hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.

Loss Function
Optimization

A function that measures the difference between the model's predictions and the actual target values, used to guide the optimization process.

LSTM (Long Short-Term Memory)
Neural Networks

A type of recurrent neural network architecture designed to handle the vanishing gradient problem and better capture long-term dependencies in sequential data.

Mean Absolute Error (MAE)
Evaluation

The average of the absolute differences between predicted and actual values; unlike MSE, it penalizes all errors in direct proportion to their size.

Mean Squared Error (MSE)
Evaluation

The average of the squared differences between predicted and actual values; squaring penalizes large errors more heavily than small ones.
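
A minimal sketch computing MAE and MSE (and their relative, RMSE, defined later in this glossary) on arbitrary example values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = np.mean(np.abs(y_pred - y_true))   # Mean Absolute Error
mse = np.mean((y_pred - y_true) ** 2)    # Mean Squared Error
rmse = np.sqrt(mse)                      # RMSE is the square root of MSE
print(mae, mse, rmse)
```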

Neural Network
Neural Networks

A computational model inspired by the human brain, consisting of interconnected nodes (neurons) organized in layers that process information.

Normalization
Feature Engineering

The process of scaling features to a standard range, typically between 0 and 1 or -1 and 1, to improve model training and performance.

One-Hot Encoding
Feature Engineering

A process that converts a categorical variable into a binary vector with one element per category, where the element for the observed category is 1 and all others are 0, making the variable usable as input to machine learning algorithms.
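
A minimal sketch without any library encoder, building the binary matrix by hand:

```python
import numpy as np

categories = ["red", "green", "blue"]
values = ["green", "blue", "green", "red"]

# One row per value, one column per category; a single 1 marks the category.
index = {c: i for i, c in enumerate(categories)}
one_hot = np.zeros((len(values), len(categories)), dtype=int)
for row, v in enumerate(values):
    one_hot[row, index[v]] = 1

print(one_hot)
# [[0 1 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```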

Overfitting
Training

A modeling error where a model learns the training data too well, including noise and outliers, resulting in poor generalization to new data.

Pooling
Neural Networks

A downsampling operation used in CNNs that reduces the dimensionality of feature maps, retaining the most important information while reducing computation.

Precision
Evaluation

A metric that measures the proportion of true positive predictions among all positive predictions made by a model.

R-squared (Coefficient of Determination)
Evaluation

A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.
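
A minimal sketch of the standard formula, R² = 1 - SS_res / SS_tot, on arbitrary example values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 3.0, 6.5])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r_squared = 1 - ss_res / ss_tot                  # 1.0 would be a perfect fit
print(r_squared)
```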

Random Forest
Models

An ensemble learning method that constructs multiple decision trees during training and outputs the majority class vote for classification or the average prediction for regression.

Recall
Evaluation

A metric that measures the proportion of true positive predictions among all actual positive instances in the data.

Recurrent Neural Network (RNN)
Neural Networks

A type of neural network designed for sequential data, with connections that form cycles to maintain memory of previous inputs.

Regression
Tasks

A supervised learning task where the model predicts continuous numerical values rather than discrete categories.

Regularization
Training

Techniques used to prevent overfitting by adding a penalty term to the loss function or modifying the model architecture.

Reinforcement Learning
General

A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.

Root Mean Squared Error (RMSE)
Evaluation

The square root of the mean of the squared differences between predicted values and observed values, providing an error measure in the same units as the target variable.

Self-Attention
Neural Networks

A mechanism used in transformer models that allows the model to weigh the importance of different words in a sequence when making predictions, regardless of their position.
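
A minimal sketch of single-head scaled dot-product self-attention, with randomly initialized projection matrices standing in for learned weights (masking and multiple heads are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the sequence into queries, keys, and values, then let
    # every position attend to every other position.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise similarity, scaled
    weights = softmax(scores, axis=-1)       # attention weights per position
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                 # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```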

Semi-Supervised Learning
General

A learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training.

Softmax Function
Neural Networks

A function that converts a vector of real numbers into a probability distribution. It's often used as the activation function in the output layer of neural networks for multi-class classification.
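
A minimal sketch, including the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(z):
    # Subtracting the max avoids overflow in exp() and does not
    # change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # ~[0.659 0.242 0.099], sums to 1.0
```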

Stochastic Gradient Descent (SGD)
Optimization

A variant of gradient descent that uses a single training example or a small batch to compute the gradient and update the parameters, making it more efficient for large datasets.

Supervised Learning
General

A type of machine learning where the model is trained on labeled data, learning to map inputs to known outputs.

Support Vector Machine (SVM)
Models

A supervised learning model that finds the optimal hyperplane to separate different classes in the feature space.

Tokenization
Feature Engineering

The process of breaking down text into smaller units called tokens, which can be words, characters, or subwords, for processing in NLP tasks.
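
A minimal sketch of word- and character-level tokenization via plain string operations; production NLP pipelines typically use subword tokenizers (e.g. BPE) and handle punctuation properly:

```python
text = "Tokenization breaks text into smaller units."

# Word-level tokens via naive whitespace splitting (punctuation
# stays attached to words in this simple approach).
word_tokens = text.lower().split()

# Character-level tokens.
char_tokens = list(text)

print(word_tokens)  # ['tokenization', 'breaks', 'text', 'into', ...]
```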

Transfer Learning
Training

A technique where a model developed for one task is reused as the starting point for a model on a second task, often saving training time and improving performance.

Transformer
Neural Networks

A deep learning model architecture that relies entirely on self-attention mechanisms without using recurrent neural networks, primarily used for NLP tasks.

Underfitting
Training

A modeling error where a model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and new data.

Unsupervised Learning
General

A type of machine learning where the model is trained on unlabeled data, discovering patterns and relationships without explicit guidance.

Validation Set
Evaluation

A subset of the data used to tune hyperparameters and evaluate model performance during training, separate from the test set.

Vanishing Gradient Problem
Neural Networks

A difficulty found in training neural networks with gradient-based methods and backpropagation, where the gradient becomes extremely small, effectively preventing the weights from changing value.

Variance
Evaluation

A measure of how much the predictions of a model change when trained on different subsets of the training data.

Weight
Neural Networks

A parameter in a neural network or other machine learning model that determines the strength of connection between nodes or the importance of features.