Machine Learning Glossary
A comprehensive reference of machine learning terms, concepts, and techniques to help you understand the field.
Activation Function: A mathematical function that determines the output of a neural network node. Common examples include ReLU, Sigmoid, and Tanh.
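Example (a minimal sketch of the three activations named above, using NumPy; function names are illustrative):

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged, zeroes out negatives.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes inputs into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes inputs into the range (-1, 1).
    return np.tanh(x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), sigmoid(x), tanh(x))
```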
Backpropagation: An algorithm for training neural networks that calculates gradients of the loss function with respect to the weights, propagating from the output layer back to the input layer.
Batch Normalization: A technique to normalize the inputs of each layer to improve training stability and speed by reducing internal covariate shift.
Batch Size: The number of training examples utilized in one iteration of model training. It affects both the optimization process and the time required to train the model.
Bias: 1) A learnable constant term (as in a linear model or a neural network node) that shifts the output, allowing the model to fit the data better. 2) A systematic error that causes a model to favor certain outcomes.
Classification: A supervised learning task where the model predicts discrete class labels or categories for input data.
Clustering: An unsupervised learning technique that groups similar data points together based on certain features.
Confusion Matrix: A table used to describe the performance of a classification model, showing the counts of true positives, false positives, true negatives, and false negatives.
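Example (a sketch of tallying the four counts for a binary classifier with NumPy; the labels are made up):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # actual labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # model predictions

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

print([[tn, fp],
       [fn, tp]])  # the 2x2 confusion matrix
```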
Convolution: A mathematical operation that applies a filter to an input to create a feature map that summarizes the presence of detected features in the input.
Convolutional Neural Network (CNN): A type of neural network designed for processing grid-like data such as images, using convolutional layers to detect spatial patterns.
Cross-Validation: A resampling procedure used to evaluate machine learning models where the dataset is split into multiple subsets for training and validation.
Decision Tree: A tree-like model that makes decisions based on feature values, splitting the data into branches at decision nodes.
Deep Learning: A subset of machine learning using neural networks with many layers (deep neural networks) to model complex patterns in data.
Dropout: A regularization technique where randomly selected neurons are ignored during training to prevent overfitting.
Early Stopping: A form of regularization used to avoid overfitting by stopping training when performance on a validation set starts to degrade.
Embedding: A technique to represent discrete variables as continuous vectors in a lower-dimensional space, often used for text or categorical data.
Ensemble Learning: A technique that combines multiple machine learning models to improve performance and robustness.
Epoch: One complete pass through the entire training dataset during the training of a machine learning model.
Explainable AI (XAI): Artificial intelligence systems whose decisions can be readily understood by humans, in contrast to 'black box' models whose inner workings are opaque.
F1 Score: The harmonic mean of precision and recall, providing a single metric that balances both concerns. Particularly useful for imbalanced datasets.
Feature: An individual measurable property or characteristic of a phenomenon being observed, used as input to a machine learning model.
Feature Engineering: The process of selecting, transforming, or creating features from raw data to improve model performance.
Gini Impurity: A measure of how often a randomly chosen element from a set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in that set. Commonly used as a splitting criterion in decision trees.
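Example (a minimal sketch in plain Python):

```python
from collections import Counter

def gini_impurity(labels):
    # 1 minus the sum of squared class proportions; 0 means the set is pure.
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["a", "a", "b", "b"]))  # 0.5, maximally impure for two classes
print(gini_impurity(["a", "a", "a", "a"]))  # 0.0, pure
```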
Gradient Descent: An optimization algorithm that iteratively adjusts parameters to minimize a loss function by moving in the direction of steepest descent.
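Example (a toy sketch minimizing f(x) = (x - 3)^2, whose gradient is 2(x - 3)):

```python
x = 0.0             # starting point
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (x - 3)         # direction of steepest ascent
    x -= learning_rate * gradient  # step the opposite way
print(x)  # converges toward the minimum at x = 3
```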
Hyperparameter: A parameter whose value is set before the learning process begins, as opposed to parameters that are learned during training.
Hyperparameter Tuning: The process of finding the optimal hyperparameters for a machine learning algorithm to maximize its performance on a specific task.
Information Gain: A measure used in decision trees that quantifies how much 'information' a feature gives us about the class. It's based on the concept of entropy from information theory.
K-Means Clustering: An unsupervised learning algorithm that partitions data into K clusters, where each data point belongs to the cluster with the nearest mean.
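Example (a bare-bones NumPy sketch; a production implementation would also handle empty clusters and check for convergence):

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen points as the initial centroids.
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
print(kmeans(points, k=2))
```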
L1 Regularization (Lasso): A regularization technique that adds the absolute value of the magnitude of coefficients as a penalty term to the loss function, promoting sparsity in the model.
L2 Regularization (Ridge): A regularization technique that adds the squared magnitude of coefficients as a penalty term to the loss function, preventing any single feature from having too much influence.
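Example (a sketch showing how both penalties attach to a plain squared-error loss; the lambda-style arguments are illustrative hyperparameters):

```python
import numpy as np

def regularized_loss(y_true, y_pred, weights, l1=0.0, l2=0.0):
    mse = np.mean((y_true - y_pred) ** 2)       # base loss
    l1_penalty = l1 * np.sum(np.abs(weights))   # L1 term: encourages sparsity
    l2_penalty = l2 * np.sum(weights ** 2)      # L2 term: discourages large weights
    return mse + l1_penalty + l2_penalty
```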
Learning Rate: A hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated.
Loss Function: A function that measures the difference between the model's predictions and the actual target values, used to guide the optimization process.
Long Short-Term Memory (LSTM): A type of recurrent neural network architecture designed to handle the vanishing gradient problem and better capture long-term dependencies in sequential data.
Mean Absolute Error (MAE): The average of the absolute differences between predicted and actual values, measuring error magnitude without regard to direction.
Mean Squared Error (MSE): The average of the squared differences between predicted and actual values, penalizing larger errors more heavily.
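Example (both metrics in a few lines of NumPy):

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])
print(mae(y_true, y_pred))  # 0.833...
print(mse(y_true, y_pred))  # 1.416...
```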
Neural Network: A computational model inspired by the human brain, consisting of interconnected nodes (neurons) organized in layers that process information.
Normalization: The process of scaling features to a standard range, typically between 0 and 1 or -1 and 1, to improve model training and performance.
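Example (min-max scaling to [0, 1], sketched with NumPy):

```python
import numpy as np

def min_max_scale(x):
    # Rescale values into [0, 1] using the feature's min and max.
    return (x - x.min()) / (x.max() - x.min())

x = np.array([10.0, 20.0, 15.0, 30.0])
print(min_max_scale(x))  # [0.   0.5  0.25 1.  ]
```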
One-Hot Encoding: A process that converts each categorical variable into a binary vector with a 1 in the position corresponding to the category and 0s elsewhere, making it usable by machine learning algorithms.
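Example (a minimal sketch; the category names are made up):

```python
import numpy as np

categories = ["red", "green", "blue"]
index = {c: i for i, c in enumerate(categories)}

def one_hot(value):
    vec = np.zeros(len(categories))
    vec[index[value]] = 1.0  # a single 1 at the category's position
    return vec

print(one_hot("green"))  # [0. 1. 0.]
```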
Overfitting: A modeling error where a model learns the training data too well, including noise and outliers, resulting in poor generalization to new data.
Pooling: A downsampling operation used in CNNs that reduces the dimensionality of feature maps, retaining the most important information while reducing computation.
Precision: A metric that measures the proportion of true positive predictions among all positive predictions made by a model.
R-squared (R²): A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.
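Example (computed from its definition, 1 minus the ratio of residual to total sum of squares):

```python
import numpy as np

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y_true, y_pred))  # 0.98; close to 1 when predictions track the data
```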
Random Forest: An ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes for classification or the mean prediction for regression.
Recall: A metric that measures the proportion of true positive predictions among all actual positive instances in the data.
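Example (precision, recall, and the F1 score defined earlier, all from confusion-matrix counts; the counts are made up):

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

print(precision_recall_f1(tp=8, fp=2, fn=4))  # (0.8, 0.666..., 0.727...)
```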
Recurrent Neural Network (RNN): A type of neural network designed for sequential data, with connections that form cycles to maintain memory of previous inputs.
Regression: A supervised learning task where the model predicts continuous numerical values rather than discrete categories.
Regularization: Techniques used to prevent overfitting by adding a penalty term to the loss function or modifying the model architecture.
Reinforcement Learning: A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.
Root Mean Squared Error (RMSE): The square root of the mean of the squared differences between predicted values and observed values, providing an error measure in the same units as the target variable.
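Example (just the square root of the MSE defined above):

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```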
Self-Attention: A mechanism used in transformer models that allows the model to weigh the importance of different words in a sequence when making predictions, regardless of their position.
Semi-Supervised Learning: A learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training.
Softmax: A function that converts a vector of real numbers into a probability distribution. It's often used as the activation function in the output layer of neural networks for multi-class classification.
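Example (a standard numerically stable sketch in NumPy):

```python
import numpy as np

def softmax(z):
    # Subtracting the max before exponentiating avoids overflow
    # and does not change the result.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # non-negative and sums to 1.0
```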
Stochastic Gradient Descent (SGD): A variant of gradient descent that uses a single training example or a small batch to compute the gradient and update the parameters, making it more efficient for large datasets.
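Example (a toy mini-batch SGD fit of a linear model with NumPy; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # synthetic inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy targets

w = np.zeros(3)
lr, batch_size = 0.1, 8
for _ in range(500):
    idx = rng.choice(len(X), batch_size, replace=False)  # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size  # gradient of MSE on the batch
    w -= lr * grad
print(w)  # approaches true_w
```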
Supervised Learning: A type of machine learning where the model is trained on labeled data, learning to map inputs to known outputs.
Support Vector Machine (SVM): A supervised learning model that finds the optimal hyperplane to separate different classes in the feature space.
Tokenization: The process of breaking down text into smaller units called tokens, which can be words, characters, or subwords, for processing in NLP tasks.
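Example (toy word- and character-level splits; real NLP pipelines typically use trained subword tokenizers such as BPE or WordPiece):

```python
text = "Tokenization splits text into units."

word_tokens = text.split()  # naive word-level tokenization
char_tokens = list(text)    # character-level tokenization

print(word_tokens)      # ['Tokenization', 'splits', 'text', 'into', 'units.']
print(char_tokens[:5])  # ['T', 'o', 'k', 'e', 'n']
```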
Transfer Learning: A technique where a model developed for one task is reused as the starting point for a model on a second task, often saving training time and improving performance.
Transformer: A deep learning model architecture that relies entirely on self-attention mechanisms without using recurrent neural networks, primarily used for NLP tasks.
Underfitting: A modeling error where a model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and new data.
Unsupervised Learning: A type of machine learning where the model is trained on unlabeled data, discovering patterns and relationships without explicit guidance.
Validation Set: A subset of the data used to tune hyperparameters and evaluate model performance during training, separate from the test set.
Vanishing Gradient Problem: A difficulty found in training neural networks with gradient-based methods and backpropagation, where the gradient becomes extremely small, effectively preventing the weights from changing value.
Variance: A measure of how much the predictions of a model change when trained on different subsets of the training data.
Weight: A parameter in a neural network or other machine learning model that determines the strength of connection between nodes or the importance of features.