Introduction
ad_trait is a powerful, flexible, and easy-to-use Automatic Differentiation (AD) library for Rust.
What is Automatic Differentiation?
Automatic Differentiation is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Unlike symbolic differentiation, which produces a mathematical expression for the derivative, or finite differencing, which estimates derivatives from function evaluations, AD computes derivatives exactly (up to machine precision) by applying the chain rule to the program's elementary operations.
Why ad_trait?
- Unified Interface: Use the same code for forward-mode, reverse-mode, and finite differencing.
- Integration: Seamlessly works with `nalgebra` and `ndarray`.
- Flexibility: Define your own differentiable functions using a simple trait.
- Performance: High-performance implementations including multi-tangent forward AD and SIMD acceleration.
Goals
The primary goal of ad_trait is to make sophisticated automatic differentiation accessible to the Rust ecosystem with a focus on robotics, optimization, and machine learning.
Getting Started
Adding ad_trait to your project is straightforward.
Installation
Add the following to your Cargo.toml:
```toml
[dependencies]
ad_trait = "0.2.0"
```
Basic Usage
The core workflow of ad_trait involves three steps:
- Implement `DifferentiableFunctionTrait`: Define your function.
- Implement `Reparameterize`: Allow your function to work with different AD types.
- Use `FunctionEngine`: Wrap your function with a differentiation method.
A Simple Example
Here's how to compute the derivative of $f(x) = x^2$:
```rust
use ad_trait::{AD, DifferentiableFunctionTrait, Reparameterize, FunctionEngine, ForwardAD};

#[derive(Clone)]
struct Square;

impl<T: AD> DifferentiableFunctionTrait<T> for Square {
    const NAME: &'static str = "Square";

    fn call(&self, inputs: &[T], _freeze: bool) -> Vec<T> {
        vec![inputs[0] * inputs[0]]
    }

    fn num_inputs(&self) -> usize { 1 }
    fn num_outputs(&self) -> usize { 1 }
}

impl Reparameterize for Square {
    type SelfType<T2: AD> = Square;
}

fn main() {
    let func = Square;
    let engine = FunctionEngine::new(func.clone(), func, ForwardAD::new());

    let x = 3.0;
    let (val, grad) = engine.derivative(&[x]);

    println!("f(3) = {}", val[0]);        // Output: 9
    println!("f'(3) = {}", grad[(0, 0)]); // Output: 6
}
```
Core Concepts
ad_trait is built around a few central abstractions that make it highly extensible.
The AD Type System
The cornerstone of the library is the AD trait. Any type that implements AD can be used in a differentiable computation. The library provides several built-in implementations:
- `f64` and `f32`: For standard computations without derivative tracking.
- `adfn<N>`: For forward-mode AD with $N$ tangents.
- `adr`: For reverse-mode AD using a global computation graph.
- `f64xn<N>`: For SIMD-accelerated numerical computations.
Trait Hierarchy
- `AD`: The base numerical trait for differentiation.
- `DifferentiableFunctionTrait<T>`: Defines how a function is evaluated for a given AD type `T`.
- `Reparameterize`: Bridges the gap between different AD types, allowing a function to be automatically adapted for different differentiation modes.
- `DerivativeMethodTrait`: Defines how a derivative is calculated (e.g., Forward, Reverse).
The Function Engine
The FunctionEngine is the primary interface for users. It wraps a differentiable function and a derivative method, providing a simple way to call the function and get its Jacobian.
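Because the function and the derivative method are supplied separately, switching differentiation modes is a one-line change. The sketch below reuses the `Square` function from the Getting Started example and the constructors shown elsewhere in this book:

```rust
use ad_trait::{FunctionEngine, ForwardAD, ReverseAD, FiniteDifferencing};

fn main() {
    // The same function can be paired with any derivative method.
    // (`Square` is the function defined in the Getting Started example.)
    let forward = FunctionEngine::new(Square, Square, ForwardAD::new());
    let reverse = FunctionEngine::new(Square, Square, ReverseAD::new());
    let finite = FunctionEngine::new(Square, Square, FiniteDifferencing::new());

    // All three expose the same interface: evaluate the function and its Jacobian.
    let (value, jacobian) = forward.derivative(&[3.0]);
    println!("f(3) = {}, f'(3) = {}", value[0], jacobian[(0, 0)]);
}
```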
The AD Trait
The AD trait is the fundamental building block of ad_trait. It defines the arithmetic and mathematical operations required for automatic differentiation.
Why use a Trait?
By using a trait instead of a concrete type, ad_trait allows you to write generic algorithms that can be used for:
- Standard evaluation (using `f64`).
- First-order derivatives (using `adfn<1>`).
- Gradients for many inputs (using `adr`).
- Accelerated vector math (using `f64xn`).
Key Methods
- `constant(f64) -> Self`: Creates a new AD value from a constant.
- `to_constant(&self) -> f64`: Retrieves the underlying value.
- `ad_num_mode()`: Returns the current mode (Float, ForwardAD, etc.).
- `to_other_ad_type<T2: AD>(&self) -> T2`: Converts to a different AD type.
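Together with the operator overloads described below, these methods are enough to write numeric code once and run it with any AD type. A minimal sketch, assuming only `constant` from the list above and standard arithmetic:

```rust
use ad_trait::AD;

// A generic polynomial, written once, usable with f64, adfn<N>, or adr.
// Literal coefficients are lifted into the AD type via `constant`.
fn polynomial<T: AD>(x: T) -> T {
    // 3x^2 + 2x + 1
    T::constant(3.0) * x * x + T::constant(2.0) * x + T::constant(1.0)
}
```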
Numerical Operations
AD requires many standard numerical traits, including:
- `RealField` and `ComplexField` from `simba`.
- `num_traits::Signed`.
- Standard operator overloads (`Add`, `Mul`, etc.).
This ensures that any type implementing AD can be used wherever an ordinary floating-point number is expected in generic code.
Automatic Differentiation Modes
ad_trait supports three primary modes of differentiation, each with its own strengths and weaknesses.
Summary Table
| Mode | Type | Speed (Few Inputs) | Speed (Many Inputs) | Precision |
|---|---|---|---|---|
| Forward-Mode | Exact | Very Fast | Slow | Exact |
| Reverse-Mode | Exact | Moderate | Very Fast | Exact |
| Finite Differencing | Approx | Fast | Very Slow | Low |
In the following chapters, we will explore each of these modes in detail.
Forward-Mode AD
Forward-mode automatic differentiation propagates derivatives along with the function evaluation. In ad_trait, this is implemented using the adfn<N> type.
How it Works
A forward-mode AD variable can be thought of as a pair $(v, \dot{v})$, where $v$ is the current value and $\dot{v}$ is its tangent (the derivative with respect to some input). Every operation on these variables updates both the value and the tangent using the rules of calculus.
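The idea is easiest to see with a hand-rolled dual number. The sketch below is purely illustrative and is not `adfn`'s internal representation:

```rust
// A conceptual forward-mode value: a value paired with its tangent,
// both updated together by the rules of calculus (here, the product rule).
#[derive(Clone, Copy, Debug)]
struct Dual {
    v: f64,  // value
    dv: f64, // tangent: derivative with respect to the chosen input
}

fn mul(a: Dual, b: Dual) -> Dual {
    Dual {
        v: a.v * b.v,
        dv: a.dv * b.v + a.v * b.dv, // product rule
    }
}

fn main() {
    // Seed x = 3 with tangent 1 to differentiate with respect to x.
    let x = Dual { v: 3.0, dv: 1.0 };
    let y = mul(x, x); // f(x) = x^2
    println!("f(3) = {}, f'(3) = {}", y.v, y.dv); // 9, 6
}
```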
Single-Tangent Forward AD
When using ForwardAD, the library calculates the derivative with respect to one input at a time. To compute a full Jacobian for $M$ inputs, the function is evaluated $M$ times.
Multi-Tangent Forward AD
One of the unique features of ad_trait is its support for multiple tangents. By using adfn<N>, you can compute up to $N$ columns of the Jacobian in a single forward pass. This is extremely efficient for functions where most work is shared across different input variables.
Usage Example
```rust
use ad_trait::{FunctionEngine, ForwardAD};

// Evaluate with ForwardAD. Derivatives are computed by calling the
// function once per input dimension.
// `func` is any differentiable function (e.g. `Square` from the Getting Started example).
let engine = FunctionEngine::new(func.clone(), func, ForwardAD::new());
```
Reverse-Mode AD
Reverse-mode automatic differentiation is the most efficient way to compute gradients of functions with a very large number of inputs (e.g., neural networks). In ad_trait, this is implemented using the adr type.
How it Works
Unlike forward-mode, reverse-mode AD works in two phases:
- Forward Pass: The function is evaluated, and all operations are recorded in a Global Computation Graph.
- Backward Pass: The library traverses the graph in reverse, applying the chain rule to compute the gradient with respect to every input variable.
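A compact way to picture the two phases is a tape of recorded operations. The sketch below is purely conceptual and is not how `adr`'s global graph is implemented:

```rust
// Each node remembers which nodes it was computed from and the local
// partial derivative with respect to each of them.
struct Node {
    parents: Vec<(usize, f64)>, // (parent index, local partial derivative)
}

struct Tape {
    nodes: Vec<Node>,
}

impl Tape {
    fn new() -> Self { Tape { nodes: Vec::new() } }

    // Forward pass: record an input variable.
    fn input(&mut self) -> usize {
        self.nodes.push(Node { parents: vec![] });
        self.nodes.len() - 1
    }

    // Forward pass: record c = a * b, given the current values va and vb.
    fn mul(&mut self, a: usize, va: f64, b: usize, vb: f64) -> usize {
        self.nodes.push(Node { parents: vec![(a, vb), (b, va)] });
        self.nodes.len() - 1
    }

    // Backward pass: seed the output adjoint with 1 and walk the tape in reverse.
    fn gradient(&self, output: usize) -> Vec<f64> {
        let mut adjoint = vec![0.0; self.nodes.len()];
        adjoint[output] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            for &(parent, partial) in &self.nodes[i].parents {
                let contribution = partial * adjoint[i];
                adjoint[parent] += contribution; // chain rule
            }
        }
        adjoint
    }
}

fn main() {
    let mut tape = Tape::new();
    let x = tape.input();             // x = 3
    let y = tape.mul(x, 3.0, x, 3.0); // y = x * x
    let grad = tape.gradient(y);
    println!("dy/dx = {}", grad[x]);  // 6
}
```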
The Global Computation Graph
Because adr relies on a global graph, certain care must be taken:
- The graph must be reset between independent differentiation calls (handled automatically by `FunctionEngine`).
- In multi-threaded environments, access to the graph must be synchronized.
Usage Example
```rust
use ad_trait::{FunctionEngine, ReverseAD};

// Evaluate with ReverseAD. This is best for functions with many inputs
// and few outputs.
// `func` is any differentiable function (e.g. `Square` from the Getting Started example).
let engine = FunctionEngine::new(func.clone(), func, ReverseAD::new());
```
Finite Differencing
Finite Differencing is a classical numerical method for estimating derivatives. It is provided in ad_trait as a baseline and for functions where AD types might be impractical.
How it Works
The derivative is approximated using the formula: $$f'(x) \approx \frac{f(x + h) - f(x)}{h}$$ where $h$ is a very small value.
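As a concrete illustration (a standalone sketch, not `FiniteDifferencing`'s internal code), the formula can be applied directly:

```rust
// Forward-difference estimate of f'(x): (f(x + h) - f(x)) / h.
fn forward_difference(f: impl Fn(f64) -> f64, x: f64, h: f64) -> f64 {
    (f(x + h) - f(x)) / h
}

fn main() {
    let f = |x: f64| x * x;
    // f'(3) = 6 exactly; the estimate carries O(h) truncation error.
    println!("{}", forward_difference(f, 3.0, 1e-6)); // ~6.000001
}
```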
Accuracy vs. Precision
Finite differencing is an approximation and is subject to both truncation error (making $h$ too large) and round-off error (making $h$ too small). It is generally much less precise than true automatic differentiation.
Usage Example
```rust
use ad_trait::{FunctionEngine, FiniteDifferencing};

// Evaluate with FiniteDifferencing.
// `func` is any differentiable function (e.g. `Square` from the Getting Started example).
let engine = FunctionEngine::new(func.clone(), func, FiniteDifferencing::new());
```
Advanced Topics
Explore the high-performance features and deep technical details of ad_trait.
- Multi-Tangents: Scaling forward-mode AD.
- SIMD Acceleration: Leveraging hardware for faster differentiation.
- Matrix Operations: Efficiently handling linear algebra.
Multi-Tangents
Multi-tangent forward AD is a powerful optimization technique that allows ad_trait to compute multiple partial derivatives in a single pass of the function.
The Problem
Standard forward AD computes one column of the Jacobian per pass. If a function has 100 inputs, you need 100 passes. For complex functions, this overhead can be significant.
The Solution: adfn<N>
By setting $N > 1$, the tangent component of the AD type becomes a vector of length $N$. Each multiplication or operation now operates on this vector simultaneously.
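Conceptually, this extends the dual number from the Forward-Mode chapter so that the tangent is an array with one lane per input. The sketch below is illustrative only and is not `adfn`'s actual implementation:

```rust
// A conceptual multi-tangent value: one value, N tangent lanes,
// all updated in the same pass.
#[derive(Clone, Copy, Debug)]
struct MultiDual<const N: usize> {
    v: f64,
    dv: [f64; N], // one tangent lane per input variable
}

fn mul<const N: usize>(a: MultiDual<N>, b: MultiDual<N>) -> MultiDual<N> {
    let mut dv = [0.0; N];
    for i in 0..N {
        dv[i] = a.dv[i] * b.v + a.v * b.dv[i]; // product rule in every lane
    }
    MultiDual { v: a.v * b.v, dv }
}

fn main() {
    // f(x, y) = x * y at (3, 4), seeding both partial derivatives at once.
    let x = MultiDual::<2> { v: 3.0, dv: [1.0, 0.0] };
    let y = MultiDual::<2> { v: 4.0, dv: [0.0, 1.0] };
    let f = mul(x, y);
    println!("f = {}, df/dx = {}, df/dy = {}", f.v, f.dv[0], f.dv[1]); // 12, 4, 3
}
```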
When to Use
- When the input dimension is moderately large (e.g., 2-32).
- When the function evaluation is computationally expensive relative to the number of inputs.
- When you want to minimize the number of times the function logic is executed.
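As a rough usage sketch, multi-tangent differentiation is selected the same way as the other derivative methods in this book. The generic parameter below is an assumption; check the crate documentation for the exact signature of `ForwardADMulti`:

```rust
use ad_trait::{FunctionEngine, ForwardADMulti, adfn};

// Hypothetical sketch: assumes ForwardADMulti is constructed like the other
// derivative methods and is parameterized by an adfn tangent width.
// `func` is any differentiable function.
let engine = FunctionEngine::new(func.clone(), func, ForwardADMulti::<adfn<8>>::new());
```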
Integration with SIMD
Multi-tangents pair perfectly with SIMD. By propagating multiple tangents, the underlying hardware can often execute these operations in parallel, providing a "free" speedup.
SIMD Acceleration
ad_trait provides first-class support for Single Instruction, Multiple Data (SIMD) acceleration through the f64xn<N> type.
What is SIMD?
SIMD allows a single CPU instruction to operate on multiple pieces of data (usually vectors) at once. This can lead to 4x-8x speedups for arithmetic operations on modern processors.
Using f64xn
The f64xn<N> type implements the AD trait (in SIMDNum mode). It stores $N$ floats and performs element-wise operations using SIMD intrinsics (where available).
Requirements
Currently, f64xn requires the nightly version of Rust to access the portable_simd feature.
Example
```rust
#![allow(unused)]
// Requires nightly and #![feature(portable_simd)]
use ad_trait::simd::f64xn;

fn main() {
    let a = f64xn::<4>::new([1.0, 2.0, 3.0, 4.0]);
    let b = f64xn::<4>::new([5.0, 6.0, 7.0, 8.0]);
    let c = a + b; // Computed using a single CPU instruction if possible
}
```
Matrix Operations
ad_trait is designed to handle linear algebra efficiently. The AD trait includes requirements for matrix-scalar multiplication, which is the foundation for differentiating through linear systems.
Scalar-Matrix Multiplication
Types implementing AD must provide:
- `mul_by_nalgebra_matrix`: Multiplies an AD scalar by a `nalgebra` matrix.
- `mul_by_ndarray_matrix_ref`: Multiplies an AD scalar by an `ndarray` array.
Why this is necessary
By providing these specialized methods at the trait level, ad_trait can ensure that matrix operations are handled correctly for each AD mode. For example, in ForwardAD, the matrix multiplication also propagates the tangent through every element of the matrix.
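For matrices whose elements are themselves AD values, ordinary `nalgebra` operators already propagate derivatives, since AD types satisfy `nalgebra`'s scalar requirements (see the nalgebra chapter later in this book). A small generic sketch:

```rust
use ad_trait::AD;
use nalgebra::SMatrix;

// Scaling a matrix of AD values by an AD scalar using nalgebra's ordinary
// operators. In forward mode, every element's tangent is scaled along with
// its value.
fn scale<T: AD>(m: SMatrix<T, 2, 2>, s: T) -> SMatrix<T, 2, 2> {
    m * s
}
```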
Performance Considerations
When performing large matrix multiplications, it is often more efficient to use a differentiation mode that minimizes the number of passes (like ReverseAD or ForwardADMulti).
Integration
ad_trait doesn't exist in a vacuum. It is designed to play well with the rest of the Rust numerical ecosystem.
- nalgebra: The standard for linear algebra in Rust.
- ndarray: Flexible multi-dimensional arrays.
In this section, we'll see how to use ad_trait types within these libraries.
Integration with nalgebra
ad_trait types are fully compatible with nalgebra's generic matrix types.
Example: AD Matrix
You can create a nalgebra matrix using any AD type:
```rust
#![allow(unused)]
use nalgebra::SMatrix;
use ad_trait::adfn;

fn main() {
    // A 2x2 matrix of Forward-Mode AD variables
    let m = SMatrix::<adfn<1>, 2, 2>::zeros();
}
```
Differentiating through nalgebra
Because adfn and adr implement the relevant traits required by nalgebra (like RealField and ComplexField), you can use standard nalgebra functions (determinant, inverse, multiplication) in your differentiable code.
```rust
use ad_trait::AD;
use nalgebra::SMatrix;

// A generic, differentiable function that builds a matrix from its inputs,
// inverts it, and returns one entry of the inverse.
fn my_func<T: AD>(inputs: &[T]) -> Vec<T> {
    let m = SMatrix::<T, 2, 2>::from_vec(inputs.to_vec());
    let inv = m.try_inverse().unwrap();
    vec![inv[(0, 0)]]
}
```
Integration with ndarray
ad_trait also supports the ndarray crate, which is commonly used for data science and machine learning.
Example: AD Array
```rust
#![allow(unused)]
use ndarray::Array2;
use ad_trait::adfn;

fn main() {
    let a = Array2::<adfn<1>>::zeros((10, 10));
}
```
Scalar Operations
The AD trait includes mul_by_ndarray_matrix_ref, allowing you to perform scalar-array multiplication efficiently across different AD modes.
Generic Algorithms
Similar to nalgebra, you can write generic algorithms using ndarray that work with any AD type. This is particularly useful for implementing complex mathematical models that require gradients for optimization.
Examples
The ad_trait repository contains several examples that demonstrate the library in action.
Built-in Examples
Check the examples/ directory in the crate for standalone programs:
- `test.rs`: A general demonstration of forward and reverse AD.
Regression Tests
The tests/regression_tests.rs file contains many examples of differentiating complex functions, including:
- Multi-variate polynomials.
- Matrix-vector multiplication.
- Jacobian calculations for multiple outputs.
These tests serve as an excellent reference for how to structure your own differentiable functions.