Introduction
ad_trait is a powerful, flexible, and easy-to-use Automatic Differentiation (AD) library for Rust.
What is Automatic Differentiation?
Automatic Differentiation is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Unlike symbolic differentiation, which produces a mathematical expression for the derivative, or finite differencing, which estimates derivatives from function evaluations, AD computes derivatives exactly (up to machine precision) by applying the chain rule to the program's elementary operations.
Why ad_trait?
- Unified Interface: Use the same code for forward-mode, reverse-mode, and finite differencing.
- Integration: Seamlessly works with `nalgebra` and `ndarray`.
- Flexibility: Define your own differentiable functions using a simple trait.
- Performance: High-performance implementations including multi-tangent forward AD and SIMD acceleration.
Goals
The primary goal of ad_trait is to make sophisticated automatic differentiation accessible to the Rust ecosystem with a focus on robotics, optimization, and machine learning.
Getting Started
Adding ad_trait to your project is straightforward.
Installation
Add the following to your Cargo.toml:
```toml
[dependencies]
ad_trait = "0.2.0"
```
Basic Usage
The core workflow of ad_trait involves three steps:
- Implement `DifferentiableFunctionTrait`: Define your function.
- Implement `Reparameterize`: Allow your function to work with different AD types.
- Use `FunctionEngine`: Wrap your function with a differentiation method.
A Simple Example
Here's how to compute the derivative of $f(x) = x^2$:
```rust
use ad_trait::{AD, DifferentiableFunctionTrait, Reparameterize, FunctionEngine, ForwardAD};

#[derive(Clone)]
struct Square;

impl<T: AD> DifferentiableFunctionTrait<T> for Square {
    const NAME: &'static str = "Square";

    fn call(&self, inputs: &[T], _freeze: bool) -> Vec<T> {
        vec![inputs[0] * inputs[0]]
    }

    fn num_inputs(&self) -> usize { 1 }
    fn num_outputs(&self) -> usize { 1 }
}

impl Reparameterize for Square {
    type SelfType<T2: AD> = Square;
}

fn main() {
    let func = Square;
    let engine = FunctionEngine::new(func.clone(), func, ForwardAD::new());

    let x = 3.0;
    let (val, grad) = engine.derivative(&[x]);

    println!("f(3) = {}", val[0]);        // Output: 9
    println!("f'(3) = {}", grad[(0, 0)]); // Output: 6
}
```
Core Concepts
ad_trait is built around a few central abstractions that make it highly extensible.
The AD Type System
The cornerstone of the library is the AD trait. Any type that implements AD can be used in a differentiable computation. The library provides several built-in implementations:
- `f64` and `f32`: For standard computations without derivative tracking.
- `adfn<N>`: For forward-mode AD with $N$ tangents.
- `adr`: For reverse-mode AD using a global computation graph.
- `f64xn<N>`: For SIMD-accelerated numerical computations.
Trait Hierarchy
- `AD`: The base numerical trait for differentiation.
- `DifferentiableFunctionTrait<T>`: Defines how a function is evaluated for a given AD type `T`.
- `Reparameterize`: Bridges the gap between different AD types, allowing a function to be automatically adapted for different differentiation modes.
- `DerivativeMethodTrait`: Defines how a derivative is calculated (e.g., Forward, Reverse).
The Function Engine
The FunctionEngine is the primary interface for users. It wraps a differentiable function and a derivative method, providing a simple way to call the function and get its Jacobian.
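Because the function and the derivative method are supplied separately, switching differentiation modes is a one-line change. The sketch below reuses the `Square` function from the Getting Started example and the constructors shown elsewhere in this book:

```rust
use ad_trait::{FunctionEngine, ForwardAD, ReverseAD, FiniteDifferencing};

fn main() {
    // The same function can be paired with any derivative method.
    // (`Square` is the function defined in the Getting Started example.)
    let forward = FunctionEngine::new(Square, Square, ForwardAD::new());
    let reverse = FunctionEngine::new(Square, Square, ReverseAD::new());
    let finite = FunctionEngine::new(Square, Square, FiniteDifferencing::new());

    // All three expose the same interface: evaluate the function and its Jacobian.
    let (value, jacobian) = forward.derivative(&[3.0]);
    println!("f(3) = {}, f'(3) = {}", value[0], jacobian[(0, 0)]);
}
```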
The AD Trait
The AD trait is the fundamental building block of ad_trait. It defines the arithmetic and mathematical operations required for automatic differentiation.
Why use a Trait?
By using a trait instead of a concrete type, ad_trait allows you to write generic algorithms that can be used for:
- Standard evaluation (using `f64`).
- First-order derivatives (using `adfn<1>`).
- Gradients for many inputs (using `adr`).
- Accelerated vector math (using `f64xn`).
Key Methods
- `constant(f64) -> Self`: Creates a new AD value from a constant.
- `to_constant(&self) -> f64`: Retrieves the underlying value.
- `ad_num_mode()`: Returns the current mode (Float, ForwardAD, etc.).
- `to_other_ad_type<T2: AD>(&self) -> T2`: Converts to a different AD type.
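Together with the operator overloads described below, these methods are enough to write numeric code once and run it with any AD type. A minimal sketch, assuming only `constant` from the list above and standard arithmetic:

```rust
use ad_trait::AD;

// A generic polynomial, written once, usable with f64, adfn<N>, or adr.
// Literal coefficients are lifted into the AD type via `constant`.
fn polynomial<T: AD>(x: T) -> T {
    // 3x^2 + 2x + 1
    T::constant(3.0) * x * x + T::constant(2.0) * x + T::constant(1.0)
}
```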
Numerical Operations
AD requires many standard numerical traits, including:
- `RealField` and `ComplexField` from `simba`.
- `num_traits::Signed`.
- Standard operator overloads (`Add`, `Mul`, etc.).
This ensures that any type implementing AD can be used wherever an ordinary floating-point number is expected in generic code.
Automatic Differentiation Modes
ad_trait supports three primary modes of differentiation, each with its own strengths and weaknesses.
Summary Table
| Mode | Type | Speed (Few Inputs) | Speed (Many Inputs) | Precision |
|---|---|---|---|---|
| Forward-Mode | Exact | Very Fast | Slow | Exact |
| Reverse-Mode | Exact | Moderate | Very Fast | Exact |
| Finite Differencing | Approx | Fast | Very Slow | Low |
In the following chapters, we will explore each of these modes in detail.
Forward-Mode AD
Forward-mode automatic differentiation propagates derivatives along with the function evaluation. In ad_trait, this is implemented using the adfn<N> type.
How it Works
A forward-mode AD variable can be thought of as a pair $(v, \dot{v})$, where $v$ is the current value and $\dot{v}$ is its tangent (the derivative with respect to some input). Every operation on these variables updates both the value and the tangent using the rules of calculus.
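The idea is easiest to see with a hand-rolled dual number. The sketch below is purely illustrative and is not `adfn`'s internal representation:

```rust
// A conceptual forward-mode value: a value paired with its tangent,
// both updated together by the rules of calculus (here, the product rule).
#[derive(Clone, Copy, Debug)]
struct Dual {
    v: f64,  // value
    dv: f64, // tangent: derivative with respect to the chosen input
}

fn mul(a: Dual, b: Dual) -> Dual {
    Dual {
        v: a.v * b.v,
        dv: a.dv * b.v + a.v * b.dv, // product rule
    }
}

fn main() {
    // Seed x = 3 with tangent 1 to differentiate with respect to x.
    let x = Dual { v: 3.0, dv: 1.0 };
    let y = mul(x, x); // f(x) = x^2
    println!("f(3) = {}, f'(3) = {}", y.v, y.dv); // 9, 6
}
```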
Single-Tangent Forward AD
When using ForwardAD, the library calculates the derivative with respect to one input at a time. To compute a full Jacobian for $M$ inputs, the function is evaluated $M$ times.
Multi-Tangent Forward AD
One of the unique features of ad_trait is its support for multiple tangents. By using adfn<N>, you can compute up to $N$ columns of the Jacobian in a single forward pass. This is extremely efficient for functions where most work is shared across different input variables.
Usage Example
```rust
use ad_trait::{FunctionEngine, ForwardAD};

// Evaluate with ForwardAD. Derivatives are computed by calling the
// function once per input dimension.
// `func` is any differentiable function (e.g. `Square` from the Getting Started example).
let engine = FunctionEngine::new(func.clone(), func, ForwardAD::new());
```
Reverse-Mode AD
Reverse-mode automatic differentiation is the most efficient way to compute gradients of functions with a very large number of inputs (e.g., neural networks). In ad_trait, this is implemented using the adr type.
How it Works
Unlike forward-mode, reverse-mode AD works in two phases:
- Forward Pass: The function is evaluated, and all operations are recorded in a Global Computation Graph.
- Backward Pass: The library traverses the graph in reverse, applying the chain rule to compute the gradient with respect to every input variable.
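A compact way to picture the two phases is a tape of recorded operations. The sketch below is purely conceptual and is not how `adr`'s global graph is implemented:

```rust
// Each node remembers which nodes it was computed from and the local
// partial derivative with respect to each of them.
struct Node {
    parents: Vec<(usize, f64)>, // (parent index, local partial derivative)
}

struct Tape {
    nodes: Vec<Node>,
}

impl Tape {
    fn new() -> Self { Tape { nodes: Vec::new() } }

    // Forward pass: record an input variable.
    fn input(&mut self) -> usize {
        self.nodes.push(Node { parents: vec![] });
        self.nodes.len() - 1
    }

    // Forward pass: record c = a * b, given the current values va and vb.
    fn mul(&mut self, a: usize, va: f64, b: usize, vb: f64) -> usize {
        self.nodes.push(Node { parents: vec![(a, vb), (b, va)] });
        self.nodes.len() - 1
    }

    // Backward pass: seed the output adjoint with 1 and walk the tape in reverse.
    fn gradient(&self, output: usize) -> Vec<f64> {
        let mut adjoint = vec![0.0; self.nodes.len()];
        adjoint[output] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            for &(parent, partial) in &self.nodes[i].parents {
                let contribution = partial * adjoint[i];
                adjoint[parent] += contribution; // chain rule
            }
        }
        adjoint
    }
}

fn main() {
    let mut tape = Tape::new();
    let x = tape.input();             // x = 3
    let y = tape.mul(x, 3.0, x, 3.0); // y = x * x
    let grad = tape.gradient(y);
    println!("dy/dx = {}", grad[x]);  // 6
}
```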
The Global Computation Graph
Because adr relies on a global graph, certain care must be taken:
- The graph must be reset between independent differentiation calls (handled automatically by `FunctionEngine`).
- In multi-threaded environments, access to the graph must be synchronized.
Usage Example
```rust
use ad_trait::{FunctionEngine, ReverseAD};

// Evaluate with ReverseAD. This is best for functions with many inputs
// and few outputs.
// `func` is any differentiable function (e.g. `Square` from the Getting Started example).
let engine = FunctionEngine::new(func.clone(), func, ReverseAD::new());
```
Finite Differencing
Finite Differencing is a classical numerical method for estimating derivatives. It is provided in ad_trait as a baseline and for functions where AD types might be impractical.
How it Works
The derivative is approximated using the formula: $$f'(x) \approx \frac{f(x + h) - f(x)}{h}$$ where $h$ is a very small value.
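As a concrete illustration (a standalone sketch, not `FiniteDifferencing`'s internal code), the formula can be applied directly:

```rust
// Forward-difference estimate of f'(x): (f(x + h) - f(x)) / h.
fn forward_difference(f: impl Fn(f64) -> f64, x: f64, h: f64) -> f64 {
    (f(x + h) - f(x)) / h
}

fn main() {
    let f = |x: f64| x * x;
    // f'(3) = 6 exactly; the estimate carries O(h) truncation error.
    println!("{}", forward_difference(f, 3.0, 1e-6)); // ~6.000001
}
```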
Accuracy vs. Precision
Finite differencing is an approximation and is subject to both truncation error (making $h$ too large) and round-off error (making $h$ too small). It is generally much less precise than true automatic differentiation.
Usage Example
```rust
use ad_trait::{FunctionEngine, FiniteDifferencing};

// Evaluate with FiniteDifferencing.
// `func` is any differentiable function (e.g. `Square` from the Getting Started example).
let engine = FunctionEngine::new(func.clone(), func, FiniteDifferencing::new());
```
Advanced Topics
Explore the high-performance features and deep technical details of ad_trait.
- Multi-Tangents: Scaling forward-mode AD.
- SIMD Acceleration: Leveraging hardware for faster differentiation.
- Matrix Operations: Efficiently handling linear algebra.
Multi-Tangents
Multi-tangent forward AD is a powerful optimization technique that allows ad_trait to compute multiple partial derivatives in a single pass of the function.
The Problem
Standard forward AD computes one column of the Jacobian per pass. If a function has 100 inputs, you need 100 passes. For complex functions, this overhead can be significant.
The Solution: adfn<N>
By setting $N > 1$, the tangent component of the AD type becomes a vector of length $N$. Each multiplication or operation now operates on this vector simultaneously.
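Conceptually, this extends the dual number from the Forward-Mode chapter so that the tangent is an array with one lane per input. The sketch below is illustrative only and is not `adfn`'s actual implementation:

```rust
// A conceptual multi-tangent value: one value, N tangent lanes,
// all updated in the same pass.
#[derive(Clone, Copy, Debug)]
struct MultiDual<const N: usize> {
    v: f64,
    dv: [f64; N], // one tangent lane per input variable
}

fn mul<const N: usize>(a: MultiDual<N>, b: MultiDual<N>) -> MultiDual<N> {
    let mut dv = [0.0; N];
    for i in 0..N {
        dv[i] = a.dv[i] * b.v + a.v * b.dv[i]; // product rule in every lane
    }
    MultiDual { v: a.v * b.v, dv }
}

fn main() {
    // f(x, y) = x * y at (3, 4), seeding both partial derivatives at once.
    let x = MultiDual::<2> { v: 3.0, dv: [1.0, 0.0] };
    let y = MultiDual::<2> { v: 4.0, dv: [0.0, 1.0] };
    let f = mul(x, y);
    println!("f = {}, df/dx = {}, df/dy = {}", f.v, f.dv[0], f.dv[1]); // 12, 4, 3
}
```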
When to Use
- When the input dimension is moderately large (e.g., 2-32).
- When the function evaluation is computationally expensive relative to the number of inputs.
- When you want to minimize the number of times the function logic is executed.
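As a rough usage sketch, multi-tangent differentiation is selected the same way as the other derivative methods in this book. The generic parameter below is an assumption; check the crate documentation for the exact signature of `ForwardADMulti`:

```rust
use ad_trait::{FunctionEngine, ForwardADMulti, adfn};

// Hypothetical sketch: assumes ForwardADMulti is constructed like the other
// derivative methods and is parameterized by an adfn tangent width.
// `func` is any differentiable function.
let engine = FunctionEngine::new(func.clone(), func, ForwardADMulti::<adfn<8>>::new());
```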
Integration with SIMD
Multi-tangents pair perfectly with SIMD. By propagating multiple tangents, the underlying hardware can often execute these operations in parallel, providing a "free" speedup.
SIMD Acceleration
ad_trait provides first-class support for Single Instruction, Multiple Data (SIMD) acceleration through the f64xn<N> type.
What is SIMD?
SIMD allows a single CPU instruction to operate on multiple pieces of data (usually vectors) at once. This can lead to 4x-8x speedups for arithmetic operations on modern processors.
Using f64xn
The f64xn<N> type implements the AD trait (in SIMDNum mode). It stores $N$ floats and performs element-wise operations using SIMD intrinsics (where available).
Requirements
Currently, f64xn requires the nightly version of Rust to access the portable_simd feature.
Example
```rust
#![allow(unused)]
// Requires nightly and #![feature(portable_simd)]
use ad_trait::simd::f64xn;

fn main() {
    let a = f64xn::<4>::new([1.0, 2.0, 3.0, 4.0]);
    let b = f64xn::<4>::new([5.0, 6.0, 7.0, 8.0]);
    let c = a + b; // Computed using a single CPU instruction if possible
}
```
Matrix Operations
ad_trait is designed to handle linear algebra efficiently. The AD trait includes requirements for matrix-scalar multiplication, which is the foundation for differentiating through linear systems.
Scalar-Matrix Multiplication
Types implementing AD must provide:
- `mul_by_nalgebra_matrix`: Multiplies an AD scalar by a `nalgebra` matrix.
- `mul_by_ndarray_matrix_ref`: Multiplies an AD scalar by an `ndarray` array.
Why this is necessary
By providing these specialized methods at the trait level, ad_trait can ensure that matrix operations are handled correctly for each AD mode. For example, in ForwardAD, the matrix multiplication also propagates the tangent through every element of the matrix.
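For matrices whose elements are themselves AD values, ordinary `nalgebra` operators already propagate derivatives, since AD types satisfy `nalgebra`'s scalar requirements (see the nalgebra chapter later in this book). A small generic sketch:

```rust
use ad_trait::AD;
use nalgebra::SMatrix;

// Scaling a matrix of AD values by an AD scalar using nalgebra's ordinary
// operators. In forward mode, every element's tangent is scaled along with
// its value.
fn scale<T: AD>(m: SMatrix<T, 2, 2>, s: T) -> SMatrix<T, 2, 2> {
    m * s
}
```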
Performance Considerations
When performing large matrix multiplications, it is often more efficient to use a differentiation mode that minimizes the number of passes (like ReverseAD or ForwardADMulti).
Integration
ad_trait doesn't exist in a vacuum. It is designed to play well with the rest of the Rust numerical ecosystem.
- nalgebra: The standard for linear algebra in Rust.
- ndarray: Flexible multi-dimensional arrays.
In this section, we'll see how to use ad_trait types within these libraries.
Integration with nalgebra
ad_trait types are fully compatible with nalgebra's generic matrix types.
Example: AD Matrix
You can create a nalgebra matrix using any AD type:
```rust
#![allow(unused)]
use nalgebra::SMatrix;
use ad_trait::adfn;

fn main() {
    // A 2x2 matrix of Forward-Mode AD variables
    let m = SMatrix::<adfn<1>, 2, 2>::zeros();
}
```
Differentiating through nalgebra
Because adfn and adr implement the relevant traits required by nalgebra (like RealField and ComplexField), you can use standard nalgebra functions (determinant, inverse, multiplication) in your differentiable code.
```rust
use ad_trait::AD;
use nalgebra::SMatrix;

// A generic, differentiable function that builds a matrix from its inputs,
// inverts it, and returns one entry of the inverse.
fn my_func<T: AD>(inputs: &[T]) -> Vec<T> {
    let m = SMatrix::<T, 2, 2>::from_vec(inputs.to_vec());
    let inv = m.try_inverse().unwrap();
    vec![inv[(0, 0)]]
}
```
Integration with ndarray
ad_trait also supports the ndarray crate, which is commonly used for data science and machine learning.
Example: AD Array
```rust
#![allow(unused)]
use ndarray::Array2;
use ad_trait::adfn;

fn main() {
    let a = Array2::<adfn<1>>::zeros((10, 10));
}
```
Scalar Operations
The AD trait includes mul_by_ndarray_matrix_ref, allowing you to perform scalar-array multiplication efficiently across different AD modes.
Generic Algorithms
Similar to nalgebra, you can write generic algorithms using ndarray that work with any AD type. This is particularly useful for implementing complex mathematical models that require gradients for optimization.
Examples
The ad_trait repository contains several examples that demonstrate the library in action.
Built-in Examples
Check the examples/ directory in the crate for standalone programs:
- `test.rs`: A general demonstration of forward and reverse AD.
Regression Tests
The tests/regression_tests.rs file contains many examples of differentiating complex functions, including:
- Multi-variate polynomials.
- Matrix-vector multiplication.
- Jacobian calculations for multiple outputs.
These tests serve as an excellent reference for how to structure your own differentiable functions.