Hessian / Second-Order AD

ad_trait supports computing second-order derivatives (Hessians) by nesting automatic differentiation types. This is achieved through the HyperAD family of types.

[!NOTE] The bracketed value <N> (e.g., adfn<N>, HessianAD<N>) is a const generic that specifies the number of tangent lanes. For Hessian computation, this typically should match the number of input variables you are differentiating with respect to.

ModeTypeBest ForScaling
Forward-over-ForwardHyperAD_ADFN<N>Few inputs ($N < 20$), low memory.$O(N^2)$
Forward-over-ReverseHyperAD_ADR<N>Many inputs, few outputs (e.g., loss functions).$O(N)$

Choosing the Right Mode

Selecting the optimal mode depends primarily on the number of input variables ($N$) and the memory constraints of your application.

Forward-over-Forward (FoF)

  • When to use: Use this when you have a small number of inputs. It is the most robust mode and has the lowest memory overhead because it does not require building a computation graph.
  • Efficiency: The computational cost scales quadratically with the number of inputs ($O(N^2)$). For a function with 10 inputs, it is very fast; for 1000 inputs, it becomes prohibitively slow.
  • Implementation: Uses HessianAD<N>.

Forward-over-Reverse (FoR)

  • When to use: Use this for functions with many inputs and a single (or few) outputs, such as a neural network loss function or a complex physics simulation.
  • Efficiency: This mode is significantly more efficient for large $N$. Because the inner layer is Reverse-mode AD, a single backpropagation through a tangent value can recover an entire row of the Hessian. This allows the total cost to scale linearly with the number of inputs ($O(N)$) for scalar-valued functions.
  • Memory: Higher memory usage than FoF because it must maintain the reverse-mode computation graph.
  • Implementation: Uses HessianAD_FOR<N>.

Forward-over-Forward Hessian

This mode uses HyperAD_ADFN, which is essentially an adfn type where the primary value and tangents are themselves adfn types.

Example

use ad_trait::AD;
use ad_trait::function_engine::FunctionEngine;
use ad_trait::differentiable_function::{DifferentiableFunctionTrait, HessianAD, ToOtherADType};
use ad_trait::hyper_ad::hyper::HyperAD_ADFN;

#[derive(Clone)]
struct MyFunc;
impl<T: AD> DifferentiableFunctionTrait<T> for MyFunc {
    const NAME: &'static str = "MyFunc";
    fn call(&self, inputs: &[T], _freeze: bool) -> Vec<T> {
        let x = inputs[0];
        vec![ x * x * x ] // f(x) = x^3
    }
    fn num_inputs(&self) -> usize { 1 }
    fn num_outputs(&self) -> usize { 1 }
}

fn main() {
    let inputs = [2.0];
    let func = MyFunc;
    let engine = FunctionEngine::new(
        func.clone(), 
        func.to_other_ad_type::<HyperAD_ADFN<1>>(), 
        HessianAD::<1>::new()
    );

    let (f_res, jacobian_res, hessian_res) = engine.hessian(&inputs);
    
    println!("f(2) = {}", f_res[0]);             // 8.0
    println!("f'(2) = {}", jacobian_res[(0,0)]);  // 12.0
    println!("f''(2) = {}", hessian_res[0][(0,0)]); // 12.0
}

Forward-over-Reverse Hessian

This mode uses HyperAD_ADR, which uses adr as the inner type. This is useful when you want to combine the benefits of forward and reverse mode.

Example

use ad_trait::AD;
use ad_trait::function_engine::FunctionEngine;
use ad_trait::differentiable_function::{DifferentiableFunctionTrait, HessianAD_FOR, ToOtherADType};
use ad_trait::hyper_ad::hyper_adr::HyperAD_ADR;

#[derive(Clone)]
struct MyFunc;
impl<T: AD> DifferentiableFunctionTrait<T> for MyFunc {
    const NAME: &'static str = "MyFunc";
    fn call(&self, inputs: &[T], _freeze: bool) -> Vec<T> {
        let x = inputs[0];
        vec![ x * x * x ]
    }
    fn num_inputs(&self) -> usize { 1 }
    fn num_outputs(&self) -> usize { 1 }
}

fn main() {
    let inputs = [2.0];
    let func = MyFunc;
    let engine = FunctionEngine::new(
        func.clone(), 
        func.to_other_ad_type::<HyperAD_ADR<1>>(), 
        HessianAD_FOR::<1>::new()
    );

    let (f_res, jacobian_res, hessian_res) = engine.hessian(&inputs);
    
    println!("f'(2) = {}", jacobian_res[(0,0)]);    // 12.0
    println!("f''(2) = {}", hessian_res[0][(0,0)]); // 12.0
}

First-Order Derivatives from Hessian Engines

It is important to note that a FunctionEngine initialized for Hessian computation can still be used for standard first-order derivatives.

Calling engine.derivative(&inputs) on a Hessian-enabled block will return the Jacobian matrix as usual. This is possible because the hyper-dual types used for second-order differentiation internally track the first-order gradients as their primal "tangent" values.

This allows you to maintain a single FunctionEngine instance for both standard gradient-based optimization and second-order methods.