Hessian / Second-Order AD

ad_trait supports computing second-order derivatives (Hessians) by nesting automatic differentiation types. This is achieved through the HyperAD family of types.

[!NOTE] The bracketed value <N> (e.g., adfn<N>, HessianAD<N>) is a const generic that specifies the number of tangent lanes. For Hessian computation, it should typically match the number of input variables you are differentiating with respect to.

| Mode | Type | Best For | Scaling |
|---|---|---|---|
| Forward-over-Forward | `HyperAD_ADFN<N>` | Few inputs ($N < 20$), low memory. | $O(N^2)$ |
| Forward-over-Reverse | `HyperAD_ADR<N>` | Many inputs, few outputs (e.g., loss functions). | $O(N)$ |

Choosing the Right Mode

Selecting the optimal mode depends primarily on the number of input variables ($N$) and the memory constraints of your application.

Forward-over-Forward (FoF)

When to use: Use this when you have a small number of inputs. It is the most robust mode and has the lowest memory overhead because it does not build a computation graph.
Efficiency: The computational cost scales quadratically with the number of inputs ($O(N^2)$). With 10 inputs that is on the order of 100 second-order terms, which is very fast; with 1000 inputs it is on the order of a million, which becomes prohibitively slow.
Implementation: Uses HessianAD<N> together with the HyperAD_ADFN<N> type.
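The quadratic scaling can be illustrated with a small self-contained sketch. It does not use ad_trait: the names Dual, HyperDual, seed, and hessian_fof are hypothetical and exist only for this illustration. A dual number nested inside another dual number yields one mixed second derivative per seeded evaluation, so a dense Hessian for $N$ inputs takes $N^2$ of them (here $N = 2$, with $f(x, y) = x^2 y$).

```rust
use std::ops::{Add, Mul};

/// A dual number `re + eps * e` with e^2 = 0, generic over its component type.
/// Hypothetical type for illustration; this is not ad_trait's adfn.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual<T> {
    re: T,
    eps: T,
}

impl<T: Add<Output = T>> Add for Dual<T> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        Dual { re: self.re + rhs.re, eps: self.eps + rhs.eps }
    }
}

impl<T: Copy + Add<Output = T> + Mul<Output = T>> Mul for Dual<T> {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        // Product rule: (a + a'e)(b + b'e) = ab + (a b' + a' b)e
        Dual { re: self.re * rhs.re, eps: self.re * rhs.eps + self.eps * rhs.re }
    }
}

/// Forward-over-forward: a dual number whose components are dual numbers.
type HyperDual = Dual<Dual<f64>>;

/// Builds an input with tangent `outer` in the outer layer and `inner` in the inner layer.
fn seed(value: f64, outer: f64, inner: f64) -> HyperDual {
    Dual {
        re: Dual { re: value, eps: inner },
        eps: Dual { re: outer, eps: 0.0 },
    }
}

/// Returns 1.0 when input `k` is the seeded direction, 0.0 otherwise.
fn flag(k: usize, dir: usize) -> f64 {
    if k == dir { 1.0 } else { 0.0 }
}

/// f(x, y) = x^2 * y, written generically so it accepts nested duals.
fn f<T: Copy + Mul<Output = T>>(x: T, y: T) -> T {
    x * x * y
}

/// Dense Hessian of `f` at `point`, plus the number of evaluations used.
fn hessian_fof(point: [f64; 2]) -> ([[f64; 2]; 2], usize) {
    let mut h = [[0.0; 2]; 2];
    let mut evals = 0;
    for i in 0..2 {
        for j in 0..2 {
            let x = seed(point[0], flag(0, i), flag(0, j));
            let y = seed(point[1], flag(1, i), flag(1, j));
            // The eps-of-eps component is the mixed derivative d^2 f / dx_i dx_j.
            h[i][j] = f(x, y).eps.eps;
            evals += 1;
        }
    }
    (h, evals)
}

fn main() {
    let (h, evals) = hessian_fof([2.0, 3.0]);
    println!("H = {:?} after {} evaluations", h, evals);
}
```

Exploiting the symmetry of the Hessian would roughly halve the number of evaluations, but the growth stays quadratic in the number of inputs.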

Forward-over-Reverse (FoR)

When to use: Use this for functions with many inputs and a single output (or only a few), such as a neural network loss function or a complex physics simulation.
Efficiency: This mode is significantly more efficient for large $N$. Because the inner layer is reverse-mode AD, a single backpropagation seeded with one forward tangent direction recovers an entire row of the Hessian. A scalar-valued function therefore needs about $N$ such sweeps for the full Hessian, so the total cost scales linearly with the number of inputs ($O(N)$).
Memory: Uses more memory than FoF because it must maintain the reverse-mode computation graph.
Implementation: Uses HessianAD_FOR<N> together with the HyperAD_ADR<N> type.
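The row-per-sweep behavior can be sketched without ad_trait. The code below is a minimal illustration under stated assumptions: Dual, Node, Tape, and hessian_row are hypothetical names, and the tape supports only multiplication. Each tape scalar is a dual number, so one reverse sweep returns the gradient in the real parts and one row of the Hessian in the tangent parts.

```rust
use std::ops::{Add, Mul};

/// First-order dual number: a value plus one tangent component.
/// Hypothetical type for illustration; this is not ad_trait's API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    re: f64,
    eps: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { re: self.re + o.re, eps: self.eps + o.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, o: Dual) -> Dual {
        Dual { re: self.re * o.re, eps: self.re * o.eps + self.eps * o.re }
    }
}

const ZERO: Dual = Dual { re: 0.0, eps: 0.0 };
const ONE: Dual = Dual { re: 1.0, eps: 0.0 };

/// One tape entry: the node's value and two (parent index, local partial) pairs.
struct Node {
    value: Dual,
    parents: [(usize, Dual); 2],
}

/// A minimal reverse-mode tape whose scalars are dual numbers.
struct Tape {
    nodes: Vec<Node>,
}

impl Tape {
    fn push(&mut self, value: Dual, parents: [(usize, Dual); 2]) -> usize {
        self.nodes.push(Node { value, parents });
        self.nodes.len() - 1
    }

    /// Records an input. Inputs have no parents, so they point at themselves with zero weight.
    fn var(&mut self, value: Dual) -> usize {
        let id = self.nodes.len();
        self.push(value, [(id, ZERO), (id, ZERO)])
    }

    /// Records a * b with local partials d/da = b and d/db = a.
    fn mul(&mut self, a: usize, b: usize) -> usize {
        let (va, vb) = (self.nodes[a].value, self.nodes[b].value);
        self.push(va * vb, [(a, vb), (b, va)])
    }

    /// Reverse sweep: the adjoint of every node with respect to `output`.
    fn backward(&self, output: usize) -> Vec<Dual> {
        let mut adj = vec![ZERO; self.nodes.len()];
        adj[output] = ONE;
        for i in (0..=output).rev() {
            let a = adj[i];
            for &(p, w) in &self.nodes[i].parents {
                adj[p] = adj[p] + a * w;
            }
        }
        adj
    }
}

/// Records f(x, y) = x^2 * y and returns row `dir` of its Hessian at `point`.
fn hessian_row(point: [f64; 2], dir: usize) -> [f64; 2] {
    let mut tape = Tape { nodes: Vec::new() };
    // Forward seed: tangent 1 on the input selected by `dir`, 0 elsewhere.
    let x = tape.var(Dual { re: point[0], eps: if dir == 0 { 1.0 } else { 0.0 } });
    let y = tape.var(Dual { re: point[1], eps: if dir == 1 { 1.0 } else { 0.0 } });
    let xx = tape.mul(x, x);
    let out = tape.mul(xx, y);
    let adj = tape.backward(out);
    // adj[k].re is df/dx_k; adj[k].eps is its derivative along `dir`.
    [adj[x].eps, adj[y].eps]
}

fn main() {
    let point = [2.0, 3.0];
    // One forward-seeded reverse sweep per input yields one Hessian row.
    let hessian: Vec<[f64; 2]> = (0..2).map(|d| hessian_row(point, d)).collect();
    println!("{:?}", hessian);
}
```

The sketch needs two sweeps for two inputs, one per row. The cost is the tape itself: every intermediate value stays in memory until the reverse sweep has run.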

Forward-over-Forward Hessian

This mode uses HyperAD_ADFN, which is essentially an adfn type whose primal value and tangents are themselves adfn types.

Example

use ad_trait::AD;
use ad_trait::function_engine::FunctionEngine;
use ad_trait::differentiable_function::{DifferentiableFunctionTrait, HessianAD, ToOtherADType};
use ad_trait::hyper_ad::hyper::HyperAD_ADFN;

#[derive(Clone)]
struct MyFunc;
impl<T: AD> DifferentiableFunctionTrait<T> for MyFunc {
    const NAME: &'static str = "MyFunc";
    fn call(&self, inputs: &[T], _freeze: bool) -> Vec<T> {
        let x = inputs[0];
        vec![ x * x * x ] // f(x) = x^3
    }
    fn num_inputs(&self) -> usize { 1 }
    fn num_outputs(&self) -> usize { 1 }
}

fn main() {
    let inputs = [2.0];
    let func = MyFunc;
    let engine = FunctionEngine::new(
        func.clone(), 
        func.to_other_ad_type::<HyperAD_ADFN<1>>(), 
        HessianAD::<1>::new()
    );

    let (f_res, jacobian_res, hessian_res) = engine.hessian(&inputs);
    
    println!("f(2) = {}", f_res[0]);             // 8.0
    println!("f'(2) = {}", jacobian_res[(0,0)]);  // 12.0
    println!("f''(2) = {}", hessian_res[0][(0,0)]); // 12.0
}

Forward-over-Reverse Hessian

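This mode uses HyperAD_ADR, which uses adr as the inner type. The reverse-mode inner layer backpropagates through the function, and the forward-mode outer layer carries tangents through that backpropagation, so each backward pass can recover an entire row of the Hessian.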

Example

use ad_trait::AD;
use ad_trait::function_engine::FunctionEngine;
use ad_trait::differentiable_function::{DifferentiableFunctionTrait, HessianAD_FOR, ToOtherADType};
use ad_trait::hyper_ad::hyper_adr::HyperAD_ADR;

#[derive(Clone)]
struct MyFunc;
impl<T: AD> DifferentiableFunctionTrait<T> for MyFunc {
    const NAME: &'static str = "MyFunc";
    fn call(&self, inputs: &[T], _freeze: bool) -> Vec<T> {
        let x = inputs[0];
        vec![ x * x * x ]
    }
    fn num_inputs(&self) -> usize { 1 }
    fn num_outputs(&self) -> usize { 1 }
}

fn main() {
    let inputs = [2.0];
    let func = MyFunc;
    let engine = FunctionEngine::new(
        func.clone(), 
        func.to_other_ad_type::<HyperAD_ADR<1>>(), 
        HessianAD_FOR::<1>::new()
    );

    let (f_res, jacobian_res, hessian_res) = engine.hessian(&inputs);
    
    println!("f(2) = {}", f_res[0]);                // 8.0
    println!("f'(2) = {}", jacobian_res[(0,0)]);    // 12.0
    println!("f''(2) = {}", hessian_res[0][(0,0)]); // 12.0
}

First-Order Derivatives from Hessian Engines

A FunctionEngine initialized for Hessian computation can still be used for standard first-order derivatives.

Calling engine.derivative(&inputs) on a Hessian-enabled engine returns the Jacobian matrix as usual. This is possible because the hyper-dual types used for second-order differentiation already track the first-order derivatives internally as their first-order tangent components.

This allows you to maintain a single FunctionEngine instance for both standard gradient-based optimization and second-order methods.
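A short worked expansion shows why the first-order values come along for free. The symbols $\epsilon_1$ and $\epsilon_2$ are notation assumed here for illustration (two independent units with $\epsilon_1^2 = \epsilon_2^2 = 0$), not identifiers from ad_trait. For the $f(x) = x^3$ example used on this page:

$$
f(x + \epsilon_1 + \epsilon_2) = f(x) + f'(x)\,\epsilon_1 + f'(x)\,\epsilon_2 + f''(x)\,\epsilon_1 \epsilon_2
$$

$$
(2 + \epsilon_1 + \epsilon_2)^3 = 8 + 12\,\epsilon_1 + 12\,\epsilon_2 + 12\,\epsilon_1 \epsilon_2
$$

The coefficients of $\epsilon_1$ and $\epsilon_2$ are the first derivative $f'(2) = 12$, and the coefficient of $\epsilon_1 \epsilon_2$ is the second derivative $f''(2) = 12$. A second-order evaluation therefore already contains everything a first-order query needs.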
