Pyadjoint Utils#

CRIKit has extended Pyadjoint to add some new features. Pyadjoint has its own set of documentation, so only these new features are documented here.

Core additions#

Jacobian assembly#

Pyadjoint currently supports computing the action of the Jacobian, adjoint Jacobian, and Hessian. However, some problems can be solved much more efficiently with access to the full Jacobian matrix. The Block.evaluate_tlm_matrix() method propagates the Jacobian forward by multiplying the Jacobians of its inputs with the Block's own Jacobian. The driver compute_jacobian_matrix() runs this computation on a Tape for a given set of inputs and outputs.
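As a rough sketch, the driver is intended to be used much like Pyadjoint's compute_gradient() driver; the import path and the exact call shown below (outputs first, then controls) are assumptions rather than a confirmed API:

# Hypothetical usage sketch; the real signature of compute_jacobian_matrix() may differ.
from pyadjoint_utils import Control, compute_jacobian_matrix

# y and x are overloaded values already recorded on the tape.
jac = compute_jacobian_matrix(y, Control(x))  # full Jacobian dy/dx as a list of lists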

ReducedFunction#

The ReducedFunction class generalizes Pyadjoint’s ReducedFunctional class to represent functions with multiple outputs. Instead of having a derivative() function, the ReducedFunction class has jac_action(), adj_jac_action(), and jac_matrix().
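A minimal sketch of the interface (the call conventions below are inferred from the method names and are assumptions, not a confirmed API):

# Hypothetical sketch: rf wraps the taped computation from some Controls to some outputs.
rf = ReducedFunction(outputs, controls)
Jv = rf.jac_action(v)      # Jacobian-vector action; v is shaped like the inputs
wJ = rf.adj_jac_action(w)  # vector-Jacobian (adjoint) action; w is shaped like the outputs
J = rf.jac_matrix()        # full Jacobian matrix (see Developer Notes below)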

Solving#

A set of computations on a tape can be represented with a reduced function \(J(x)\). This idea can be extended to equations. The ReducedEquation represents an equation \(J(x) = 0\) possibly subject to some Dirichlet boundary conditions on \(x\).

A ReducedEquation is instantiated with a ReducedFunction, a set of Dirichlet boundary condition functions, and a corresponding set of homogeneous Dirichlet boundary conditions. A Dirichlet boundary condition function takes the input \(x\) and sets some of the components to fixed values. The corresponding homogeneous condition sets those components to zero.

The SNESSolver is currently the only solver implemented for ReducedEquations. It uses SNES, PETSc's nonlinear solver framework.
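A minimal sketch of setting up and solving a reduced equation (the constructor and solve arguments are assumptions based on the description above, not a confirmed API):

# Hypothetical sketch: rf is a ReducedFunction representing J(x); bc fixes some
# components of x to prescribed values and h_bc sets those same components to zero.
from pyadjoint_utils import ReducedEquation, SNESSolver

eq = ReducedEquation(rf, [bc], [h_bc])
solver = SNESSolver(eq)
x = solver.solve(x0)  # solve J(x) = 0 starting from the initial guess x0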

Module additions#

FEniCS#

The fenics_adjoint module adds annotated functions that convert a FEniCS function to a NumPy array and back: function_get_local() and function_set_local(). These functions let arbitrary operations be applied easily to FEniCS functions by performing the operation on the underlying array.
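For example, an elementwise operation can be applied to a FEniCS function while keeping the operation on the tape. The helper signatures and import path below are assumptions; the FEniCS setup is ordinary FEniCS code:

import numpy as np
from fenics import UnitSquareMesh, FunctionSpace, Function
from pyadjoint_utils.fenics_adjoint import function_get_local, function_set_local

mesh = UnitSquareMesh(8, 8)
V = FunctionSpace(mesh, "CG", 1)
u = Function(V)

vals = function_get_local(u)              # annotated copy of u's DOF values as a NumPy array
u2 = function_set_local(u, np.exp(vals))  # hypothetical write-back of the transformed values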

JAX#

The pyadjoint_utils.jax_adjoint module provides the ndarray class, a Pyadjoint-overloaded object that wraps a jax.numpy.ndarray, as well as the decorator overload_jax(), which provides access to JAX's JIT compilation and differentiation facilities. Usage looks like:

from jax import numpy as np
import numpy as onp
from pyadjoint_utils import array, overload_jax, ReducedFunction, taylor_test, Control
from jax.tree_util import Partial as partial # or, from functools import partial

# Differentiate with respect to both arguments. Also, make the function
# recompute its internal linearization points when differentiated,
# instead of storing them.
@partial(overload_jax, argnums=(0, 1), checkpoint=True)
def my_func(x, y):
    return np.exp(np.trace(x @ y))


inputs = (array(onp.random.randn(3, 3)), array(onp.random.randn(3, 3)))
controls = [Control(x) for x in inputs]
output = my_func(*inputs)
rf = ReducedFunction(output, controls)
h = [1.0e-1 * array(onp.random.randn(*x.shape)) for x in inputs]
assert taylor_test(rf, inputs, h) >= 1.9

Sharp Edges#

JAX is still a relatively new library and has some sharp edges to be aware of. A common error is of the form

ValueError: linearized function called on tangent values inconsistent with the original primal values: got ConcreteArray([[-0.65073081 -0.0755682 ]
           [ 0.07366651 -1.6313327 ]]) for primal aval ConcreteArray([[ 1.2067288  -0.5176796 ]
                                                                      [ 0.4400338   0.39490446]])

This error is almost always caused by the tangent values having a different shape from the primal values or, if the shapes match (as in the example message above), a different dtype. The dtype mismatch can happen if, for example, you pass parameters obtained from an init_params function (itself obtained from jax.experimental.stax) directly to a crikit.cr.cr.CR, since those parameters will have dtype jax.numpy.float32 while the default CRIKit arithmetic dtype is jax.numpy.float64.
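One way to avoid this particular dtype mismatch is to cast the stax parameters to 64-bit floats before handing them to the CR. This assumes 64-bit mode is enabled in JAX (otherwise float64 values are silently downcast to float32):

from jax import config
config.update("jax_enable_x64", True)  # make float64 usable in JAX

import jax.numpy as jnp
from jax import tree_util

# params comes from a stax init_params function, whose leaves default to float32.
params64 = tree_util.tree_map(lambda p: jnp.asarray(p, dtype=jnp.float64), params)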

TensorFlow#

The tensorflow_adjoint module allows TensorFlow graphs to be represented as Blocks on the tape. The function run_tensorflow_graph() runs a given TensorFlow graph and records the operation on the Pyadjoint tape, so that recomputations and adjoint computations can be performed with Pyadjoint.
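A rough sketch of the intended usage follows; the argument layout and the TF1-style graph construction are assumptions, not a confirmed signature:

# Hypothetical sketch: evaluate a TensorFlow graph and record it on the Pyadjoint tape.
import tensorflow.compat.v1 as tf
from pyadjoint_utils.tensorflow_adjoint import run_tensorflow_graph

x_ph = tf.placeholder(tf.float64, shape=(3,))
y = tf.reduce_sum(tf.square(x_ph))

# x is an overloaded NumPy array (see the numpy_adjoint module below).
out = run_tensorflow_graph(y, {x_ph: x})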

NumPy#

The numpy_adjoint module allows NumPy arrays to be recorded on the tape, which is necessary for using the new features in the two modules above. Additionally, Autograd functions can be overloaded using overload_autograd() to allow Pyadjoint to access Autograd derivatives.
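A short sketch of overloading an Autograd function; the decorator may take additional arguments (for example, something analogous to overload_jax()'s argnums), so treat the exact usage below as an assumption:

import autograd.numpy as anp
from pyadjoint_utils.numpy_adjoint import overload_autograd

# Hypothetical sketch: wrap a pure Autograd function so that Pyadjoint can
# query its derivatives during adjoint computations.
@overload_autograd
def squared_norm(x):
    return anp.sum(x ** 2)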

Developer Notes#

Jacobian Matrix Structure#

For a multi-output function \(\mathbf{J}(\mathbf{x})\), the Jacobian \(\frac{d\mathbf{J}}{d\mathbf{x}}\) is represented as a list of lists, where each inner list holds the Jacobians for one output of \(\mathbf{J}\). For example, for a function with two outputs and three inputs, the lists look like the following:

\[\frac{d\mathbf{J}}{d\mathbf{x}} = \begin{bmatrix} [ & \frac{dJ_1}{dx_1} & \frac{dJ_1}{dx_2} & \frac{dJ_1}{dx_3} & ], \\ [ & \frac{dJ_2}{dx_1} & \frac{dJ_2}{dx_2} & \frac{dJ_2}{dx_3} & ], \end{bmatrix} \]

Each \(\frac{dJ_i}{dx_j}\) is a NumPy array with shape (*J_i.shape, *x_j.shape).
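As a concrete illustration of this layout, a function with outputs of shapes (2,) and (3,) and inputs of shapes (4,), (5,), and () would produce blocks with the following shapes:

import numpy as np

# Illustrative shapes only: jac[i][j] holds dJ_i/dx_j with shape (*J_i.shape, *x_j.shape).
jac = [
    [np.zeros((2, 4)), np.zeros((2, 5)), np.zeros((2,))],  # dJ_1/dx_1, dJ_1/dx_2, dJ_1/dx_3
    [np.zeros((3, 4)), np.zeros((3, 5)), np.zeros((3,))],  # dJ_2/dx_1, dJ_2/dx_2, dJ_2/dx_3
]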

Jacobian Matrix Accumulation#

The computation of the Jacobian matrix on a tape has each block multiply an incoming Jacobian with its own Jacobian. This is most easily understood with an example. Let \(\mathbf{x}\) be a set of inputs and \(\mathbf{g}(\mathbf{y})\) be a block on the tape. The \(\mathbf{g}\) block knows its own Jacobian \(\frac{d\mathbf{g}}{d\mathbf{y}}\), and it must compute the Jacobian \(\frac{d\mathbf{g}}{d\mathbf{x}}\) given the Jacobian \(\frac{d\mathbf{y}}{d\mathbf{x}}\).

\[\frac{d\mathbf{g}}{d\mathbf{x}} = \frac{d\mathbf{g}}{d\mathbf{y}} \frac{d\mathbf{y}}{d\mathbf{x}} \]

Each individual output Jacobian can be computed with the following formula:

\[\frac{dg_i}{dx_j} = \sum_k \frac{dg_i}{dy_k} \frac{dy_k}{dx_j} \]

Here, \(\frac{dg_i}{dy_k}\) is a NumPy array with shape (*g_i.shape, *y_k.shape), and \(\frac{dy_k}{dx_j}\) is a NumPy array with shape (*y_k.shape, *x_j.shape). These tensors must be contracted over the y_k dimensions. In NumPy, this contraction can be done with the following code:

np.tensordot(dgi_dyk, dyk_dxj, axes=len(yk.shape))
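The following self-contained check illustrates the contraction with arbitrary (illustrative) shapes:

import numpy as np

gi_shape, yk_shape, xj_shape = (2, 3), (4,), (5, 6)
dgi_dyk = np.random.randn(*gi_shape, *yk_shape)  # shape (2, 3, 4)
dyk_dxj = np.random.randn(*yk_shape, *xj_shape)  # shape (4, 5, 6)

dgi_dxj = np.tensordot(dgi_dyk, dyk_dxj, axes=len(yk_shape))
assert dgi_dxj.shape == (*gi_shape, *xj_shape)  # (2, 3, 5, 6)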

The Jacobian \(\frac{dg_i}{d\mathbf{x}}\) for each output variable \(g_i\) is a list recorded in the corresponding BlockVariable instance using the BlockVariable.add_tlm_matrix() function. The Block.evaluate_tlm_matrix_component() method should return the Jacobian (as a list) of a single output of the block.