Conditioning Diffusions Using Malliavin Calculus
By: Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo-Enrich, and more
Potential Business Impact:
Lets AI learn from rewards that can't be differentiated, like hitting an exact target.
In generative modelling and stochastic optimal control, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, such as diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and centred around a generalisation of the Tweedie score formula to nonlinear stochastic differential equations, that enables the development of methods robust to such singularities. This allows our approach to handle a broad range of applications, such as diffusion bridges or adding conditional controls to an already trained diffusion model. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques. As a byproduct, we also introduce a novel score matching objective. Our loss functions are formulated so that they can readily be extended to manifold-valued and infinite-dimensional diffusions.
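For context, the classical Tweedie formula that the paper generalises applies to a Gaussian perturbation $X_t = X_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, recovering the posterior mean of $X_0$ from the marginal score; the standard Doob h-transform then shows where such a score term enters when a diffusion is steered by a terminal reward. The notation below ($b$, $\sigma$, $r$, $h_t$, $p_t$) is ours for illustration and is not taken from the paper, which treats the nonlinear SDE case.

\[
\mathbb{E}[X_0 \mid X_t = x] = x + \sigma_t^2 \,\nabla_x \log p_t(x)
\]

\[
\mathrm{d}X_t = \bigl[\, b(X_t) + \sigma\sigma^{\top}\,\nabla_x \log h_t(X_t) \,\bigr]\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
\qquad
h_t(x) = \mathbb{E}\bigl[ r(X_T) \mid X_t = x \bigr].
\]

In the diffusion-bridge setting the reward is a Dirac mass at the target, so $h_t$ becomes a transition density and $\nabla_x \log h_t$ cannot be obtained by differentiating the reward itself, which is why gradient-based steering breaks down for the singular rewards described in the abstract.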
Similar Papers
Malliavin Calculus for Score-based Diffusion Models
Machine Learning (CS)
Makes AI create realistic images and sounds.
Evolvable Conditional Diffusion
Machine Learning (CS)
Helps computers discover new science without math.
Dynamics-aware Diffusion Models for Planning and Control
Robotics
Makes robots move safely in tricky places.