Stratified Hazard Sampling: Minimal-Variance Event Scheduling for CTMC/DTMC Discrete Diffusion and Flow Models
By: Seunghwan Jang, SooJean Han
Potential Business Impact:
Makes AI write better by fixing mistakes faster.
CTMC/DTMC-based discrete generative models, including uniform-noise discrete diffusion (e.g., D3PM/CTDD) and discrete flow matching, enable non-autoregressive sequence generation by repeatedly replacing tokens through a time-inhomogeneous Markov process. Inference is typically implemented with step-based simulation: at every discretization step, each token independently decides whether to jump via a Bernoulli (or categorical) draw. Under uniform-noise initialization, where self-correction requires multiple edits per position, these independent decisions induce substantial variance in both the number and timing of edits, leading to characteristic failure modes such as under-editing (residual noise) and over-editing (cascading unnecessary substitutions), and reducing reproducibility. We propose Stratified Hazard Sampling (SHS), a drop-in, hyperparameter-free inference principle for any sampler that admits a stay-vs.-replace decomposition. SHS models per-token edits as events driven by cumulative hazard (CTMC) or cumulative jump mass (DTMC) and places events by stratifying this cumulative quantity: with a single random phase per position, a token jumps whenever its accumulated hazard crosses unit-spaced thresholds. This preserves the expected number of jumps while achieving the minimum possible variance among unbiased integer estimators (bounded by 1/4), without altering per-jump destination sampling and thus retaining multimodality. We also introduce a phase-allocation variant for blacklist-style lexical constraints that prioritizes early edits at high-risk positions to mitigate late-masking artifacts.
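
To make the scheduling rule concrete, here is a minimal, non-authoritative sketch of the stratified-threshold idea described above: one uniform random phase per position, jump mass accumulated across steps, and a jump triggered each time the running total crosses the next unit-spaced threshold. The helper names (jump_prob, sample_destination) and the NumPy framing are illustrative assumptions, not the paper's implementation.

# Minimal sketch of stratified hazard scheduling for a step-based (DTMC-style)
# sampler, assuming per-step replace probabilities <= 1. The helpers
# jump_prob and sample_destination are hypothetical placeholders.
import numpy as np

def shs_sample(x0, jump_prob, sample_destination, num_steps, rng=None):
    """Schedule per-token jumps by stratifying cumulative jump mass.

    x0:                            initial token ids, shape (L,)
    jump_prob(x, k):               per-position replace probability at step k, shape (L,)
    sample_destination(x, k, idx): replacement tokens for positions idx at step k
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, copy=True)
    L = x.shape[0]

    phase = rng.random(L)        # single random phase per position
    cum_mass = np.zeros(L)       # accumulated jump mass per position
    threshold = phase.copy()     # jump when cum_mass crosses phase, phase+1, ...

    for k in range(num_steps):
        # Accumulate this step's jump mass; a CTMC sampler would instead add
        # the instantaneous rate times the step size.
        cum_mass += jump_prob(x, k)

        # Positions whose accumulated mass crossed their next unit-spaced
        # threshold jump now; destination sampling itself is unchanged.
        jump_idx = np.flatnonzero(cum_mass >= threshold)
        if jump_idx.size:
            x[jump_idx] = sample_destination(x, k, jump_idx)
            threshold[jump_idx] += 1.0  # arm the next threshold

    return x

Because each step's replace probability is at most 1, a position can cross at most one unit-spaced threshold per step, so the single comparison above suffices; the expected number of crossings still equals the total accumulated jump mass, which is how the scheme keeps the mean edit count unchanged while removing the Bernoulli timing noise.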
Similar Papers
Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs
CV and Pattern Recognition
Makes AI create pictures much faster.
Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion
Machine Learning (Stat)
Makes AI create pictures faster and better.
Optimal Inference Schedules for Masked Diffusion Models
Machine Learning (CS)
Makes AI write faster by guessing words out of order.