Information-theoretic minimax and submodular optimization algorithms for multivariate Markov chains
By: Zheyuan Lai, Michael C. H. Choi
Potential Business Impact:
Finds best simple models to predict complex random systems.
We study an information-theoretic minimax problem for finite multivariate Markov chains on $d$-dimensional product state spaces. Given a family $\mathcal B=\{P_1,\ldots,P_n\}$ of $\pi$-stationary transition matrices and a class $\mathcal F = \mathcal{F}(\mathbf{S})$ of factorizable models induced by a partition $\mathbf S$ of the coordinate set $[d]$, we seek to minimize the worst-case information loss by analyzing $$\min_{Q\in\mathcal F}\max_{P\in\mathcal B} D_{\mathrm{KL}}^{\pi}(P\|Q),$$ where $D_{\mathrm{KL}}^{\pi}(P\|Q)$ is the $\pi$-weighted KL divergence from $Q$ to $P$. Using strong duality and Pythagorean identities that we derive, we recast this minimax problem as a concave maximization over the probability simplex on $n$ points. This leads us to formulate an information-theoretic game, show that a mixed-strategy Nash equilibrium always exists, and propose a projected subgradient algorithm that approximately solves the minimax problem with provable guarantees. Since the minimax value is an orthant submodular function of $\mathbf{S}$, we are further motivated to consider a max-min-max submodular optimization problem and investigate a two-layer subgradient-greedy procedure to approximately solve this generalization. Numerical experiments for Markov chains on the Curie-Weiss and Bernoulli-Laplace models illustrate the practicality of the proposed algorithms and reveal sparse optimal structures in these examples.
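To make the duality concrete: the recast problem is $\max_{\lambda\in\Delta_n} g(\lambda)$ with $g(\lambda)=\min_{Q\in\mathcal F}\sum_{i=1}^n \lambda_i D_{\mathrm{KL}}^{\pi}(P_i\|Q)$, a concave function whose supergradient at $\lambda$ has coordinates $D_{\mathrm{KL}}^{\pi}(P_i\|Q_\lambda)$, where $Q_\lambda$ attains the inner minimum. The sketch below illustrates the resulting projected subgradient ascent under a simplifying assumption: we take $\mathcal F$ to be all transition matrices, in which case $Q_\lambda$ is simply the mixture $\sum_i\lambda_i P_i$; in the paper $\mathcal F$ is a factorizable family and the inner step is an information projection instead. The function names (`kl_pi`, `project_simplex`, `dual_subgradient_ascent`) and the $1/\sqrt{t}$ step-size schedule are our own illustrative choices, not the authors' implementation.

```python
# Minimal sketch of projected subgradient ascent for the dual concave problem
#     max_{lambda in Delta_n} g(lambda),
#     g(lambda) = min_{Q in F} sum_i lambda_i * D_KL^pi(P_i || Q).
# Assumption (for illustration only): F = all transition matrices, so the inner
# minimizer is the mixture Q_lambda = sum_i lambda_i P_i. The paper's F is a
# factorizable family, where the inner step is an information projection.
import numpy as np

def kl_pi(P, Q, pi, eps=1e-12):
    """pi-weighted KL divergence: sum_x pi(x) sum_y P(x,y) log(P(x,y)/Q(x,y))."""
    ratio = np.log((P + eps) / (Q + eps))  # eps-smoothing to avoid log(0)
    return float(np.sum(pi[:, None] * P * ratio))

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def dual_subgradient_ascent(Ps, pi, n_iters=500, step=0.1):
    """Approximate the optimal mixture weights lambda in Delta_n."""
    n = len(Ps)
    lam = np.full(n, 1.0 / n)
    for t in range(1, n_iters + 1):
        # Inner minimizer under the simplifying choice of F: the mixture chain.
        Q = sum(l * P for l, P in zip(lam, Ps))
        # Supergradient of g at lambda: g_i = D_KL^pi(P_i || Q_lambda) (Danskin).
        g = np.array([kl_pi(P, Q, pi) for P in Ps])
        lam = project_simplex(lam + (step / np.sqrt(t)) * g)
    return lam

# Example usage: two 2x2 chains, both stationary w.r.t. the uniform distribution.
# pi = np.array([0.5, 0.5])
# Ps = [np.array([[0.9, 0.1], [0.1, 0.9]]), np.array([[0.5, 0.5], [0.5, 0.5]])]
# lam = dual_subgradient_ascent(Ps, pi)
```

At the optimum, the weights $\lambda$ identify the worst-case members of $\mathcal B$: coordinates of the equilibrium mixed strategy concentrate on the chains that are hardest to approximate, which is consistent with the game-theoretic reading above.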
Similar Papers
Simple and Sharp Generalization Bounds via Lifting
Statistics Theory
Makes computer learning more accurate and faster.
Pareto-frontier Entropy Search with Variational Lower Bound Maximization
Machine Learning (CS)
Finds best results when many goals matter.
Information Gradient for Nonlinear Gaussian Channel with Applications to Task-Oriented Communication
Information Theory
Improves how machines learn from noisy signals.