Search ResultsShowing 1-20 of 59
-
arXiv:2406.18855 (Published 2024-06-27)
Limiting partition function for the Mallows model: a conjecture and partial evidence
Comments: 11 pagesCategories: math.PRLet $S_n$ denote the set of permutations of $n$ labels. We consider a class of Gibbs probability models on $S_n$ that is a subfamily of the so-called Mallows model of random permutations. The Gibbs energy is given by a class of right invariant divergences on $S_n$ that includes common choices such as the Spearman foot rule and the Spearman rank correlation. Mukherjee in 2016 computed the limit of the (scaled) log partition function (i.e. normalizing factor) of such models as $n\rightarrow \infty$. Our objective is to compute the exact limit, as $n\rightarrow \infty$, without the log. We conjecture that this limit is given by the Fredholm determinant of an integral operator related to the so-called Schr\"odinger bridge probability distributions from optimal transport theory. We provide partial evidence for this conjecture, although the argument lacks a final error bound that is needed for it to become a complete proof.
-
arXiv:2406.10823 (Published 2024-06-16)
Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows
Comments: 36 pages, 1 figureWe introduce a novel discretization scheme for Wasserstein gradient flows that involves successively computing Schr\"{o}dinger bridges with the same marginals. This is different from both the forward/geodesic approximation and the backward/Jordan-Kinderlehrer-Otto (JKO) approximations. The proposed scheme has two advantages: one, it avoids the use of the score function, and, two, it is amenable to particle-based approximations using the Sinkhorn algorithm. Our proof hinges upon showing that relative entropy between the Schr\"{o}dinger bridge with the same marginals at temperature $\epsilon$ and the joint distribution of a stationary Langevin diffusion at times zero and $\epsilon$ is of the order $o(\epsilon^2)$ with an explicit dependence given by Fisher information. Owing to this inequality, we can show, using a triangular approximation argument, that the interpolated iterated application of the Schr\"{o}dinger bridge approximation converge to the Wasserstein gradient flow, for a class of gradient flows, including the heat flow. The results also provide a probabilistic and rigorous framework for the convergence of the self-attention mechanisms in transformer networks to the solutions of heat flows, first observed in the inspiring work SABP22 in machine learning research.
-
arXiv:2309.08598 (Published 2023-09-15)
Projected Langevin dynamics and a gradient flow for entropic optimal transport
The classical (overdamped) Langevin dynamics provide a natural algorithm for sampling from its invariant measure, which uniquely minimizes an energy functional over the space of probability measures, and which concentrates around the minimizer(s) of the associated potential when the noise parameter is small. We introduce analogous diffusion dynamics that sample from an entropy-regularized optimal transport, which uniquely minimizes the same energy functional but constrained to the set $\Pi(\mu,\nu)$ of couplings of two given marginal probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$, and which concentrates around the optimal transport coupling(s) for small regularization parameter. More specifically, our process satisfies two key properties: First, the law of the solution at each time stays in $\Pi(\mu,\nu)$ if it is initialized there. Second, the long-time limit is the unique solution of an entropic optimal transport problem. In addition, we show by means of a new log-Sobolev-type inequality that the convergence holds exponentially fast, for sufficiently large regularization parameter and for a class of marginals which strictly includes all strongly log-concave measures. By studying the induced Wasserstein geometry of the submanifold $\Pi(\mu,\nu)$, we argue that the SDE can be viewed as a Wasserstein gradient flow on this space of couplings, at least when $d=1$, and we identify a conjectural gradient flow for $d \ge 2$. The main technical difficulties stems from the appearance of conditional expectation terms which serve to constrain the dynamics to $\Pi(\mu,\nu)$.
-
arXiv:2308.09214 (Published 2023-08-18)
Path convergence of Markov chains on large graphs
Comments: 43 pages+references, 1 figure, 1 tableWe consider two classes of natural stochastic processes on finite unlabeled graphs. These are Euclidean stochastic optimization algorithms on the adjacency matrix of weighted graphs and a modified version of the Metropolis MCMC algorithm on stochastic block models over unweighted graphs. In both cases we show that, as the size of the graph goes to infinity, the random trajectories of the stochastic processes converge to deterministic limits. These deterministic limits are curves on the space of measure-valued graphons. Measure-valued graphons, introduced by Lov\'{a}sz and Szegedy, are a refinement of the concept of graphons that can distinguish between two infinite exchangeable arrays that give rise to the same graphon limit. We introduce new metrics on this space which provide us with a natural notion of convergence for our limit theorems. This notion is equivalent to the convergence of infinite-exchangeable arrays. Under a suitable time-scaling, the Metropolis chain admits a diffusion limit as the number of vertices go to infinity. We then demonstrate that, in an appropriately formulated zero-noise limit, the stochastic process of adjacency matrices of this diffusion converge to a deterministic gradient flow curve on the space of graphons introduced in arXiv:2111.09459 [math.PR]. Under suitable assumptions, this allows us to estimate an exponential convergence rate for the Metropolis chain in a certain limiting regime. To the best of our knowledge, both the actual rate and the connection between a natural Metropolis chain commonly used in exponential random graph models and gradient flows on graphons are new in the literature.
-
arXiv:2307.16421 (Published 2023-07-31)
Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm
Comments: 49 pages, 2 figuresWe prove that the sequence of marginals obtained from the iterations of the Sinkhorn algorithm or the iterative proportional fitting procedure (IPFP) on joint densities, converges to an absolutely continuous curve on the $2$-Wasserstein space, as the regularization parameter $\varepsilon$ goes to zero and the number of iterations is scaled as $1/\varepsilon$ (and other technical assumptions). This limit, which we call the Sinkhorn flow, is an example of a Wasserstein mirror gradient flow, a concept we introduce here inspired by the well-known Euclidean mirror gradient flows. In the case of Sinkhorn, the gradient is that of the relative entropy functional with respect to one of the marginals and the mirror is half of the squared Wasserstein distance functional from the other marginal. Interestingly, the norm of the velocity field of this flow can be interpreted as the metric derivative with respect to the linearized optimal transport (LOT) distance. An equivalent description of this flow is provided by the parabolic Monge-Amp\`{e}re PDE whose connection to the Sinkhorn algorithm was noticed by Berman (2020). We derive conditions for exponential convergence for this limiting flow. We also construct a Mckean-Vlasov diffusion whose marginal distributions follow the Sinkhorn flow.
-
arXiv:2305.17269 (Published 2023-05-26)
The Aldous diffusion: a stationary evolution of the Brownian CRT
Comments: 193+vi pages, 26 figuresCategories: math.PRMotivated by a down-up Markov chain on cladograms, David Aldous conjectured in 1999 that there exists a "diffusion on continuum trees" whose mass partitions at any finite number of branch points evolve as Wright-Fisher diffusions with some negative mutation rates, until some branch point disappears. Building on previous work on interval-partition-valued processes, we construct this conjectured process via a consistent system of stationary evolutions of binary trees with k labeled leaves and edges decorated with interval partitions. The interval partitions are scaled Poisson-Dirichlet interval partitions whose interval lengths record subtree masses. They also possess a diversity property that captures certain distances in the continuum tree. Continuously evolving diversities give access to continuously evolving continuum tree distances. The pathwise construction allows us to study this "Aldous diffusion" in the Gromov-Hausdorff-Prokhorov space of rooted, weighted R-trees. We establish the simple Markov property and path-continuity. The Aldous diffusion is stationary with the distribution of the Brownian continuum random tree. While the Brownian CRT is a.s. binary, we show that there is a dense null set of exceptional times when the Aldous diffusion has a ternary branch point, including stopping times at which the strong Markov property fails. Our construction relates to the two-parameter Chinese restaurant process, branching processes, and stable L\'evy processes, among other connections. Wright-Fisher diffusions and the aforementioned processes of Poisson-Dirichlet interval partitions arise as interesting projections of the Aldous diffusion. Finally, one can embed Aldous's stationary down-up Markov chain on cladograms in the Aldous diffusion and hence address a related conjecture by David Aldous by establishing a scaling limit theorem.
-
arXiv:2210.00422 (Published 2022-10-02)
Stochastic optimization on matrices and a graphon McKean-Vlasov limit
Comments: 35 pagesCategories: math.PRWe consider stochastic gradient descents on the space of large symmetric matrices of suitable functions that are invariant under permuting the rows and columns using the same permutation. We establish deterministic limits of these random curves as the dimensions of the matrices go to infinity while the entries remain bounded. Under a ``small noise'' assumption the limit is shown to be the gradient flow of functions on graphons whose existence was established in arXiv:2111.09459. We also consider limits of stochastic gradient descents with added properly scaled reflected Brownian noise. The limiting curve of graphons is characterized by a family of stochastic differential equations with reflections and can be thought of as an extension of the classical McKean-Vlasov limit for interacting diffusions. The proofs introduce a family of infinite-dimensional exchangeable arrays of reflected diffusions and a novel notion of propagation of chaos for large matrices of interacting diffusions.
-
arXiv:2111.09459 (Published 2021-11-18)
Gradient flows on graphons: existence, convergence, continuity equations
Comments: 40 pages, 2 figuresWasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grow to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.
-
arXiv:2101.09307 (Published 2021-01-22)
Ranked masses in two-parameter Fleming-Viot diffusions
Comments: 20 pagesCategories: math.PRIn previous work, we constructed Fleming--Viot-type measure-valued diffusions (and diffusions on a space of interval partitions of the unit interval $[0,1]$) that are stationary with the Poisson--Dirichlet laws with parameters $\alpha\in(0,1)$ and $\theta\geq 0$. In this paper, we complete the proof that these processes resolve a conjecture by Feng and Sun (2010) by showing that the processes of ranked atom sizes (or of ranked interval lengths) of these diffusions are members of a two-parameter family of diffusions introduced by Petrov (2009), extending a model by Ethier and Kurtz (1981) in the case $\alpha=0$. The latter diffusions are continuum limits of up-down Chinese restaurant processes.
-
arXiv:2011.08963 (Published 2020-11-17)
Asymptotics of Entropy-Regularized Optimal Transport via Chaos Decomposition
Consider the problem of estimating the optimal coupling (i.e., matching) between $N$ i.i.d. data points sampled from two densities $\rho_0$ and $\rho_1$ in $\mathbb{R}^d$. The cost of transport is an arbitrary continuous function that satisfies suitable growth and integrability assumptions. For both computational efficiency and smoothness, often a regularization term using entropy is added to this discrete problem. We introduce a modification of the commonly used discrete entropic regularization (Cuturi '13) such that the optimal coupling for the regularized problem can be thought of as the static Schr\"odinger bridge with $N$ particles. This paper is on the asymptotic properties of this discrete Schr\"odinger bridge as $N$ tends to infinity. We show that it converges to the continuum Schr\"odinger bridge and derive the first two error terms of orders $N^{-1/2}$ and $N^{-1}$, respectively. This gives us functional CLT, including for the cost of transport, and second order Gaussian chaos limits when the limiting Gaussian variance is zero, extending similar recent results derived for finite state spaces and the quadratic cost. The proofs are based on a novel chaos decomposition of the discrete Schr\"odinger bridge by polynomial functions of the pair of empirical distributions as a first and second order Taylor approximations in the space of measures. This is achieved by extending the Hoeffding decomposition from the classical theory of U-statistics. The kernels corresponding to the first and second order chaoses are given by Markov operators which have natural interpretations in the Sinkhorn algorithm.
-
arXiv:1912.00126 (Published 2019-11-30)
Contradictory predictions
We prove the sharp bound for the probability that two experts who have access to different information, represented by different $\sigma$-fields, will give radically different estimates of the probability of an event. This is relevant when one combines predictions from various experts in a common probability space to obtain an aggregated forecast. Our proof is constructive in the sense that, not only the sharp bounds are proved, but also the optimizer is constructed via an explicit algorithm.
-
arXiv:1910.07626 (Published 2019-10-16)
Diffusions on a space of interval partitions: Poisson-Dirichlet stationary distributions
Comments: 50 pages, 8 figures. Following arXiv:1909.02584 [math.PR], this is the second in a sequence of three papers that will collectively supersede arXiv:1609.06706 [math.PR]Categories: math.PRWe introduce diffusions on a space of interval partitions of the unit interval that are stationary with the Poisson-Dirichlet laws with parameters $(\alpha,0)$ and $(\alpha,\alpha)$. The construction has two steps. The first is a general construction of interval partition processes obtained previously, by decorating the jumps of a L\'evy process with independent excursions. Here, we focus on the second step, which requires explicit transition kernels and what we call pseudo-stationarity. This allows us to study processes obtained from the original construction via scaling and time-change. In a sequel paper, we establish connections to diffusions on decreasing sequences introduced by Ethier and Kurtz (1981) and Petrov (2009). The latter diffusions are continuum limits of up-down Markov chains on Chinese restaurant processes. Our construction is also a step towards resolving longstanding conjectures by Feng and Sun on measure-valued Poisson-Dirichlet diffusions, and by Aldous on a continuum-tree-valued diffusion.
-
arXiv:1909.02584 (Published 2019-09-05)
Diffusions on a space of interval partitions: construction from marked Lévy processes
Comments: 60 pages, 4 figures. Two supplements have been included as "ancillary files" in this arXiv submission. This is the first in a sequence of three papers that will collectively supersede arXiv:1609.06706 [math.PR]Categories: math.PRConsider a spectrally positive Stable($1+\alpha$) process whose jumps we interpret as lifetimes of individuals. We mark the jumps by continuous excursions assigning "sizes" varying during the lifetime. As for Crump-Mode-Jagers processes (with "characteristics"), we consider for each level the collection of individuals alive. We arrange their "sizes" at the crossing height from left to right to form an interval partition. We study the continuity and Markov properties of the interval-partition-valued process indexed by level. From the perspective of the Stable($1+\alpha$) process, this yields new theorems of Ray-Knight-type. From the perspective of branching processes, this yields new, self-similar models with dense sets of birth and death times of (mostly short-lived) individuals. This paper feeds into projects resolving conjectures by Feng and Sun (2010) on the existence of certain measure-valued diffusions with Poisson--Dirichlet stationary laws, and by Aldous (1999) on the existence of a continuum-tree-valued diffusion.
-
arXiv:1907.02132 (Published 2019-07-03)
Metrics on sets of interval partitions with diversity
Comments: 12 pages; this preprint generalises and supersedes the parts of arXiv:1609.06706 concerning topologies on interval partitionsCategories: math.PRWe first consider interval partitions whose complements are Lebesgue-null and introduce a complete metric that induces the same topology as the Hausdorff distance (between complements). This is done using correspondences between intervals. Further restricting to interval partitions with alpha-diversity, we then adjust the metric to incorporate diversities. We show that this second metric space is Lusin. An important feature of this topology is that path-continuity in this topology implies the continuous evolution of diversities. This is important in related work on tree-valued stochastic processes where diversities are branch lengths.
-
arXiv:1905.12206 (Published 2019-05-29)
On the difference between entropic cost and the optimal transport cost
Comments: 25 pagesCategories: math.PRConsider the Monge-Kantorovich problem of transporting densities $\rho_0$ to $\rho_1$ on $\mathbb{R}^d$ with a strictly convex cost function. A popular relaxation of the problem is the one-parameter family called the entropic cost problem. The entropic cost $K_h$, $h>0$, is significantly faster to compute and $h K_h$ is known to converge to the optimal transport cost as $h$ goes to zero. We are interested the rate of convergence. We show that the difference between $K_h$ and $1/h$ times the optimal cost of transport has a pointwise limit when transporting a compactly supported density to another that satisfies a few other technical restrictions. This limit is the relative entropy of $\rho_1$ with respect to a Riemannian volume measure on $\mathbb{R}^d$ that measures the local sensitivity of the transport map. For the quadratic Wasserstein transport, this relative entropy is exactly one half of the difference of entropies of $\rho_1$ and $\rho_0$. In that case we complement the results of Adams et al., Duong et al, and Erbar et al. who all use gamma convergence. More surprisingly, we demonstrate that this difference of two entropies (plus the cost) is also the limit for the Dirichlet transport introduced recently by Pal and Wong. The latter can be thought of as a multiplicative analog of the Wasserstein transport and corresponds to a non-local operator. It hints at an underlying gradient flow of entropy, in the sense of Jordan-Kinderlehrer-Otto, even when the cost function is not a metric. The proofs are based on Gaussian approximations to Schr\"odinger bridges as $h$ approaches zero.
-
arXiv:1904.05981 (Published 2019-04-11)
Community Detection in the Sparse Hypergraph Stochastic Block Model
Comments: 46 pages, 4 figuresWe consider the community detection problem in sparse random hypergraphs. Angelini et al. (2015) conjectured the existence of a sharp threshold on model parameters for community detection in sparse hypergraphs generated by a hypergraph stochastic block model (HSBM). We solve the positive part of the conjecture for the case of two blocks: above the threshold, there is a spectral algorithm which asymptotically almost surely constructs a partition of the hypergraph correlated with the true partition. Our method is a generalization to random hypergraphs of the method developed by Massouli\'{e} (2014) for sparse random graphs.
-
arXiv:1809.07756 (Published 2018-09-20)
Aldous diffusion I: a projective system of continuum $k$-tree evolutions
The Aldous diffusion is a conjectured Markov process on the space of real trees that is the continuum analogue of discrete Markov chains on binary trees. We construct this conjectured process via a consistent system of stationary evolutions of binary trees with $k$ labeled leaves and edges decorated with diffusions on a space of interval partitions constructed in previous work by the same authors. This pathwise construction allows us to study and compute path properties of the Aldous diffusion including evolutions of projected masses and distances between branch points. A key part of proving the consistency of the projective system is Rogers and Pitman's notion of intertwining.
-
arXiv:1808.02164 (Published 2018-08-07)
A Note on Transportation Cost Inequalities for Diffusions with Reflections
Comments: 8 pages. Keywords: Reflected Brownian motion, Wasserstein distance, relative entropy, transportation cost-information inequality, concentration of measure, competing Brownian particlesCategories: math.PRWe prove that reflected Brownian motion with normal reflections in a convex domain satisfies a dimension free Talagrand type transportation cost-information inequality. The result is generalized to other reflected diffusions with suitable drifts and diffusions. We apply this to get such an inequality for interacting Brownian particles with rank-based drift and diffusion coefficients such as the infinite Atlas model. This is an improvement over earlier dimension-dependent results.
-
arXiv:1807.05649 (Published 2018-07-16)
Multiplicative Schrödinger problem and the Dirichlet transport
Comments: 31 pagesCategories: math.PRWe consider an optimal transport problem on the unit simplex whose solutions are given by gradients of exponentially concave functions and prove two main results. One, we show that the optimal transport is the large deviation limit of a particle system of Dirichlet processes transporting one probability measure on the unit simplex to another by coordinatewise multiplication and normalizing. The structure of our Lagrangian and the appearance of the Dirichlet process relate our problem closely to the entropic measure on the Wasserstein space as defined by von-Renesse and Sturm in the context of Wasserstein diffusion. The limiting procedure is a triangular limit where we allow simultaneously the number of particles to grow to infinity while the `noise' goes to zero. The method, which generalizes easily to other cost functions, including the Wasserstein cost, provides a novel combination of the Schr\"odinger problem approach due to C. L\'eonard and the related Brownian particle systems by Adams et al. which does not require gamma convergence. Two, we analyze the behavior of entropy along the lines of transport. The base measure on the simplex is taken to be the Dirichlet measure with all zero parameters which relates to the finite-dimensional distributions of the entropic measure. The interpolating curves are not the usual McCann lines. Nevertheless we show that entropy plus the transport cost remains convex, which is reminiscent of the semiconvexity of entropy along lines of McCann interpolations in negative curvature spaces. We also obtain, under suitable conditions, dimension-free bounds of the optimal transport cost in terms of entropy.
-
arXiv:1802.00862 (Published 2018-02-02)
Projections of the Aldous chain on binary trees: Intertwining and consistency
Comments: 30 pages, 8 figuresCategories: math.PRConsider the Aldous Markov chain on the space of rooted binary trees with $n$ labeled leaves in which at each transition a uniform random leaf is deleted and reattached to a uniform random edge. Now, fix $1\le k < n$ and project the leaf mass onto the subtree spanned by the first $k$ leaves. This yields a binary tree with edge weights that we call a "decorated $k$-tree with total mass $n$." We introduce label swapping dynamics for the Aldous chain so that, when it runs in stationarity, the decorated $k$-trees evolve as Markov chains themselves, and are projectively consistent over $k\le n$. The construction of projectively consistent chains is a crucial step in the construction of the Aldous diffusion on continuum trees by the present authors, which is the $n\rightarrow \infty$ continuum analogue of the Aldous chain and will be taken up elsewhere. Some of our results have been generalized to Ford's alpha model trees.