arXiv Analytics

Sign in

arXiv:2210.09986 [cond-mat.stat-mech]AbstractReferencesReviewsResources

Phase transition in the computational complexity of the shortest common superstring and genome assembly

L. A. Fernandez, V. Martin-Mayor, D. Yllanes

Published 2022-10-18Version 1

Genome assembly, the process of reconstructing a long genetic sequence by aligning and merging short fragments, or reads, is known to be NP-hard, either as a version of the shortest common superstring problem or in a Hamiltonian-cycle formulation. That is, the computing time is believed to grow exponentially with the the problem size in the worst case. Despite this fact, high-throughput technologies and modern algorithms currently allow bioinformaticians to produce and assemble datasets of billions of reads. Using methods from statistical mechanics, we address this conundrum by demonstrating the existence of a phase transition in the computational complexity of the problem and showing that practical instances always fall in the `easy' phase (solvable by polynomial-time algorithms). In addition, we propose a Markov-chain Monte Carlo method that outperforms common deterministic algorithms in the hard regime.

Related articles: Most relevant | Search more
arXiv:cond-mat/9905359 (Published 1999-05-25)
Phase transition in inelastic disks
arXiv:cond-mat/0001136 (Published 2000-01-11)
Phase transitions in generalized chiral or Stiefel's models
arXiv:cond-mat/0201174 (Published 2002-01-11)
Phase transition in a 2-dimensional Heisenberg model