{ "id": "1210.0508", "version": "v5", "published": "2012-10-01T19:13:59.000Z", "updated": "2017-01-20T08:00:44.000Z", "title": "Inference algorithms for pattern-based CRFs on sequence data", "authors": [ "Rustem Takhanov", "Vladimir Kolmogorov" ], "comment": "Algorithmica accepted version", "journal": "Algorithmica, September 2016, Volume 76, Issue 1, pp 17-46", "doi": "10.1007/s00453-015-0017-7", "categories": [ "cs.LG", "cs.DS" ], "abstract": "We consider Conditional Random Fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) $x_1...x_n$ is the sum of terms over intervals $[i,j]$ where each term is non-zero only if the substring $x_i...x_j$ equals a prespecified pattern $\\alpha$. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively $O(n L)$, $O(n L \\ell_{max})$ and $O(n L \\min\\{|D|,\\log (\\ell_{max}+1)\\})$ where $L$ is the combined length of input patterns, $\\ell_{max}$ is the maximum length of a pattern, and $D$ is the input alphabet. This improves on the previous algorithms of (Ye et al., 2009) whose complexities are respectively $O(n L |D|)$, $O(n |\\Gamma| L^2 \\ell_{max}^2)$ and $O(n L |D|)$, where $|\\Gamma|$ is the number of input patterns. In addition, we give an efficient algorithm for sampling. Finally, we consider the case of non-positive weights. (Komodakis & Paragios, 2009) gave an $O(n L)$ algorithm for computing the MAP. We present a modification that has the same worst-case complexity but can beat it in the best case.", "revisions": [ { "version": "v4", "updated": "2012-12-29T22:13:01.000Z", "comment": null, "journal": null, "doi": null }, { "version": "v5", "updated": "2017-01-20T08:00:44.000Z" } ], "analyses": { "keywords": [ "sequence data", "inference algorithms", "pattern-based crfs", "efficient algorithm", "input patterns" ], "tags": [ "journal article" ], "publication": { "publisher": "Springer" }, "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable", "adsabs": "2012arXiv1210.0508T" } } }