{ "id": "1212.2504", "version": "v1", "published": "2012-10-19T15:06:52.000Z", "updated": "2012-10-19T15:06:52.000Z", "title": "Efficiently Inducing Features of Conditional Random Fields", "authors": [ "Andrew McCallum" ], "comment": "Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)", "categories": [ "cs.LG", "stat.ML" ], "abstract": "Conditional Random Fields (CRFs) are undirected graphical models, a special case of which corresponds to conditionally-trained finite state machines. A key advantage of these models is their great flexibility to include a wide array of overlapping, multi-granularity, non-independent features of the input. In the face of this freedom, an important question remains: what features should be used? This paper presents a feature induction method for CRFs. Founded on the principle of constructing only those feature conjunctions that significantly increase log-likelihood, the approach is based on that of Della Pietra et al. [1997], but altered to work with conditional rather than joint probabilities, and with additional modifications for providing tractability specifically for a sequence model. In comparison with traditional approaches, automated feature induction offers both improved accuracy and more than an order of magnitude reduction in feature count; it enables the use of richer, higher-order Markov models, and offers more freedom to liberally guess about which atomic features may be relevant to a task. The induction method applies to linear-chain CRFs, as well as to more arbitrary CRF structures, also known as Relational Markov Networks [Taskar & Koller, 2002]. We present experimental results on a named entity extraction task.", "revisions": [ { "version": "v1", "updated": "2012-10-19T15:06:52.000Z" } ], "analyses": { "keywords": [ "conditional random fields", "efficiently inducing features", "arbitrary crf structures", "induction method applies", "higher-order markov models" ], "tags": [ "conference paper" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable", "adsabs": "2012arXiv1212.2504M" } } }