dc.contributor.author: Hu, Xiao
dc.contributor.author: Jackson, Larry
dc.contributor.author: Deng, Sai
dc.contributor.author: Zhang, Jing
dc.date.accessioned: 2008-04-10T18:41:17Z
dc.date.available: 2008-04-10T18:41:17Z
dc.date.issued: 2006
dc.identifier.citation: Hu, X., Jackson, L., Deng, S. & Zhang, J. (2006). Automatic subject heading assignment for online government publications using a semi-supervised machine learning approach. In Proceedings of the American Society for Information Science and Technology. Volume 42, Issue 1, 2006. [en]
dc.identifier.uri: http://hdl.handle.net/10057/1251
dc.description.abstract: As the dramatic expansion of online publications continues, state libraries urgently need effective tools to organize and archive the huge number of government documents published online. Automatic text categorization techniques can be applied to classify documents approximately, given a sufficient number of labeled training examples. However, obtaining training labels is very expensive, requiring a lot of manual labor. We present a semi-supervised machine learning approach, an Expectation-Maximization (EM) algorithm text classifier, which makes use of easily obtained unlabeled documents and thus reduces the demand for labeled training examples. This paper describes the whole procedure of applying this approach to a real world online information preservation project where a collection is harvested from the websites of Illinois State Government agencies and a subject heading taxonomy is adapted from the State GILS topic tree. A formal evaluation has been performed based on the intended use of the assigned headings. The results demonstrate the semi-supervised approach improves subject heading assignment compared to the supervised approach, and is more efficient in using labeled documents. [en]
dc.format.extent: 159817 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en_US [en]
dc.publisher: American Society for Information Science and Technology [en]
dc.subject: Government publications [en]
dc.subject: Automatic categorization [en]
dc.subject: Government information [en]
dc.subject: Subject headings [en]
dc.subject: Government libraries [en]
dc.subject: State library agencies [en]
dc.title: Automatic subject heading assignment for online government publications using a semi-supervised machine learning approach [en]
dc.type: Conference paper [en]
dc.description.version: Peer reviewed


This item appears in the following Collection(s)

  • UL Faculty Research [37]
    This collection includes published research, preprints and presentations of Libraries faculty and academic staff.
  • Sai Deng [24]
    Metadata Librarian
