
    Automatic subject heading assignment for online government publications using a semi-supervised machine learning approach

    View/Open
    Peer reviewed conference paper (246.8Kb)
    Poster (472.3Kb)
    Date
    2006
    Author
    Hu, Xiao
    Jackson, Larry
    Deng, Sai
    Zhang, Jing
    Citation
    Hu, X., Jackson, L., Deng, S. & Zhang, J. (2006). Automatic subject heading assignment for online government publications using a semi-supervised machine learning approach. In Proceedings of the American Society for Information Science and Technology. Volume 42, Issue 1, 2006.
    Abstract
    As the dramatic expansion of online publications continues, state libraries urgently need effective tools to organize and archive the huge number of government documents published online. Automatic text categorization techniques can be applied to classify documents approximately, given a sufficient number of labeled training examples. However, obtaining training labels is very expensive, requiring a lot of manual labor. We present a semi-supervised machine learning approach, an Expectation-Maximization (EM) algorithm text classifier, which makes use of easily obtained unlabeled documents and thus reduces the demand for labeled training examples. This paper describes the whole procedure of applying this approach to a real-world online information preservation project where a collection is harvested from the websites of Illinois State Government agencies and a subject heading taxonomy is adapted from the State GILS topic tree. A formal evaluation has been performed based on the intended use of the assigned headings. The results demonstrate that the semi-supervised approach improves subject heading assignment compared to the supervised approach, and is more efficient in using labeled documents.
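The abstract names an EM text classifier that learns from both labeled and unlabeled documents but does not spell out the underlying model. The sketch below illustrates the general idea only, and it rests on assumptions: it uses a multinomial Naive Bayes base classifier with add-one smoothing, and the function names (`train_nb`, `posterior`, `em_classifier`), the class names, and the toy documents are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: estimate smoothed class priors and word probabilities
    from documents with (possibly fractional) class weights."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for tok in doc:
                counts[c][tok] += w[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_word[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_word


def posterior(doc, model, classes):
    """E-step for one document: class probabilities under the current model.
    Tokens outside the training vocabulary are ignored."""
    log_prior, log_word = model
    scores = {
        c: log_prior[c] + sum(log_word[c][t] for t in doc if t in log_word[c])
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def em_classifier(labeled, unlabeled, classes, iterations=10):
    """Train on labeled docs, then alternate E- and M-steps over all docs."""
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in labeled]
    # Labeled documents keep hard (0/1) class weights throughout.
    lab_w = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(iterations):
        unl_w = [posterior(d, model, classes) for d in unlabeled]  # E-step
        model = train_nb(lab_docs + unlabeled, lab_w + unl_w, classes, vocab)  # M-step
    return model


# Hypothetical toy data: tokenized documents and two made-up subject headings.
classes = ["finance", "environment"]
labeled = [(["tax", "revenue"], "finance"), (["park", "wildlife"], "environment")]
unlabeled = [
    ["tax", "budget"],
    ["wildlife", "conservation"],
    ["budget", "revenue"],
    ["conservation", "park"],
]

model = em_classifier(labeled, unlabeled, classes)
print(posterior(["budget"], model, classes))
```

In this toy setup the word "budget" never appears in a labeled document, so a classifier trained on the labeled examples alone cannot tell the two headings apart for it. The unlabeled documents, in which "budget" co-occurs with labeled finance terms, are what let the EM iterations shift it toward the finance heading.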
    URI
    http://hdl.handle.net/10057/1251
    Collections
    • Sai Deng
    • UL Faculty Research
