Abstract:
Categorization by Reference is a novel text classification technique that examines the existing classifications of the citations found in an as-yet unclassified text to determine which terms should be assigned to that text. The existence of the Medical Subject Headings and MEDLINE makes the biomedical domain a prime candidate for applying this technique. We describe our approach and a prototype implementation, and present some results of our initial tests. We further discuss refinements that could improve the precision of the technique, and describe its possible use in categorizing portions of the World-Wide Web.
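
As a rough illustration of the core idea, the sketch below assigns to an unclassified text those terms that appear in the classifications of a sufficient share of its citations. The function name `categorize_by_reference`, the `min_share` threshold, and the simple voting scheme are assumptions made for this example; the abstract does not specify how the prototype weighs or selects terms. The sample terms are illustrative data, not results.

```python
from collections import Counter


def categorize_by_reference(cited_terms, min_share=0.5):
    """Suggest terms for an unclassified text from its citations.

    cited_terms: one collection of terms per already classified citation.
    min_share:   hypothetical cutoff; a term is kept if at least this
                 fraction of the citations carry it (an assumption, not
                 the selection rule of the described prototype).
    """
    if not cited_terms:
        return []
    # Count each term at most once per citation.
    counts = Counter(term for terms in cited_terms for term in set(terms))
    cutoff = min_share * len(cited_terms)
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, n in ranked if n >= cutoff]


# Illustrative input: term sets of three cited, already indexed articles.
cited = [
    {"Neoplasms", "Humans"},
    {"Neoplasms", "Mice"},
    {"Humans", "Neoplasms"},
]
print(categorize_by_reference(cited))  # ['Neoplasms', 'Humans']
```

Raising `min_share` in this sketch trades recall for precision, which is one simple way to picture the kind of refinement the abstract mentions.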