Building Collections in IRs from External Data Sources

SOAR Repository


dc.contributor.author Deng, Sai
dc.contributor.author Matveyeva, Susan J.
dc.date.accessioned 2012-10-01T18:42:49Z
dc.date.available 2012-10-01T18:42:49Z
dc.date.issued 2012-10
dc.identifier.other ORCID:0000-0003-0962-9465
dc.identifier.uri http://hdl.handle.net/10057/5329
dc.description Presented at the 15th Annual LITA National Forum, Columbus, Ohio, October 6, 2012. en_US
dc.description.abstract This presentation will address experiments in building research publication collections in a DSpace-based institutional repository (IR) from external data sources such as PubMed, IEEE Xplore and Web of Science. Obtaining data from other sources is an alternative way to develop collections for different disciplines, since author self-deposit has not become a common practice for institutional repositories. This effort is also in line with the current metadata cataloging trend of moving from item-by-item cataloging to batch processing of metadata, repurposing metadata between different systems and communities, and providing value-added data services to students and faculty in an IR. The presentation will discuss the options for batch transforming, enhancing and transferring over 720 student and faculty publications from Medline format in PubMed to Dublin Core (DC) in DSpace. In the PubMed-DSpace project, PubMed-provided XML is mapped and transformed to DCXML, exported to and enhanced in Excel, divided into separate departmental collections and batch loaded to the DSpace server. The presentation will cover project planning, workflow management and record prototype creation based on user needs. It will cover technical details including selection of metadata fields; mapping of Medline to DC; name authority checks; content enrichment such as adding DOIs and other links, descriptions, copyright information and the article peer-review status; data normalization; and data accuracy and consistency checks. It will discuss the implementation and customization of an add-on to facilitate DSpace batch data import.
At the same time, it will discuss the challenges in adding institutional research outcomes using this new approach, such as the advantages and disadvantages of the different options for transforming Medline to DC, data acquisition and content recruitment, metadata granularity and generality, selection of multiple subject types and identifiers, content enhancement, and copyright compliance. The cases of collecting data from IEEE Xplore and Web of Science, enhancing the data in spreadsheets and batch loading them into separate departmental collections in DSpace will be included. Other possibilities for adding data from external databases and the open web to the IR will also be discussed. en_US
dc.language.iso en_US en_US
dc.subject Data reuse en_US
dc.subject data curation en_US
dc.subject External sources en_US
dc.subject Bulk import en_US
dc.subject Metadata en_US
dc.subject Dublin Core metadata en_US
dc.title Building Collections in IRs from External Data Sources en_US
dc.type Presentation en_US
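
The Medline-to-Dublin-Core mapping described in the abstract can be illustrated with a minimal sketch. This is not the presenters' actual workflow or crosswalk, which the record does not specify. It assumes the input uses element names from the PubMed XML export (PubmedArticle, ArticleTitle, AuthorList, ELocationID, PMID) and that the target is a DSpace Simple Archive Format style dublin_core.xml. The sample record, the helper names add_value and medline_to_dc, and the choice of dc.relation.ispartof and dc.identifier.other as target fields are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; values are placeholders, not real data.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>An example article title.</ArticleTitle>
        <ELocationID EIdType="doi">10.0000/example</ELocationID>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def add_value(parent, element, qualifier, text):
    """Append one <dcvalue> child, skipping empty or missing values."""
    if text and text.strip():
        node = ET.SubElement(
            parent, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        node.text = text.strip()


def medline_to_dc(article):
    """Map one <PubmedArticle> element to a <dublin_core> element.

    The field choices below are an assumed crosswalk for illustration only.
    """
    citation = article.find("MedlineCitation")
    art = citation.find("Article")
    dc = ET.Element("dublin_core")

    add_value(dc, "title", "none", art.findtext("ArticleTitle"))

    # One dc.contributor.author per author, in "Last, First" form.
    for author in art.findall("AuthorList/Author"):
        last = author.findtext("LastName")
        fore = author.findtext("ForeName")
        if last:
            add_value(dc, "contributor", "author",
                      f"{last}, {fore}" if fore else last)

    add_value(dc, "relation", "ispartof", art.findtext("Journal/Title"))

    # Identifiers are prefixed so they stay distinguishable in one field.
    doi = art.findtext("ELocationID[@EIdType='doi']")
    if doi:
        add_value(dc, "identifier", "other", "doi:" + doi.strip())
    pmid = citation.findtext("PMID")
    if pmid:
        add_value(dc, "identifier", "other", "PMID:" + pmid.strip())
    return dc


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for article in root.findall("PubmedArticle"):
        print(ET.tostring(medline_to_dc(article), encoding="unicode"))
```

A real crosswalk would also need the steps the abstract lists, such as name authority checks, normalization and splitting records into departmental collections, which this sketch leaves out. Titles containing inline markup would need fuller text extraction than findtext provides.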

This item appears in the following Collection(s)

  • UL Faculty Research [36]
    This collection includes published research, preprints and presentations of Libraries faculty and academic staff.
  • Sai Deng [24]
    Metadata Librarian
  • Susan J. Matveyeva [35]
    Catalog and Institutional Repository Librarian
