OAI Harvesting of Scholarworks Records Via MarcEdit

This document is a work in progress, but it puts in place the basics for harvesting the University's ETD dissertations, master's theses, MFA theses, and LARP terminal projects in Scholarworks via OAI-PMH, using an XSLT crosswalk script.

To Harvest:

Open MarcEdit. Make sure it is set to the SAXON.NET XSLT engine, because the crosswalk .xsl file uses XSLT 2.0. (Go to Tools –> Preferences –> MarcEngine.)

Click on 'Harvest OAI Records'. In the popup Metadata Harvester window, input:

  • Server Record: http://scholarworks.umass.edu/do/oai
  • Set Name: publication:masters_theses_2 OR publication:dissertations_2 OR englmfa_theses OR larp_ms_projects

  • Metadata Type: qdc
  • Crosswalk Path: C:\Temp\XML1\OAIDCtoMARCXMLmodified.xsl

(This is for the Qualified Dublin Core records. Simple Dublin Core will not allow us to extract degree names or departments.)

Click on Advanced Settings.

Add the Start and End dates. Both must follow the format yyyy-mm-dd, for example 2015-02-01 / 2015-05-31. This will harvest any new files uploaded to Scholarworks in that time period (i.e., February ETDs). You may have to adjust the dates to capture all the files desired.
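
For reference, the harvester fields above correspond to the standard OAI-PMH ListRecords arguments (set, metadataPrefix, from, until). The following is a minimal Python sketch of such a request URL, not a description of what MarcEdit sends internally. The helper names build_harvest_url and _parse_date are hypothetical; the server address, set name, and dates come from the steps above.

```python
import re
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://scholarworks.umass.edu/do/oai"  # server address from the steps above
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _parse_date(value: str) -> date:
    """Hypothetical helper: accept only yyyy-mm-dd and reject impossible dates."""
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"date must be yyyy-mm-dd: {value!r}")
    return date.fromisoformat(value)  # raises ValueError for e.g. 2015-02-30


def build_harvest_url(set_name: str, start: str, end: str, prefix: str = "qdc") -> str:
    """Hypothetical helper: build an OAI-PMH ListRecords URL for one set."""
    if _parse_date(start) > _parse_date(end):
        raise ValueError("start date must not be after end date")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": prefix,
        "set": set_name,
        "from": start,
        "until": end,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_harvest_url("publication:masters_theses_2", "2015-02-01", "2015-05-31"))
```

The OAI-PMH set argument takes a single set per request, so a sketch like this would be called once per set name.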

Click on OK. Harvesting will commence, and the harvested records will be run through the .xsl crosswalk file on the C: drive. The results will be displayed in a MarcEditor window.

Compare the list of names against the 'packing list' spreadsheet provided by the Graduate School. There may be ETDs with earlier publication dates which already have in-house cataloged records in OCLC and Aleph. Delete any records which would generate duplicate bib records.

Utilizing the MarcEdit Task List

In the menu bar of the MarcEditor file, click on Tools –> Assigned Tasks –> then click on one of the following as appropriate:

  • OAI_Dissertations
  • OAI_Masters
  • OAI_MFA
  • OAI_LARP

This will run the harvested records through the MarcEdit task list. Save the results to your hard drive as a .mrk file (ex: C:\Temp\OAI_Batch\MastersFeb2015.mrk).

Checking for Bad Characters

The XSLT crosswalk script will automatically convert any nonconforming punctuation (left and right single quotation marks, left and right double quotation marks, en dash, em dash), but at this time (3/9/2016) it cannot convert bad diacritics. The following instructions are for correcting each record by hand in Connexion.
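
To illustrate the kind of punctuation conversion described above, here is a minimal Python sketch. It is not the crosswalk's own code (which is XSLT). The replacement targets are assumptions, since the actual stylesheet may map these characters differently, and the function name normalize_punctuation is hypothetical.

```python
# Assumed replacements; the actual crosswalk may map these characters differently.
PUNCTUATION_MAP = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}

_TABLE = str.maketrans(PUNCTUATION_MAP)


def normalize_punctuation(text: str) -> str:
    """Hypothetical helper: replace typographic punctuation, leave everything else alone."""
    return text.translate(_TABLE)


print(normalize_punctuation("\u201cIt\u2019s 1990\u20132000\u201d"))
```

Diacritics are deliberately left untouched here, which mirrors the limitation noted above: they still have to be found and corrected separately.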

Open the MRK_BadCharRdr application on your desktop (available from Systems). This will open the directory in your C: drive to which you previously saved the above .mrk file.

Select and open the file. (The file type is Mnemonic MarcEditor File.) The script will then run through the file and save the results in an Excel file under the same filename in the same C: directory.

Each record with a bad character is listed by number and shows the MARC field involved as well as the codes for each bad character. Set this list aside.
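
As a rough picture of what such a report contains, the Python sketch below scans mnemonic (.mrk) text and lists the record number, field tag, and character codes for each affected field. It is not the MRK_BadCharRdr application. It assumes that a "bad character" is any non-ASCII character, that each field line starts with "=" followed by a three-character tag, and that records are separated by blank lines. The function name find_bad_characters and the sample records are hypothetical.

```python
from typing import Iterator, List, Tuple


def find_bad_characters(mrk_text: str) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (record number, field tag, code points) for fields with non-ASCII characters."""
    record_number = 0
    in_record = False
    for line in mrk_text.splitlines():
        if not line.strip():
            in_record = False  # a blank line ends the current record
            continue
        if not line.startswith("="):
            continue  # ignore anything that is not a field line
        if not in_record:
            in_record = True
            record_number += 1
        tag = line[1:4]
        # Assumption: any character outside ASCII counts as "bad"
        codes = [f"U+{ord(ch):04X}" for ch in line if ord(ch) > 127]
        if codes:
            yield record_number, tag, codes


# Hypothetical sample data, not real harvested records
sample = (
    "=LDR  00000nam\n"
    "=245  10$aCaf\u00e9 culture\n"
    "\n"
    "=LDR  00000nam\n"
    "=245  10$aPlain title\n"
    "\n"
    "=LDR  00000nam\n"
    "=700  1\\$aM\u00fcller, J\u00f6rg\n"
)

for record, tag, codes in find_bad_characters(sample):
    print(record, tag, " ".join(codes))
```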

Import Harvested Records into your C: Drive

Click on the Marc Tools button and input:

  • Input file: .mrk filename as above (C:\Temp\OAI_Batch\MastersFeb2015.mrk)
  • Output file: change file type to .mm (C:\Temp\OAI_Batch\MastersFeb2015.mm)

Select MarcMaker, then click on Execute.

The results will show the number of records imported. Close window.

Import file into Connexion

Open File –> Local File Manager. Create a directory for the file to import into (i.e., Theses\February2015Masters) and set this as the default.

Open Import Records and input the .mm file to import from your C: drive. Make sure the button for Import to Local Save File is selected.

Check that the Bibliographic destination is correct (i.e., Theses\February2015Masters.bib.db)

Click on OK. Open the file. If necessary, do a spot check and make any needed corrections.

Import files into OCLC

(Coming soon!)

NOTES:

The original MarcEdit OAIDCtoMarcXML file can be found on your hard drive under C:\Program Files\MarcEdit 6\xslt\OAIDCtoMARCXML.xsl or wherever your version of MarcEdit is installed. This is the generic version; do not change it. Use the modified version instead, a copy of which can be found in the R drive under Theses\OAI MarcEdit XML harvest code (OAIDCtoMARCXMLmodified.xsl). Note that you must also have the Marc21slimUtils in the same folder in order for the .xsl file to run properly.

The XML script is based on that generously shared by Ken Robinson (kjr106@psu.edu), Cataloging and Metadata Services, the Pennsylvania State University. This file can be found online at https://scholarsphere.psu.edu/collections/x346dj68d along with a detailed description of their eTD Dublin Core-to-MARCXML Crosswalk.

Our personalized XML script version does the following:

  • Modifies the 006 and 007 fields
  • Inserts 040, 042 fields
  • Changes the 245 indicators from 00 to 10.
  • Corrects the 245 field to show the appropriate indicators for a title beginning with an article
  • Changes the 700 'creator' field to a 100 'author' field with the appropriate |e subfield.
  • Inserts a 264 field (Amherst, Massachusetts :|b University of Massachusetts Amherst, |c <appropriate date as harvested>.)
  • Inserts a 300 field (1 online resource)
  • Inserts the RDA fields 336, 337, 338 and 347.
  • Inserts a 502 field (<degree abbrev.> |c University of Massachusetts Amherst |d <date>).
  • Inserts a 538 field (Available online in PDF format via Scholarworks at UMass Amherst.)
  • Inserts 653 fields for keywords and such.
  • Inserts a 655_7 field (Academic theses. |2 lcgft)
  • Inserts a 690 field (Theses |x Chemistry |x Masters). *NOTE:* The crosswalk script automatically adds |x Masters, but this will be changed to Doctoral as needed via MarcEdit Tools.
  • Inserts 700 fields for advisors
  • Inserts a 710 field (University of Massachusetts Amherst, |e degree granting institution)
  • Inserts a 710 field (University of Massachusetts Amherst. Libraries, |e issuing body)
  • Inserts an 856 field (Scholarworks URL with |z Link to free resource)
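
The 245 indicator correction in the list above can be illustrated with a short Python sketch. In MARC 21, the second indicator of the 245 field gives the number of nonfiling characters, that is, the initial article plus the space after it. This is not the crosswalk's own XSLT. The article list is an assumption limited to English, and the function name nonfiling_indicator is hypothetical.

```python
# Assumed list of English initial articles; the actual crosswalk may cover more.
ARTICLES = ("a", "an", "the")


def nonfiling_indicator(title: str) -> str:
    """Hypothetical helper: return the 245 second indicator for a title."""
    first, sep, rest = title.partition(" ")
    if sep and rest and first.lower() in ARTICLES:
        return str(len(first) + 1)  # article length plus the following space
    return "0"


for title in ("The effects of drought", "An analysis of soil", "A study of bees", "Theory of mind"):
    print("245 1" + nonfiling_indicator(title), title)
```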

The MarcEdit Task List does the following:

  • Adds an 008 field and corrects any necessary LDR fields
  • Adds an 049 AUMM field
  • Corrects the 100 field to include a period and comma after an initial in the author's name
  • Inserts a colon and |b where needed
  • Removes titles (Dr., Prof.) and 'Ph.D' from advisor names
  • Reverses the form of advisor names to Lastname, Firstname and replaces |e contributor with |e advisor.
  • Coming: adding a 949 field for ALEPH holdings purposes
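
The advisor-name cleanup in the list above (removing titles and reversing the name) might look like the following Python sketch. This is not the MarcEdit task list itself. The patterns cover only the titles mentioned above, the function name clean_advisor_name is hypothetical, and the sketch assumes the last word of the name is the surname, which will be wrong for multi-word surnames.

```python
import re

# Assumed patterns: only the titles named above (Dr., Prof.) and a trailing Ph.D
_TITLE_RE = re.compile(r"^(?:Dr|Prof)\.?\s+", re.IGNORECASE)
_SUFFIX_RE = re.compile(r"(?:,\s*|\s+)Ph\.?\s?D\.?$", re.IGNORECASE)


def clean_advisor_name(name: str) -> str:
    """Hypothetical helper: strip titles, then return Lastname, Firstname."""
    name = _TITLE_RE.sub("", name.strip())
    name = _SUFFIX_RE.sub("", name).strip()
    if "," in name:
        return name  # assume it is already in Lastname, Firstname form
    parts = name.split()
    if len(parts) < 2:
        return name  # a single word cannot be reversed
    # Assumption: the last word is the surname
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


for raw in ("Dr. Jane Q. Smith, Ph.D", "Prof. John Doe", "Doe, John"):
    print(clean_advisor_name(raw))
```

Names that do not fit these assumptions would still need a manual check during the spot check in Connexion.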

Contact person: Kay Dion or Meghan Bergin

oai_harvesting_via_marcedit.1457538444.txt.gz · Last modified: 2019/01/07 17:20 (external edit)