Differences

This shows you the differences between two versions of the page.

oai_harvesting_via_marcedit [2016/03/31 17:32]
kdion Addendum
oai_harvesting_via_marcedit [2022/05/16 19:35] (current)
jeustis
Line 1: Line 1:
-====== OAI Harvesting of Scholarworks Records Via MarcEdit ======
+====== PAGE OUTDATED ARCHIVED ======
  
-This document is a work in progress but puts in place the basics for harvesting the University's ETD dissertations, masters theses,MFA theses and LARP terminal projects in Scholarworks via an OAI-PMH crosswalk using an XML and XSLT script.
+=== OAI Harvesting of Scholarworks Records Via MarcEdit ===
+
+This document puts in place the basics for harvesting the University's ETD dissertations, master's theses, MFA theses and LARP terminal projects in Scholarworks via an OAI-PMH crosswalk using an XML and XSLT script.
  
 ==To Harvest:==
Line 14: Line 16:
 englmfa_theses //OR// larp_ms_projects
   * Metadata Type: dcq
-  * Crosswalk Path: wherever it is on your C: drive (ex. C:Program Files\Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl
+  * Crosswalk Path: R:\Theses\MarcEdit_Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl (this file can also be copied to your C: drive)
         ​         ​
- (This is for the Qualifed Dublin Core records. Simple Dublin Core will not allow us to extract degree names nor departments.
+ Note: this metadata type is for the Qualified Dublin Core records. Simple Dublin Core will not allow us to extract degree names or departments.
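For orientation, the set and metadata type entered above correspond to the parameters of a standard OAI-PMH ''ListRecords'' request. The following is a minimal Python sketch, not the MarcEdit harvester itself; the base URL is a placeholder (an assumption), and ''build_list_records_url'' is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder only (assumption): substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/do/oai/"


def build_list_records_url(set_spec, metadata_prefix="dcq", base_url=BASE_URL):
    """Return an OAI-PMH ListRecords URL for one set and metadata prefix."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # "dcq" = Qualified Dublin Core, as above
        "set": set_spec,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Set names taken from the harvesting instructions above.
    for name in ("englmfa_theses", "larp_ms_projects"):
        print(build_list_records_url(name))
```

Opening such a URL in a browser is one way to check that a set actually returns records before harvesting it through MarcEdit.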
         ​         ​
 Click on Advanced Settings.
Line 24: Line 26:
 Click on OK. Harvesting will commence and filter through the C: drive .xsl file. The results will be displayed in a MarcEditor window.
        
-Compare the list of names against the 'packing list' spreadsheet provided by the Graduate School. (This is easier to do if you alphabetically sort the MarcEditor list via Tools -> Sort by.) There may be ETDs with earlier publication dates which already have in-house cataloged records in OCLC and Aleph. Delete any records which would generate duplicate bib records. There may also be Master of Fine Arts ETDs in the packing list. These will be in the englmfa_masters set and will need to be harvested separately. Occasionally there are names on the packing list which are not harvested at all. The cause of these will need to be investigated separately.
+Compare the list of names against the 'packing list' spreadsheet provided by the Graduate School. (This is easier to do if you alphabetically sort the MarcEditor list via Tools -> Sort by.) There may be eTDs with earlier publication dates which already have in-house cataloged records in OCLC and Aleph. Delete any records which would generate duplicate bib records. There may also be Master of Fine Arts eTDs in the packing list. These will be in the englmfa_masters set and will need to be harvested separately. Occasionally there are names on the packing list which are not harvested at all. The cause of these will need to be investigated separately.
   ​   ​
 ==Utilizing the MarcEdit Task List==
 +
+**Important!** The Task List will not pick up the appropriate date needed for the Fixed Field (008), so the date must be changed by hand. For example, if the harvested records are all from a 2016 packing list, edit the 008 entry in the Task List before running it (e.g., change 151111s2015 to 151111s2016). Otherwise, change the date by hand in MarcEditor after running the Task List.
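If the date has to be corrected after the fact, the change can also be scripted against a saved .mrk file. The sketch below is only an illustration and is not part of the Task List. It assumes MarcEdit's mnemonic layout, where a control field line starts with ''=008'' followed by two spaces, and it overwrites the Date 1 positions (07-10) of each 008 field; ''set_008_year'' is a hypothetical helper name.

```python
import re

# Assumed .mrk layout: "=008", two spaces, 6-character date entered,
# 1-character date type, then the 4-character Date 1 (008 positions 07-10).
_FIELD_008 = re.compile(r"^(=008  .{6}.)(.{4})", re.MULTILINE)


def set_008_year(mrk_text, year):
    """Return mrk_text with Date 1 of every 008 field replaced by `year`."""
    year = str(year)
    if not re.fullmatch(r"\d{4}", year):
        raise ValueError("year must be four digits")
    # A function replacement avoids any ambiguity between group references and digits.
    return _FIELD_008.sub(lambda m: m.group(1) + year, mrk_text)


if __name__ == "__main__":
    sample = "=008  151111s2015mau\n=100  1\\$aMorgan, Michael"
    print(set_008_year(sample, 2016))
```

Only lines that begin with ''=008'' are touched, so dates that appear in other fields are left alone.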
       ​       ​
 In the menu bar of the MarcEditor file, click on Tools --> Assigned Tasks --> then click on one of the following as appropriate:
Line 34: Line 38:
   * OAI_LARP
  
-This will run the harvested records through the MarcEdit task list. Save the results to your hard drive as a .mrk file (ex: C:\Crosswalk\Temp\OAIMastersFeb2015.mrk)
+This will run the harvested records through the MarcEdit task list. Save the results to your hard drive as a .mrk file (ex: C:\Crosswalk\Converted\OAIMastersFeb2015.mrk)
 + 
 +A breakdown of each task run in the Task List can be found in [[dissertations_marcedit_task_list|MarcEdit Task List for Dissertations and Theses]] 
 + 
 +==Converting all lower case or all CAPS to Upper and lower== 
 + 
+Sometimes an author's name is in all lower case or all caps, and/or a title is in all caps. The easiest way to rectify this is in MarcEditor via Edit --> Edit Shortcuts --> Change Case --> Capitalize Initial character.
 + 
 +==Checking for funky names== 
 + 
 +The Task List unfortunately cannot catch every single error produced in a personal name after it has been inverted. In particular, watch out for names with qualifiers (William, $q(Bill)) and names already inverted in Scholarworks (Morgan, Michael becomes MichaelMorgan). ​
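A quick scan of the saved .mrk file can help find candidates for review. The following Python sketch is an assumption-laden illustration, not a feature of MarcEdit or the Task List: it flags 100 and 700 lines that contain a ''$q'' subfield or a lower-case letter immediately followed by an upper-case one (as in MichaelMorgan). Legitimate names such as McDonald will also be flagged, so treat the result as a checklist rather than an error list.

```python
import re

# Lower-case letter followed directly by an upper-case letter, ignoring the
# subfield code itself (e.g. the "a" in "$aMorgan").
_FUSED = re.compile(r"(?<!\$)[a-z][A-Z]")


def flag_suspect_names(mrk_text):
    """Return (line_number, line) pairs for name fields worth a manual look."""
    hits = []
    for line_no, line in enumerate(mrk_text.splitlines(), start=1):
        if not line.startswith(("=100", "=700")):
            continue
        if "$q" in line or _FUSED.search(line):
            hits.append((line_no, line))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        r"=100  1\$aMichaelMorgan",
        r"=245  10$aSomeTitle",
        r"=700  1\$aSmith, William,$q(Bill),$eadvisor.",
    ])
    for line_no, line in flag_suspect_names(sample):
        print(line_no, line)
```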
            
 ==Checking for Bad Characters==
  
-The XSLT crosswalk script will automatically convert up any non-comforming punctuation (single left and right quotation marks, left and right double quotation marks, En dash, Em dash) but at this time (3/9/2016) it cannot covert bad diacritics. The following instructions are for correcting each record by hand in Connexion.
+The XSLT crosswalk script will automatically convert any non-conforming punctuation (single left and right quotation marks, left and right double quotation marks, en dash, em dash) and for the most part can also handle diacritics. For the odd stray, the following instructions explain how to correct each record by hand in Connexion.
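To illustrate the kind of punctuation conversion described above, here is a minimal Python sketch. It is not the XSLT crosswalk, and the replacement characters chosen (straight quotes, a hyphen for the en dash, a double hyphen for the em dash) are assumptions, since the crosswalk's actual mapping is not shown on this page.

```python
# Assumed mapping; the real crosswalk may substitute different characters.
_PUNCTUATION = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
})


def normalize_punctuation(text):
    """Replace typographic quotes and dashes with plain ASCII equivalents."""
    return text.translate(_PUNCTUATION)


if __name__ == "__main__":
    print(normalize_punctuation("\u201cSoil\u201d \u2013 it\u2019s a start"))
```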
       ​       ​
 Open the MRK_BadCharRdr application on your desktop (available from Systems). This will open the directory in your C: drive to which you previously saved the above .mrk file.
Line 45: Line 59:
  
 Each record with a bad character is listed by number and shows the MARC field involved as well as the codes for each bad character. Set this list aside.
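For readers without access to MRK_BadCharRdr, a report of the same general shape can be approximated with a short script. This Python sketch is a hypothetical stand-in, not the in-house tool: it assumes that records in the .mrk file are separated by blank lines and that a "bad character" means any non-ASCII character, which may be broader than what the real application reports.

```python
def find_non_ascii(mrk_text):
    """Return (record_number, field_tag, [code points]) for lines with non-ASCII text."""
    report = []
    record_no = 0
    in_record = False
    for line in mrk_text.splitlines():
        if not line.strip():
            # Assumed convention: a blank line ends the current record.
            in_record = False
            continue
        if not in_record:
            record_no += 1
            in_record = True
        codes = sorted({f"U+{ord(ch):04X}" for ch in line if ord(ch) > 127})
        if codes:
            # In mnemonic format the tag follows the leading "=" (e.g. "=245").
            report.append((record_no, line[1:4], codes))
    return report


if __name__ == "__main__":
    sample = "=LDR  00000nam\n=245  10$aCaf\u00e9\n\n=LDR  00000nam\n=520  \\\\$aA \u2014 B"
    for entry in find_non_ascii(sample):
        print(entry)
```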
 +
  
 ==Import Harvested Records into your C: Drive==
Line 50: Line 65:
 Click on the Marc Tools button and input:
  
-__Input file__: .mrk filename as above (C:\Crosswalk\Temp\MastersFeb2015.mrk)
-__Output file__: change file type to .mm (C:\Crosswalk\Temp\MastersFeb2015.mm)
+__Input file__: .mrk filename as above (C:\Crosswalk\Converted\MastersFeb2015.mrk)
+__Output file__: change file type to .mm (C:\Crosswalk\Converted\MastersFeb2015.mm)
         ​         ​
 Select MarcMaker
Line 76: Line 91:
 ==NOTES:==
  
-The original MarcEdit OAIDCtoMarcXML file can be found on your hard drive under C:\Program Files\MarcEdit 6\xslt\OAIDCtoMARCXML.xsl or wherever your MarcEdit application version is. This is the XML generic version .. don't change this; use the modified version, a copy of which can be found in the R drive under Theses\OAI MarcEdit XML harvest code (OAIDCtoMARCXMLmodified.xsl). Note that you must also have the Marc21slimUtils in the same folder in order for the .xsl file to run properly.
+The original MarcEdit OAIDCtoMarcXML file can be found on your hard drive under C:\Program Files\MarcEdit 6\xslt\OAIDCtoMARCXML.xsl or wherever your MarcEdit application version is installed. This is the generic version; do not change it. Use the modified version instead, a copy of which can be found in the R drive under Theses\MarcEdit_Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl. Note that you must also have the Marc21slimUtils in the same folder in order for the .xsl file to run properly.
  
  
Line 105: Line 120:
  
 Our personalized __MarcEdit Task List__ does the following:
-
-    * Adds an 008 field and corrects any necessary LDR fields.
-    * Adds an 049 AUMM field.
-    * Corrects the 100 field to include a period and comma after an initial
-        in the author's name.
-    * Inserts a colon and |b where needed
-    * Removes titles (Dr., Prof.) and 'Ph.D' from advisor names.
-    * Reverses the form of advisor names to Lastname, Firstname and replaces
-           |e contributor with |e advisor.
-    * Strips unwanted HTML tags from the 520 field.
-    * Cleans up any goofy stuff (i.e., Plant & Soil Sciences to Plant and Soil Sciences)
-    * Coming: adding a 949 field for ALEPH holdings purposes
-
+  * Adds an 008 field and corrects any necessary LDR fields.
+  * Adds an 049 AUMM field.
+  * Corrects the 100 field to include a period and comma after an initial
+      in the author's name.
+  * Corrects spacing issues in the 100 field.
+  * Inserts a colon and |b where needed in the 245 field.
+  * Removes titles (Dr., Prof.) and 'Ph.D' from advisor names.
+  * Inserts subfield $c in advisor names which includes titles (Jr., Sr., II, etc).
+  * Reverses the form of advisor names to Lastname, Firstname and replaces
+         |e contributor with |e advisor.
+  * Cleans up any problems resulting from 700 field inversions.
+  * Strips unwanted HTML tags from the 520 field.
+  * Cleans up any goofy stuff (i.e., Plant & Soil Sciences to Plant and Soil Sciences)
                    
 **ADDENDUM**
oai_harvesting_via_marcedit.1459445565.txt.gz · Last modified: 2019/01/07 17:20 (external edit)