Differences

This shows you the differences between two versions of the page.

oai_harvesting_via_marcedit [2016/01/08 13:17]
kdion created
oai_harvesting_via_marcedit [2022/05/16 19:35] (current)
jeustis
Line 1: Line 1:
-====== ​OAI Harvesting of Scholarworks Records Via MarcEdit ​======+====== ​PAGE OUTDATED ARCHIVED =======
  
-This document is a work in progress but puts in place the basics for harvesting the University'​s electronic dissertations,​ masters theses and MFA theses in Scholarworks via an OAI-PMH crosswalk using an XML script. ​+=== OAI Harvesting of Scholarworks Records Via MarcEdit ===
  
-  ​- Open MarcEdit. Make sure it is set to the SAXON.NET XSLT Engine due to the XSLT 2.0 being used in the XML file.  (Go to Tools --> Preferences --> MarcEngine) +This document puts in place the basics for harvesting the University'​s ETD dissertations,​ masters theses, MFA theses and LARP terminal projects in Scholarworks via an OAI-PMH crosswalk using an XML and XSLT script.  
-  ​- ​Click on '​Harvest OAI Records'​. ​ In the popup Metadata Harvester window, input: + 
-        Server Record: http://​scholarworks.umass.edu/​cgi/oai2.cgi +==To Harvest:​== 
-        Set Name:  publication:​masters_theses_2 ​ OR + 
-                   publication:​dissertations_2 ​ OR +Copy the OAIDCtoMARCXMLmodified.xsl file and Marc21slimUtils file from the R: drive (Theses -> OAI MarcEdit Crosswalk) to your own computer C: drive  
-                   ​englmfa_theses ​  ​OR + 
-                   larp_ms_projects +Open MarcEdit. Make sure it is set to the SAXON.NET XSLT Engine due to XSLT 2.0 being used in the XML file.  (Go to Tools --> Preferences --> MarcEngine) 
-        Metadata Type: Dublin Core + 
-        Crosswalk Path: C:\Temp\XML1\OAIDCtoMARCXMLmodified.xsl  ​*instructions below*+Click on '​Harvest OAI Records'​. ​ In the popup Metadata Harvester window, input: 
 +  ​* ​Server Record: ​<​nowiki>​http://​scholarworks.umass.edu/​do/oai</​nowiki>​ 
 +  ​* ​Set Name: publication:​masters_theses_2 ​//OR// publication:​dissertations_2 ​//OR// 
 +englmfa_theses ​//OR// larp_ms_projects 
 +  ​* ​Metadata Type: dcq 
 +  * Crosswalk Path: R:\Theses\MarcEdit_Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl (This file can also be copied to your C: drive.)
         ​         ​
-  - Click on Advanced Settings  + Note: this metadata type returns the Qualified Dublin Core records. Simple Dublin Core will not allow us to extract degree names or departments.         
-  ​- ​Add the Start and End date. Must follow the format of yyyy-mm-dd - for example, 2015-02-01 / 2015-05-31. ​ This will harvest any new files uploaded to Scholarworks in that time period (i.e., February ETDs). You may have to tinker with the dates to capture all the files desired.  +         
-  ​- ​Click on OK.  Harvesting will commence and filter through the C: drive .xsl file. The results will be displayed a MarcEditor window.  +Click on Advanced Settings
-  ​- ​Compare the list of names against the '​packing list' spreadsheet provided by the Graduate School. ​ There may be ETDs with earlier publication dates which already have in-house cataloged records in OCLC and Aleph. Delete any records which would generate duplicate bib records. ​+  
 +Add the Start and End date. Must follow the format of yyyy-mm-dd - for example, 2015-02-01 / 2015-05-31. ​ This will harvest any new files uploaded to Scholarworks in that time period (i.e., February ETDs). You may have to tinker with the dates to capture all the files desired. 
 +           
 +Click on OK.  Harvesting will commence and filter through the C: drive .xsl file. The results will be displayed in a MarcEditor window. 
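
For reference, the harvester settings above correspond to an ordinary OAI-PMH ListRecords request. The following is a minimal sketch that only builds the request URL. It assumes the Start and End dates are sent as the standard OAI-PMH `from` and `until` arguments; the function name `build_harvest_url` is made up for illustration, and nothing here describes what MarcEdit does internally.

```python
from datetime import date
from urllib.parse import urlencode

# Base URL as given in the ADDENDUM link on this page.
BASE_URL = "http://scholarworks.umass.edu/do/oai/"


def build_harvest_url(set_name, start, end):
    """Build a ListRecords URL for one set and a yyyy-mm-dd date range.

    Hypothetical helper: it only validates the dates and assembles the query.
    """
    # date.fromisoformat raises ValueError if a date is not yyyy-mm-dd.
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "dcq",  # Qualified Dublin Core
        "set": set_name,
        "from": start_date.isoformat(),
        "until": end_date.isoformat(),
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_harvest_url("publication:masters_theses_2", "2015-02-01", "2015-05-31"))
```

Pasting the resulting URL into a browser is one way to check what a given date range returns before running the harvest in MarcEdit.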
 +    
 +Compare the list of names against the 'packing list' spreadsheet provided by the Graduate School.  (This is easier to do if you alphabetically sort the MarcEditor list via Tools -> Sort by.) There may be eTDs with earlier publication dates which already have in-house cataloged records in OCLC and Aleph. Delete any records which would generate duplicate bib records.  There may also be Master of Fine Arts eTDs in the packing list. These will be in the englmfa_theses set and will need to be harvested separately. Occasionally there are names on the packing list which are not harvested at all. The cause of these will need to be investigated separately. 
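
The comparison against the packing list can also be roughed out in a script. The sketch below is only an illustration: it assumes both lists have already been copied out as plain lists of names, and the helpers `normalize` and `compare_names` are hypothetical, not part of MarcEdit or the Graduate School spreadsheet.

```python
def normalize(name):
    """Reduce a name to a comparison key.

    Lower-cases the name, drops periods and commas, and sorts the parts so
    that "Smith, Jane A." and "Jane A. Smith" produce the same key.
    """
    cleaned = name.lower().replace(".", "").replace(",", " ")
    return " ".join(sorted(cleaned.split()))


def compare_names(harvested, packing_list):
    """Return (missing, extra).

    missing: names on the packing list that were not harvested
    extra:   harvested names that are not on the packing list
    """
    harvested_keys = {normalize(name) for name in harvested}
    packing_keys = {normalize(name) for name in packing_list}
    missing = sorted(n for n in packing_list if normalize(n) not in harvested_keys)
    extra = sorted(n for n in harvested if normalize(n) not in packing_keys)
    return missing, extra


if __name__ == "__main__":
    missing, extra = compare_names(
        ["Smith, Jane A.", "Doe, John"],
        ["Jane A. Smith", "Lee, Ann"],
    )
    print("Not harvested:", missing)
    print("Not on packing list:", extra)
```

Names in the "extra" list are candidates for earlier ETDs that may already be cataloged; names in the "missing" list are the ones to investigate or harvest from another set.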
 +   
 +==Utilizing the MarcEdit Task List== ​  
 + 
 +**Important!** The Task List will not pick up the appropriate date needed for the Fixed Field. This will have to be changed by hand.  Example: if the harvested records are all from a 2016 packing list, then change the Task List (Example: 008,  151111s2015 to 151111s2016). Otherwise, change by hand in MarcEditor after running the Task List.
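
To make the 008 change concrete: in a MARC 21 008 field, positions 00-05 hold the date entered, position 06 the type of date, and positions 07-10 Date 1. The sketch below shows the 2015-to-2016 edit from the example above on a plain string. The function `set_008_date1` is hypothetical and is not how the Task List itself is edited.

```python
def set_008_date1(field_008, year):
    """Return the 008 string with Date 1 (positions 07-10) set to `year`."""
    if len(field_008) < 11:
        raise ValueError("008 data is too short to contain Date 1")
    if not 0 <= year <= 9999:
        raise ValueError("year must have at most four digits")
    # Keep positions 00-06 (date entered + type of date), swap in the new year,
    # and keep everything after position 10 unchanged.
    return field_008[:7] + f"{year:04d}" + field_008[11:]


if __name__ == "__main__":
    print(set_008_date1("151111s2015", 2016))
```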
       ​       ​
-  - In the menu bar of the MarcEditor file, click on Tools --> Assigned Tasks --> then click on one of the following as appropriate:​ +In the menu bar of the MarcEditor file, click on Tools --> Assigned Tasks --> then click on one of the following as appropriate:​ 
-      OAI_Dissertations +  ​* ​OAI_Dissertations 
-      OAI_Masters +  ​* ​OAI_Masters 
-      OAI_MFA +  ​* ​OAI_MFA 
-      OAI_LARP +  ​* ​OAI_LARP 
-     ​This will run the harvested records through the MarcEdit task list. Save the results to your hard drive as a .mrk file (ex: C:\Temp\OAI_Batch\MastersFeb2015.mrk)+ 
 +This will run the harvested records through the MarcEdit task list. Save the results to your hard drive as a .mrk file (ex: C:\Crosswalk\Converted\OAIMastersFeb2015.mrk) 
 + 
 +A breakdown of each task run in the Task List can be found in [[dissertations_marcedit_task_list|MarcEdit Task List for Dissertations and Theses]] 
 + 
 +==Converting all lower case or all CAPS to Upper and lower== 
 + 
 +There are times an author's name is in all lower case or all CAPS, and/or a title is in ALL CAPS. The easiest way to rectify this is in MarcEditor via Edit --> Edit Shortcuts --> Change Case --> Capitalize Initial character.  
 + 
 +==Checking for funky names== 
 + 
 +The Task List unfortunately cannot catch every single error produced in a personal name after it has been inverted. In particular, watch out for names with qualifiers (William, $q(Bill)) and names already inverted in Scholarworks (Morgan, Michael becomes MichaelMorgan). ​
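
A quick pre-check for these cases could look like the sketch below. It assumes the 100 field appears in the saved .mrk file in MarcEdit's mnemonic form (for example `=100  1\$aMorgan, Michael,$eauthor.`); the function `suspicious_name` and its three rules are illustrative guesses, so treat a hit as a prompt to look at the record, not as a confirmed error.

```python
import re


def suspicious_name(line):
    """Return a list of reasons why a =100 line may need a manual look."""
    reasons = []
    match = re.search(r"\$a([^$]*)", line)
    if not line.startswith("=100") or match is None:
        return reasons
    # Drop trailing punctuation that precedes the next subfield.
    name = match.group(1).rstrip(" ,.")
    if "," not in name:
        reasons.append("no comma")
    # Catches MichaelMorgan, but also legitimate names such as McDonald.
    if re.search(r"[a-z][A-Z]", name):
        reasons.append("run-together words")
    if "(" in line:
        reasons.append("parenthetical qualifier")
    return reasons


if __name__ == "__main__":
    for sample in (
        r"=100  1\$aMichaelMorgan,$eauthor.",
        r"=100  1\$aSmith, Jane,$eauthor.",
    ):
        print(sample, suspicious_name(sample))
```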
            
-  - Checking for bad characters +==Checking for Bad Characters== 
-      ​This ​will scan the .mrk file for non-ASCII characters which would otherwise prevent a record from being validated and uploading into OCLC. + 
-      NOTE: At this time (1/8/2016), the XML file crosswalk has not been modified to successfully run a Unicode UTF-8 fix template. The following instructions are for correcting each record by hand in Connexion. +The XSLT crosswalk script will automatically convert any non-conforming punctuation (single left and right quotation marks, left and right double quotation marks, en dash, em dash) and for the most part can also handle diacritics. But for the odd stray, the following instructions are for correcting each record by hand in Connexion. 
       ​       ​
-      - Open the MRK_BadCharRdr application on your desktop (available from Systems). This will open the directory in your C: drive to which you previously saved the above .mrk file.  +Open the MRK_BadCharRdr application on your desktop (available from Systems). This will open the directory in your C: drive to which you previously saved the above .mrk file.  
-      ​- ​Select and open the file. (The file folder type is Mnemonic MarcEditor File) +  
-      - The script will then run through the file and save the results in an Excel file under the same filename in the same C: directory. +Select and open the file. (The file folder type is Mnemonic MarcEditor File.) The script will then run through the file and save the results in an Excel file under the same filename in the same C: directory.
-      - Each record with a bad character is listed by number and shows the MARC field involved as well as the codes for each bad character. ​ Set this list aside+
  
-- Import harvested records into your C: drive+Each record with a bad character is listed by number and shows the MARC field involved as well as the codes for each bad character. ​ Set this list aside. ​
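
As a rough illustration of the kind of scan described above, the sketch below lists non-ASCII characters per record and field in .mrk text. It is not the MRK_BadCharRdr application, whose internals are not documented here. It assumes each record in the mnemonic file starts with an `=LDR` line, and the function name `find_non_ascii` is hypothetical.

```python
def find_non_ascii(mrk_text):
    """Return (record number, MARC tag, [code points]) for each affected field."""
    findings = []
    record_number = 0
    for line in mrk_text.splitlines():
        if line.startswith("=LDR"):
            record_number += 1  # a new record begins at each leader line
        if not line.startswith("="):
            continue  # skip blank separator lines
        tag = line[1:4]
        codes = [f"U+{ord(ch):04X}" for ch in line if ord(ch) > 127]
        if codes:
            findings.append((record_number, tag, codes))
    return findings


if __name__ == "__main__":
    sample = (
        "=LDR  00000nam\n"
        "=245  10$aCaf\u00e9 society\n"
        "\n"
        "=LDR  00000nam\n"
        "=100  1\\$aDoe, John\n"
        "=520  \\\\$aUses \u201csmart\u201d quotes\n"
    )
    for record_number, tag, codes in find_non_ascii(sample):
        print(record_number, tag, ", ".join(codes))
```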
  
-      - Click on the Marc Tools button and input: 
-      - Input file: .mrk filename as above (C:​\Temp\OAI_Batch\MastersFeb2015.mrk) 
-      - Output file: change file type to .mm (C:​\Temp\OAI_Batch\MastersFeb2015.mm) 
-      - Click on MarcMaker 
-      - Execute 
-      - The results will show the number of records imported.  ​ 
-      - Close window 
  
-Import ​file into Connexion +==Import ​Harvested Records ​into your C: Drive==
-     - Open File --> Local File Manager. ​ Create a directory for the file to import into (i.e., Theses\February2015Masters) and set this as the default.  +
-     - Open Import Records and input the .mm file to import from your C: drive. Make sure the button for Import to Local Save File is selected.  +
-     - Check that the Bibliographic destination is correct (Theses\February2015Masters.bib.db) +
-     - Click on OK +
-     - Open the file.+
  
-Import file into OCLC +Click on the Marc Tools button and input: 
-       ​(Coming soon!)+ 
 +__Input file__: .mrk filename as above (C:​\Crosswalk\Converted\MastersFeb2015.mrk) 
 +__Output file__: change file type to .mm (C:​\Crosswalk\Converted\MastersFeb2015.mm) 
 +         
 +Select MarcMaker 
 +Click on Execute 
 + 
 +The results will show the number of records imported. Close window. 
 + 
 +==Import file into Connexion== 
 +   
 +Open File --> Local File Manager.  Create a directory for the file to import into (e.g., Theses\February2015Masters) and set this as the default.  
 +   
 +Open Import Records and input the .mm file to import from your C: drive. Make sure the button for Import to Local Save File is selected.  
 +      
 +Check that the Bibliographic destination is correct (i.e., Theses\February2015Masters.bib.db) 
 +   
 +Click on OK. Open the file.  If necessary, do a spot check and make any needed corrections.  
 +   
 + 
 +==Import files into OCLC== 
 + 
 +See [[batch_uploading_oais_to_oclc_and_aleph|Batch Uploading OAIs from Scholarworks into OCLC and Aleph]]
                
  
-NOTES:+==NOTES:==
  
-    - The original MarcEdit OAIDCtoMarcXML file can be found on your hard drive under C:\Program Files\MarcEdit 6\xslt\OAIDCtoMARCXML.xsl or wherever your MarcEdit application version is.   This is the XML generic version .. don't change this; use the modified version, a copy of which can be found in the R drive under Theses\OAI MarcEdit XML harvest code (OAIDCtoMARCXMLmodified.xsl). Note that you must also have the Marc21slimUtils in the same folder in order for the .xsl file to run properly.+The original MarcEdit OAIDCtoMarcXML file can be found on your hard drive under C:\Program Files\MarcEdit 6\xslt\OAIDCtoMARCXML.xsl or wherever your MarcEdit application version is.   This is the XML generic version .. don't change this; use the modified version, a copy of which can be found in the R drive under Theses\MarcEdit_Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl. Note that you must also have the Marc21slimUtils in the same folder in order for the .xsl file to run properly.
  
-     - The XML script is based on that generously shared by Ken Robinson (kjr106@psu.edu),​ Cataloging and Metadata Services, the Pennsylvania State University. ​ This file can be found online at [[https://​scholarsphere.psu.edu/​collections/​x346dj68d]] along with a detailed description of their eTD Dublin Core-to-MARCXML Crosswalk. ​   The script includes a template to check for bad non-ASCII characters but as of this writing, it will not run in our XML script. This is being worked on.  
  
-     - Our XML script ​version ​does the following:​ +The XSLT script is based on that generously shared by Ken Robinson (kjr106@psu.edu),​ Cataloging and Metadata Services, the Pennsylvania State University. ​ This file can be found online at [[https://​scholarsphere.psu.edu/​collections/​x346dj68d]] along with a detailed description of their eTD Dublin Core-to-MARCXML Crosswalk. ​   
-         ​Modifies the 006 and 007 fields + 
-         ​Inserts 040, 042 fields + 
-         ​Changes the 245 00 indicator fields to 10. Later versions of the script will allow changes in the second indicator according ​to any articles present. This is currently taken care of by the MarcEdit Task List. + 
-         ​Changes the 700 '​creator'​ field to a 100 '​author'​ field with the appropriate |e subfield. +Our personalized __XML script ​version__ ​does the following:​ 
-         ​Inserts a 264 field (Amherst, Massachusetts :|b University of Massachusetts Amherst, |c <​appropriate date as harvested>​. +  ​* ​Modifies the 006 and 007 fields 
-         ​Inserts a 300 field (1 electronic document.+  ​* ​Inserts 040, 042 fields 
-         ​Inserts the RDA fields 336, 337, 338 and 347. +  ​* ​Changes the 245 00 indicator fields to 10.  
-         ​Later versions will insert the appropriate ​degree ​name harvested from each record (Ph.D. |c University of Massachusetts Amherst |d <​date>​). Currently this is handled by the MarcEdit Task List+  * Corrects ​the 245 field to show the appropriate indicators for a title beginning with an article 
-         ​Inserts a 538 field (Available online in PDF format via Scholarworks at UMass Amherst.) +  ​* ​Changes the 700 '​creator'​ field to a 100 '​author'​ field with the appropriate |e subfield. 
-         Inserts 653 fields for keywords and such. +  * Inserts a 264 field (Amherst, Massachusetts :|b University of Massachusetts Amherst, |c <appropriate date as harvested>).
-         Inserts a 655_7 field (Academic theses. |2 lcgft) +  * Inserts a 300 field (1 online resource)
-         Later versions will include a 690 field with the degree program harvested from each record (Theses |x Chemistry |x Masters) +  * Inserts the RDA fields 336, 337, 338 and 347.
-         Inserts a 710 field (University of Massachusetts Amherst, |e degree granting institution) +  * Inserts a 502 field (<degree abbrev.> |c University of Massachusetts Amherst |d <date>).
-         ​Inserts a 710 field (University of Massachusetts Amherst. Libraries, |e issuing body) +  ​* ​Inserts a 538 field (Available online in PDF format via Scholarworks at UMass Amherst.) 
-         ​Inserts a 856 field (Scholarworks URL with |z Link to free resource)+  ​* ​Inserts 653 fields for keywords and such. 
 +  ​* ​Inserts a 655_7 field (Academic theses. |2 lcgft) 
 +  * Inserts ​a 690 field (Theses |x Chemistry |x Masters)  *NOTE:* The crosswalk script automatically adds |x Masters but this will be changed to Doctoral as needed via MarcEdit Tools.
 +  * Inserts 700 fields for advisors  
 +  * Inserts a 710 field (University of Massachusetts Amherst, |e degree granting institution) 
 +  ​* ​Inserts a 710 field (University of Massachusetts Amherst. Libraries, |e issuing body) 
 +  ​* ​Inserts a 856 field (Scholarworks URL with |z Link to free resource)
                    
-- The MarcEdit Task List does the following: 
-         Adds an 008 field and corrects any necessary LDR fields 
-         Adds an 049 AUMM field 
-         ​Corrects the 100 field to include a period and comma after an initial in the author'​s name 
-         ​Corrects the 245 field to show the appropriate indicators for a title beginning with an article 
-         ​Inserts a colon and |b where needed 
-         Adds a 502 field (Masters Degree |c University of Massachusetts Amherst |d <​date>​ 
-         ​Coming:​ adding a 949 field for ALEPH holdings purposes 
-         NOTE: Each MarcEdit task list has its own 502 field for Masters Degree, Doctoral Degree and Terminal Project Degree as well as its own 690 field for Doctoral and Masters. ​ 
                    
 +
 +Our personalized __MarcEdit Task List__ does the following:
 +  * Adds an 008 field and corrects any necessary LDR fields.
 +  * Adds an 049 AUMM field.
 +  * Corrects the 100 field to include a period and comma after an initial ​
 +      in the author'​s name.
 +  * Corrects spacing issues in the 100 field. ​     ​
 +  * Inserts a colon and |b where needed in the 245 field.
 +  * Removes titles (Dr., Prof.) and '​Ph.D'​ from advisor names.
 +  * Inserts subfield $c in advisor names which include titles (Jr., Sr., II, etc.).
 +  * Reverses the form of advisor names to Lastname, Firstname and replaces
 +         |e contributor with |e advisor. ​
 +  * Cleans up any problems resulting from 700 field inversions.
 +  * Strips unwanted HTML tags from the 520 field. ​      
 +  * Cleans up any goofy stuff (e.g., Plant & Soil Sciences to Plant and Soil Sciences).
 +         
 +**ADDENDUM** ​        
 +
 +The following URL brings up an XML document tree from BePress which shows the Dublin Core metadata used and the name of each element (e.g., dc:creator).  You can check on other mappings by substituting dissertations_2, englmfa_theses, or larp_ms_projects for masters_theses_2 at the end of the URL. 
 +
 +[[http://​scholarworks.umass.edu/​do/​oai/?​verb=ListRecords&​metadataPrefix=dcq&​set=publication:​masters_theses_2]]
 +
    
-      ​+-- //Contact person: [[kdion@library.umass.edu|Kay Dion]] or [[mbergin@library.umass.edu|Meghan Bergin]]// ​     ​
  
       ​       ​
                                        
  
oai_harvesting_via_marcedit.1452259025.txt.gz · Last modified: 2019/01/07 17:20 (external edit)