This shows you the differences between two versions of the page.

batch_uploading_oais_to_oclc_and_aleph [2019/01/07 12:20] external edit
batch_uploading_oais_to_oclc_and_aleph [2020/05/01 18:15] (current)
Line 1: Line 1:
-====== Batch Uploading OAIs from Scholarworks into OCLC and Aleph ======+===== Batch Uploading OAIs from Scholarworks into OCLC and Aleph ===== 
 +==== CHANGE TITLE? ETDs (Current) Processing ScholarWorks OAIs ==== 
-Once our dissertations and theses are OAI harvested from Digital Commons (BePress) via MarcEdit then run through MarcEdit'​s Task list before being uploaded to Connexion, the bib records are ready to be batch uploaded to OCLC then batch exported to Aleph+NOTE:  My helpful "​hints"​ will appear in //Italics.//
-**To upload from Connexion ​to OCLC:**+===Introduction=== ​  
 +The Graduate School will email "​Packing Lists" dated February, May and September (end of semesters) of new dissertations,​ theses, MFA theses and occasionally LARP theses. ​ There may be a lag between these dates and when the ETDs are available on ScholarWorks. ​ //I try to process them after a couple of months have passed, to assure that they will be picked up in the Crosswalk harvest.//
-After importing the bib records file from MarcEdit ​(see [[http://www.library.umass.edu/​wikis/​acp/​doku.php?​id=oai_harvesting_via_marcedit|OAI Harvesting Via MarcEdit]] ):+===Preparation=== 
 +  * Have a handy copy (either online or a printout) of the Packing List-in-process. **NOTE:** It's a good idea to save copies of these in appropriate folders. Example: **PackingListReport_Feb2019diss.xlsx** in [Drive]:\OAI\Dissertations\2019\ (i.e., 2019), or OAI\Theses, ThesesMFA or ThesesLARP. 
 +  * Open MarcEdit.  (**NOTE:** Make sure your MarcEdit XSLT engine is set to SAXON.NET. On the MarcEdit home page, click Tools (found on top), Preferences, MARCEngine, then select SAXON.NET under XSLT Engine.)
-(For example purposeswe will use the Connexion file for February 2016 Dissertations which can be opened via CatalogingSearchLocalSaveFile ​-> <nowiki>T:\\oclcapps\Connexion\Theses\2016_Feb_Dissertations.bib.db)</nowiki>)+===Harvesting from ScholarWorks=== 
 +  - **Click on Harvest OAI Records:​** ​(Found on either the MarcEdit home page or under (top) tools/OAI Harvester Tools/) Set the following:​ 
 +    * Server address: https://​scholarworks.umass.edu/​do/​oai/​ 
 +    * Set name (for dissertations):  publication:dissertations_harvesting  (**IMPORTANT NOTE:  Because of software changes made in 2018, Erin Jerome needs to be informed before running a Crosswalk on __dissertations only__! Before they can be pulled, they need to be transferred from "publication:dissertations_2" to a special harvesting subset.**) 
 +    * Set name (for theses): publication:​masters_theses_2 
 +    * Set name (for MFAs): publication:​englmfa_theses ​ (**NOTE:** This series is only for English MFAs; MFAs for art etc. are included in masters_theses_2.) 
 +    * Set name (for LARPs): publication:​ 
 +    * Metadata type: dcq  (**NOTE:** This is not included in the MarcEdit drop-down, but needs to be typed in. It's a "​modified"​ version of Dublin Core.) 
 +    * Crosswalk path:  C:​\Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl (**NOTE:** This program needs to be loaded onto your personal C: drive.) 
 +    * Start date (for May, in this format): 2019-06-01 
 +    * End date (for May, in this format): 2019-08-31 (**NOTE:** Using August avoids Sept. lists. Occasionally these dates have to be tweaked to include everything on the appropriate Packing List.) 
 +    * Hit "OK" and let it run.  A green bar will appear if it is working.  (**NOTE:** This function is a little cranky. //Recently it didn't work for me because I entered 2019-11-31 instead of 2019-11-30.//  Everything has to be entered **precisely**! If no amount of tweaking resolves the issue, contact Erin Jerome, Aaron Rubinstein, or bepress (Digital Commons), which occasionally blocks ScholarWorks harvesting for security purposes.) 
 +    * Once the harvesting is finished, a MarcEdit list will open up, containing the harvested records in raw form.  //Hint: Save immediately into the appropriate OAI folder, as (example) **umdissertations_sept.mrk**//​ (**NOTE:** When working in MarcEdit, click File/Save after **every change**!! ​ Do NOT Save if no changes are made.) 
 +  - **Check harvested records against Grad School'​s packing list** In MarcEdit, click Edit/​Find/​enter =100 in "Find what" window/​click Find All. This will produce a list that can be saved to the clipboard, and output to Excel. //Hint: To make Excel data more manageable, insert a blank column in front of the Jump to Record column, and work a **Data/Text to Columns** on the name column, splitting off and deleting the equal sign.// 
 +    * **IMPORTANT NEW STEP, added 2020:** Go to ScholarWorks/Dissertations and Theses and log onto "My account", then scroll down to the appropriate series (e.g., DOCTORAL DISSERTATIONS (dissertations_2)) and choose Manage Dissertations/Batch revise Excel/Generate a spreadsheet of current data. See [[changing_one_year_campus_access_titles_to_open_access|Changing one year campus titles to open access in ScholarWorks]] for instructions on generating ScholarWorks spreadsheets. If extra names appear in the MarcEdit file, check the generated spreadsheet to make sure they are NOT dated in the range requested. //(This step has been added since occasionally a dissertation or thesis will have been left off the Packing List.)// 
 +    * Any harvested record NOT on the Packing List that is also not on the generated spreadsheet, or has a different date (check **degree_year** and **award_month**), or which belongs to a different series (such as English MFAs), can be removed from the MarcEdit file. 
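The harvest settings above correspond to the standard OAI-PMH request parameters (verb, metadataPrefix, set, from, until). The following is a minimal Python sketch, not a description of what MarcEdit actually does internally: it only checks the two dates and assembles a request URL, so that an impossible date such as 2019-11-31 is caught before the harvester is run. The function name build_harvest_url is hypothetical; the server address, set name and "dcq" prefix are taken from the settings above.

```python
from datetime import date
from urllib.parse import urlencode

# Server address as given in the harvest settings above.
BASE_URL = "https://scholarworks.umass.edu/do/oai/"


def build_harvest_url(set_name: str, start: str, end: str,
                      metadata_prefix: str = "dcq") -> str:
    """Return an OAI-PMH ListRecords URL; raise ValueError on a bad date.

    This is an illustrative helper (hypothetical name). It does not
    contact the server; it only validates input and builds the URL.
    """
    # date.fromisoformat rejects impossible dates such as 2019-11-31.
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError(f"start date {start} is after end date {end}")
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_name,
        "from": start_date.isoformat(),
        "until": end_date.isoformat(),
    })
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_harvest_url("publication:masters_theses_2",
                            "2019-06-01", "2019-08-31"))
```

Running the date check first is a cheap way to rule out a typing error before suspecting that harvesting has been blocked.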
 +===Edit the MarcEdit file of harvested records=== 
 +  - **Run MarcEdit task** 
 +    * Change date in 008 with the new year, under Tools -> Manage Tasks -> Select desired task in Task Lists window -> Manage Existing Tasks -> Edit Selected Task List -> Save. 
 +    * Click on Tools -> Assigned Tasks -> Currently Available Tasks -> OAI_Dissertations (or OAI_Masters,​ OAI_MFA, OAI_LARP, as appropriate). 
 +  - **Miscellaneous Fixes** //(These fixes are easy to do in MarcEdit, through Find/​Replace. Some are more important than others; some might be corrected through editing the task lists; after reviewing Regex rules, I can tackle this!)// 
 +    * IMPORTANT: **Field 690 \\** needs to be changed to **657 \7** with **$2local/mu** appended at the end (edit task list?) 
 +    * IMPORTANT: Check to be sure **Field 049 \\$aAUMM** is present in the records. If not, add it (edit task list?) 
 +    * Replace $zLink to free resource with $zLink to resource. (**NOTE:** This will be only for uploading to OCLC, as some will have 1 or 5-year access restrictions. When the final records are downloaded into Aleph, we need to reinsert the "​free"​ as only certain phrases are acceptable.)  
 +    * Check for double periods (i.e., Doctoral..),​ missing dates in the 264 and 502. Replace Scholarworks with ScholarWorks (edit task list?) Since these records lack 504s, change obm-space in the 008 to om (edit task list?). Add a period to "​advisor"​ (edit task list?). 
 +    * Check to be sure the Summary (field 520), advisors (field 700) and keywords (field 653) are present in all records. ​ If not, download the dissertation or thesis in ScholarWorks and check the abstract and advisor lists entered by the author. Enter the missing information into the metadata screen under **Revise dissertation (or thesis)** and MarcEdit record. (See **How to fix errors in ScholarWorks.**) 
 +  - **Author and title adjustments** 
 +    * With the MarcEdit file open, click Edit -> Edit Shortcuts -> Change Case -> Title Case (for 100$a and 700$a) 
 +    * Click Edit -> Edit Shortcuts -> Change Case -> Initial Case (for 245$a), then -> lower case (for 245$b) 
 +    * __Author fixes:__ Find -> =100 -> Find All. Output to Excel. Examples of problems: “Dr.” (and other titles) should be removed, internal capitalization needs fixing (as in DeStefano, McCormick, LaPlante, O'​Neill,​ etc.), period missing from initial, order of name. Fix in the MarcEdit file. (**NOTE:** Sometimes authors enter shortened versions of their names in the SW metadata description,​ e.g. without middle initials included in the title page of their work.  If correct otherwise, let it go. If the metadata information is incorrect, e.g. misspelled, download the work in SW and check title page to be sure, fix in the MarcEdit file and also in SW with **Revise dissertation/thesis.**) 
 +    * __Advisor fixes__:  (**NOTE:** There will be many more advisor entries than author entries. Doing the following is helpful in revealing inconsistencies and other questionable entries, especially for longer Dissertation lists.) Find -> =700 -> Find All, output to Excel and Data/Sort the 700 names A-Z. (**NOTE:** I have compiled an Excel sheet of some alphabetized (controlled) advisors' names found in the Connexion authority file, **Drive:\OAI\Authorities.xlsx**, which can be useful for updating advisor entries.) 
 +    * __Title fixes__: Find -> =245 -> Find All, and output to Excel. Screen titles for proper names (e.g. for people, countries, cities, scientific names, etc.) and for acronyms, and capitalize as required. //Hint: Information in the 520 field (Summary/​Abstract) can be helpful; otherwise verify in SW.// 
 +    * Make sure the non-filing character indicators are correct. (For example, a title beginning with a quote mark should be labeled 245 11.) 
 +    * LAST STEP: MarcMake the MarcEdit file. This can be done by clicking File/​Compile file into MARC with the .mrk document open in MarcEditor, or by closing the document and clicking the "​Hammer & Wrench"​ MARC Tools icon in the MarcEdit home window. //Hint: If relabeling the file with an extension (i.e. .mrc), be careful when copying with "​rename"​ in Services, to include the extension. ​ I like to replace .mrk with mm, to avoid this problem.// (Example): **umdissertations_septmm**
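Several of the miscellaneous fixes in this section are plain find/replace operations, so they can be sketched in a few lines of Python. This is an illustration only, under the assumption that the .mrk file uses MarcEdit's mnemonic layout (an equal sign, the tag, two spaces, two indicator characters with a backslash for a blank, then the subfields). The function name apply_misc_fixes is hypothetical, and the sketch is not a replacement for the MarcEdit task lists.

```python
import re


def apply_misc_fixes(mrk_line: str) -> str:
    """Apply a few routine find/replace fixes to one line of a .mrk file."""
    line = mrk_line.rstrip("\r\n")
    # Field 690 with blank indicators becomes 657 \7, with $2local/mu
    # appended at the end of the field.
    old_prefix = "=690  \\\\"
    if line.startswith(old_prefix):
        line = "=657  \\7" + line[len(old_prefix):] + "$2local/mu"
    # Link text used for the OCLC upload ("free" is restored before Aleph).
    line = line.replace("$zLink to free resource", "$zLink to resource")
    # Consistent spelling of the repository name (lowercase URLs are untouched).
    line = line.replace("Scholarworks", "ScholarWorks")
    # Collapse an accidental double period, leaving a real ellipsis alone.
    line = re.sub(r"(?<!\.)\.\.(?!\.)", ".", line)
    return line


if __name__ == "__main__":
    print(apply_misc_fixes("=690  \\\\$aEcology"))
```

The checks that need judgment (missing 520, 700 or 653 fields, name order, capitalization of proper names) are left to manual review as described above.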
-  ​* Highlight all records in the file and Validate ​(Edit -> Validate ​or Shift+F5)This will generate ​a report of resultsNote which records did not validate ​and make the necessary correctionsRe-validate as needed.  +===Upload to Connexion=== 
-  * Highlight ​all records in the file and Update ​Holdings ​(Action ​-> Holdings ​-> Update Holdings or F8). OCLC record numbers will begin appearing ​in the file as each record is uploaded+  - **Prepare Local Save File** 
 +    * __Option 1__: Go to File -> Local File Manager -> Create File.  As follows: oai2019_dissertations,​ oai2019_theses,​ oai2019_thesesmfa,​ oai2019_theseslarp. (Connexion will add extension .bib.db) Highlight file just created, and Set as Default. Close. 
 +    * __Option 2__: If working in the same year, go to Cataloging -> Search -> Local Save File. Click the drop-down arrow at right end of __Local File__ window, choose the file for dissertations, etc. under the correct year, and hit "OK" to open it up. This will automatically set it as the default file. Highlight all records currently in the file, and hit Action -> Delete. Screen will become blank. 
 +  - **Import records from MarcEdit** 
 +    * Go to File -> Import Records... Browse for correct mm (or .mrc) file to enter into __File to Import__. __Destination__ = Import to Local Save File. __Bibliographic__ = appropriate Local Save File (set as default). (**NOTE:** Character Set under __Record Characteristics__/​Bibliographic Records needs to be UTF-8 Unicode.) Hit OK; close Report window. 
 +  - **Manage records in Connexion Local Save File** 
 +    * Go to Cataloging -> Search -> Local Save File. Correct Local Save File should appear in top field. Hit OK. 
 +    * Validate save file records:  Highlight all records and hit Edit -> Validate. When finished, a report will be generated. Keep track of the record numbers reporting validation problems. (//Hint: I copy the entire report to Word or Notepad and delete all "Validation Successful" entries; if the remaining list is long, it can be printed for easy reference.//) Locate non-validated records by Save File number, open each one and fix the issue, then validate them singly. (**NOTE:** Most validation mistakes will be repeated Field 653 keyword entries, though sometimes something else will pop up, such as Chinese characters; see the following step.) 
 +    * To validate a record with Chinese characters, click Edit -> MARC-8 Characters -> Convert to MARC-8 CJK. Then Validate.  
 +  - **Super- and subscript, and Greek letter fixes:** OPTIONAL, if it can be done without too much hassle! Not all sups, subs and symbols can be fixed; this is OK. 
 +    * These fixes can be done on the records in the Connexion Local File __after__ they are validated! A batch validation will not accept them, while they can be singly validated after the fixes are done. Most will be in the 520 Field (Summary/​Abstract),​ though occasionally they will appear somewhere else, such as the title. OPTIONAL if can be done without too much hassle--some sups and subs and symbols cannot be fixed, e.g. if they are in the title. 
 +    * Open the MarcEdit .mrk file, click Edit -> Find -> %%<sup%% -> Find All.  Note the record numbers where this is found; repeat with %%<sub%%, then with the common Greek letters (spelled out): alpha, beta, gamma, lambda, epsilon, mu. These will appear in the harvested records in brackets: [alpha] etc. 
 +    * Open the corresponding records in the Local Save File, and fix %%<​sup>​2</​sup>​%% (etc.) found there. Connexion will supply some sups and subs, found under Edit -> Enter Diacritics. Replace the entire %%<​sup>​digit</​sup>​%% string with the correct character. ​ Word will supply a few more, found by opening Word, clicking Insert on the top bar, then Symbol/More Symbols, under Font: (normal text), by scrolling down to Superscripts and Subscripts under Subset: (Word sups and subs will properly transfer to Connexion, and will display in the Aleph OPAC with the correct Unicode.) Greek letters can be copied from Aleph, by pulling up a record and clicking the "brick wall" in the upper right corner. When the diacritic screen appears, choose Greek. //Hint: Search for "Greek alphabet"​ in Google, and refer to the pictorial representations for identification of the various letters.//​ 
 +    * After fixing sups/subs and Greek letters (and any other symbols easily found in Word, e.g. infinity sign), validate each record. 
 +  - **Update ​holdings/​add OCLC#​’s.** 
 +    * Be sure to log onto Connexion. Highlight records and click the green Update arrow in the top bar, under the word "​Batch."​ Wait for it to stop "​ticking."​ 
 +    * Double-check OCLC# column for blanks ​(missed validation). //Hint: Sort by clicking the heading, Control #, afterward returning the list to its original order by clicking Save #.//  Validate and update any blanks found. 
 +  - **Set Connexion Export parameters.** 
 +    * Go to Tools -> Options ​-> Export. 
 +    * Highlight File (Prompt for filename). Apply and Close.  
 +  - **Export Local Save File** 
 +    * Highlight all files in the Local Save File.  Go to Action -> Export. 
 +    * Designate path and name for Output file: (example) U:\OAI\Dissertations\2019\umdissertations_septoclc. Exports as a .dat file. (**NOTE:** The download will pause when non-AMA (term?) symbols are encountered. Note these numbers for fixing in Aleph later.)
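The optional superscript, subscript and Greek letter fixes in this section amount to a character mapping, which the Python sketch below illustrates. It is an assumption-laden example rather than part of the documented workflow: it only handles digits and plus/minus signs inside sup/sub tags and the six bracketed Greek letter names listed above, and the function name replace_markup is hypothetical. Whether Connexion will validate a particular replacement character still has to be checked record by record.

```python
import re

# Unicode superscript and subscript forms for digits and plus/minus signs.
SUPERSCRIPTS = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")
SUBSCRIPTS = str.maketrans("0123456789+-", "₀₁₂₃₄₅₆₇₈₉₊₋")

# Bracketed names as they appear in harvested records, e.g. [alpha].
GREEK = {
    "alpha": "α", "beta": "β", "gamma": "γ",
    "lambda": "λ", "epsilon": "ε", "mu": "μ",
}


def replace_markup(text: str) -> str:
    """Replace simple sup/sub tags and bracketed Greek names with Unicode."""
    text = re.sub(r"<sup>([0-9+-]+)</sup>",
                  lambda m: m.group(1).translate(SUPERSCRIPTS), text)
    text = re.sub(r"<sub>([0-9+-]+)</sub>",
                  lambda m: m.group(1).translate(SUBSCRIPTS), text)
    # Unknown bracketed words are left exactly as they were.
    text = re.sub(r"\[(\w+)\]",
                  lambda m: GREEK.get(m.group(1), m.group(0)), text)
    return text


if __name__ == "__main__":
    print(replace_markup("CO<sub>2</sub> uptake at 10<sup>-3</sup> with [beta]"))
```

Anything the mapping does not cover (letters inside the tags, other symbols) is left untouched, which matches the guidance that not every sup, sub or symbol needs to be fixed.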
-**To export ​from Connexion to Aleph:**+===Download into Aleph=== 
 +  - **Preliminary MarcEdit Fixes** 
 +    * MarcBreak the file. Replace .dat with mb. Open in MarcEditor. 
 +    * Change AUMM to AUMETD. Go to File -> Edit -> Replace -> enter AUMM, AUMETD ->​Replace All, Save. 
 +    * Delete 035s (not needed here; records already have 001s). Go to Tools -> Add/Delete Field -> enter 035 into Field (no need to supply data) -> Delete Field. Save. 
 +    * File -> Edit -> Replace -> enter $zLink to resource, $zLink to free resource. Replace All, Save. 
 +  - **MarcMake the file** Replace mb with mm.  (**NOTE:** This file should be named differently from the mm file loaded into Connexion (examples): **umdissertations_septoclcmm** vs. **umdissertations_septmm**.)  Save to appropriate personal folder, and copy to FCL01/Scratch in WinSCP. 
 +  - **Load records using Aleph Services**  //Hint: Much time can be saved by clicking **View History** and highlighting and opening the Service Form for the same jobs performed on earlier batches of materials.// 
 +    * Go to Services -> Load Catalog Records -> Advanced Generic Vendor Records Loader (File-90). Set Loader rules: 
 +        - Input File name (example): umdissertations_septoclcmm 
 +        - Default Holding: AUMETD 
 +        - Character Conversion: OCLC_UTF_TO_UTF 
 +        - Fix Routine: UMFIX 
 +        - Match Routine: OCLC 
 +        - Merge Routine: OCLC 
 +        - Update Database: Yes 
 +        - Produce Loading Report: Yes 
 +        - Report file name: (example) umdissertations_sept2019report 
 +    * Add to History, Submit. 
 +    * Check results. When done (per Batch Log [A] under Task Manager), click [J] File List. The file name will appear under several versions: ​ .failure, .single, .new and .multi. ​ Highlight .new version (which should have the largest size, unless something glitched), make sure “Print Configuration” is set to “View HTML,” and click “Print” (to right of top window) to view the loaded records. ​ Check one or two by bib number in the Aleph GUI to make sure they loaded correctly. An item and HOL should also have been created. 
 +  - ** LAST DETAILS: Globally remove 856 and add 910 fields from/to the bib records ** 
 +     * Go to WinSCP alephe/​scratch to find the files for the newly-loaded records, under .adm, .bib, .hol, .items, and .orders. ​ Use the .bib file, which will contain a Services-ready list of bib numbers. ​ //Hint: I renamed this: (example) umdissertations_sept2019bibnos,​ and copied it to my personal folder for my records.//​ 
 +     * Go to Services -> Catalog Maintenance Procedures -> Global Changes (manage-21):​ 
 +       - Input file name: (i.e., umdissertations_sept2019bibnos) 
 +       - Output file name: (i.e., umdissertations_sept2019bibnos_del856) 
 +       - Line in Record -> Tag: 856; First Indicator: #, Second Indicator: # 
 +       - Delete Field – Yes. 
 +       - Add to History, Submit. 
 +     * Repeat this process to add the 910:  ABC 04/23/2020 BATCHN (ABC = your initials). This allows these records to be counted in the IRM monthly statistics. 
 +       - Input file name: (example) umdissertations_sept2019bibnos 
 +       - Output file name: (example) umdissertations_sept2019bibnos_add910 
 +       - Line in Record -> Tag: 910; leave indicators blank. 
 +       - Delete Field – No. 
 +       - Add to History, Submit.
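The preliminary MarcEdit fixes in this section (AUMM to AUMETD, deleting 035s, restoring "Link to free resource") and the text of the 910 note can be sketched in Python as follows. This is a hedged illustration, assuming the MarcBroken file is plain text with one field per line beginning with an equal sign and the tag. The function names prepare_for_aleph and build_910_text are hypothetical, and the actual changes are made in MarcEdit and Aleph Services as described above.

```python
from datetime import date


def prepare_for_aleph(mrk_lines):
    """Return the lines of a MarcBroken file with the pre-load fixes applied."""
    fixed = []
    for line in mrk_lines:
        # 035 fields are not needed; the records already have 001s.
        if line.startswith("=035"):
            continue
        # Mirrors the Replace All step: every AUMM becomes AUMETD.
        line = line.replace("AUMM", "AUMETD")
        # Restore the link wording that was shortened for the OCLC upload.
        line = line.replace("$zLink to resource", "$zLink to free resource")
        fixed.append(line)
    return fixed


def build_910_text(initials: str, when: date) -> str:
    """Build the 910 note text in the form 'ABC 04/23/2020 BATCHN'."""
    return f"{initials} {when:%m/%d/%Y} BATCHN"


if __name__ == "__main__":
    sample = [
        "=035  \\\\$a(OCoLC)0000000",  # made-up control number for illustration
        "=049  \\\\$aAUMM",
        "=856  40$zLink to resource",
    ]
    for fixed_line in prepare_for_aleph(sample):
        print(fixed_line)
    print(build_910_text("ABC", date(2020, 4, 23)))
```

Keeping the date formatting in one helper makes it less likely that the 910 note is typed inconsistently from batch to batch.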
-  * Go to Tools -> Options and click on the Export tab. Highlight the __Prompt for filename__ option then check off the box for Display report for immediate export results. Click on Apply then Close. +Job done!
-  * Open the Local Save file you want to export (2016_Feb_Dissertations - See path above) +
-  * Highlight records +
-  * Export (Action - Export or F5) +
-    This will ask where to put the output file in your C: drive and what name to use. Make sure the filename is in all __lower case__ - for example, feb2016diss. The file will be downloaded into your C: drive as a .dat file. (Example: C:​\Crosswalk\Dissertation&​Theses\Connexion_Records\feb2016diss.dat) +
-  * Open MARCTools in MarcEdit. +
-  * Input the .dat file from your C: drive (feb2016diss.dat) and name the Output file with a .mb extension (feb2016diss.mb). Execute the MarcBreaker. +
-  * Click on Edit Records. Use Replace to change AUMM to AUMETD. +
-  * Under MARCEditor --> File, click on Compile File into Marc. This will save as a .mrc (MARC) file.  +
-     +
-  * Open Aleph, Cataloging function +
-  * Click on Task Manager then [F] Upload/​Download files +
-  * Find where your saved .mrc file is on your C: drive (feb2016diss.mrc) and copy to the FCL01/Scratch file (from drop-down menu over left Remote Files column) by clicking on the left arrow button between columns +
-  * In the Aleph menu bar above, click on *_Services -> Load Catalog Records +
-  * Click on Advanced Generic Vendor Records Loader (file_90) +
-    Make sure the following rules are set: +
-     * Input File name (for this example, feb2016diss.mrc) +
-     * Default Holding - AUMETD +
-     * Character Conversion - OCLC_UTF_TO_UTF +
-     * Fix Routine - UMFIX +
-     * Match Routine - OCLC +
-     * Merge Routine - OCLC +
-     * Update Database - Yes +
-     * Produce Loading Report - Yes +
-     * Report file name(for this example, feb2016_report) +
-     * Click on the Submit button at top right+
-Once the exporting is done, click on Task Manager -> [A] Batch Log to view the report. 
-     * Highlight your file (p_file_90) and click on View Printouts. ​ 
-     * Under Remote Name, highlight <​filename>​_report.new (i.e. feb2016diss_report_new) 
-     * Click on Print to obtain reports. ​ You want the loader-log-report which will show the FCL01 Bib Sys numbers for each record. ​ Copy one and check the bib record which displays for any potential corrections needed. ​ 
-**To Globally Remove the 856 Field from Bib Records:​** +-- //​Contact ​person: [[lucyd@library.umass.edu| Lucy deGozzaldi]]// ​     ​ 
- + 
-  * Click on *_Services -> Catalog Maintenance Procedures -> Global Changes (manage-21) +
-   Set the rules: +
-  * Input file name <​filename>​.mrc.bib (i.e., feb2016diss.mrc.bib) +
-  * Output file name <​filename>​.mrc856 (i.e., feb2016.diss.mrc856) +
-  * Update Database - Yes +
-  * Line in Record -> Tag -> 856; first indicator - #  second indicator - # +
-  * Delete field - Yes +
-  * Click on Submit button +
- +
-The process should now be complete.  +
- +
- +
--- //Contact persons: [[kdion@library.umass.edu| Kay Dion]] or [[lucyd@library.umass.edu| Lucy deGozzaldi]]//      +
batch_uploading_oais_to_oclc_and_aleph.1546881608.txt.gz · Last modified: 2019/01/07 12:20 by