PAGE OUTDATED ARCHIVED Batch Uploading OAIs from Scholarworks into OCLC and Aleph

CHANGE TITLE? ETDs (Current) Processing ScholarWorks OAIs

NOTE: My helpful “hints” will appear in Italics.

Introduction

The Graduate School will email “Packing Lists” dated February, May and September (end of semesters) of new dissertations, theses, MFA theses and occasionally LARP theses. There may be a lag between these dates and when the ETDs are available on ScholarWorks. I try to process them after a couple of months have passed, to ensure that they will be picked up in the Crosswalk harvest.

Preparation

  • Have a handy copy (either online or a printout) of the Packing List-in-process. NOTE: It's a good idea to save copies of these in appropriate folders. Example: PackingListReport_Feb2019diss.xlsx in [Drive]:\OAI\Dissertations\2019\ (i.e. 2019), or OAI\Theses, ThesesMFA or ThesesLARP.
  • Open MarcEdit. (NOTE: Make sure your MarcEdit XSLT engine is set to SAXON.NET. On the MarcEdit home page, click Tools (found on top), then Preferences, then MARCEngine, and select SAXON.NET under XSLT Engine.)

Harvesting from ScholarWorks

  1. Click on Harvest OAI Records (found on either the MarcEdit home page or, at the top, under Tools/OAI Harvester Tools). Set the following:
    • Set name (for dissertations): publication:dissertations_harvesting (IMPORTANT NOTE: Because of software changes made in 2018, Erin Jerome needs to be informed before running a Crosswalk on dissertations only! Before they can be pulled, they need to be transferred from “publication:dissertations_2” to a special harvesting subset.)
    • Set name (for theses): publication:masters_theses_2
    • Set name (for MFAs): publication:englmfa_theses (NOTE: This series is only for English MFAs; MFAs for art etc. are included in masters_theses_2.)
    • Set name (for LARPs): publication:
    • Metadata type: dcq (NOTE: This is not included in the MarcEdit drop-down, but needs to be typed in. It's a “modified” version of Dublin Core.)
    • Crosswalk path: C:\Crosswalk\XML1\OAIDCtoMARCXMLmodified.xsl (NOTE: This program needs to be loaded onto your personal C: drive.)
    • Start date (for May, in this format): 2019-06-01
    • End date (for May, in this format): 2019-08-31 (NOTE: Using August avoids Sept. lists. Occasionally these dates have to be tweaked to include everything on the appropriate Packing List.)
    • Hit “OK” and let it run. A green bar will appear if it is working. (NOTE: This function is a little cranky. Recently it didn't work for me because I entered 2019-11-31 instead of 2019-11-30 (November has only 30 days). Everything has to be entered precisely! If no amount of tweaking resolves the issue, contact bepress (Digital Commons), which occasionally blocks ScholarWorks harvesting for security purposes, or contact Erin Jerome or Aaron Rubinstein.)
    • Once the harvesting is finished, a MarcEdit list will open up, containing the harvested records in raw form. Hint: Save immediately into the appropriate OAI folder, as (example) umdissertations_sept.mrk (NOTE: When working in MarcEdit, click File/Save after every change!! Do NOT Save if no changes are made.)
  2. Check harvested records against the Grad School's Packing List. In MarcEdit, click Edit/Find, enter =100 in the “Find what” window, and click Find All. This will produce a list that can be saved to the clipboard and output to Excel. Hint: To make the Excel data more manageable, insert a blank column in front of the Jump to Record column, and run Data/Text to Columns on the name column, splitting off and deleting the equal sign.
    • IMPORTANT NEW STEP, added 2020: Go to ScholarWorks/Dissertations and Theses and log onto “My account”, scroll down to the appropriate series (e.g., DOCTORAL DISSERTATIONS (dissertations_2)), then go to Manage Dissertations/Batch revise Excel/Generate a spreadsheet of current data. See Changing one year campus titles to open access in ScholarWorks for instructions on generating ScholarWorks spreadsheets. If extra names appear in the MarcEdit file, check the generated spreadsheet to make sure they are NOT dated in the range requested. (This step has been added since occasionally a dissertation or thesis will have been left off the Packing List.)
    • Any harvested record NOT on the Packing List that is also not on the generated spreadsheet, or has a different date (check degree_year and award_month), or which belongs to a different series (such as English MFAs) can be removed from the MarcEdit file.
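
The harvest settings in step 1 correspond to a standard OAI-PMH ListRecords request, and the date problem described there (2019-11-31) can be caught before running the harvest. The following is a minimal Python sketch, not what MarcEdit does internally. BASE_URL is a placeholder (this page does not give the ScholarWorks OAI address), and the function names are made up for illustration.

```python
from datetime import date
from urllib.parse import urlencode

# Placeholder only: the real ScholarWorks OAI base URL is not given on this page.
BASE_URL = "https://example.org/do/oai/"


def check_range(start, end):
    """Parse two YYYY-MM-DD strings and return them as dates.

    date.fromisoformat raises ValueError for impossible dates such as
    2019-11-31, which is the kind of typo that makes the harvest fail.
    """
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if first > last:
        raise ValueError(f"start date {start} is after end date {end}")
    return first, last


def list_records_url(set_name, start, end, prefix="dcq"):
    """Build an OAI-PMH ListRecords URL from the values entered in the harvester."""
    first, last = check_range(start, end)
    params = {
        "verb": "ListRecords",
        "metadataPrefix": prefix,   # dcq, typed in by hand in MarcEdit
        "set": set_name,            # e.g. publication:masters_theses_2
        "from": first.isoformat(),
        "until": last.isoformat(),
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(list_records_url("publication:masters_theses_2", "2019-06-01", "2019-08-31"))
```

Calling check_range("2019-11-01", "2019-11-31") raises ValueError, so a mistyped end date shows up immediately instead of as a silent harvest failure.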

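The comparison in step 2 (harvested =100 names against the Packing List) can also be roughed out in Python instead of Excel. This is only a sketch: it assumes the .mrk file is read as plain text and that the Packing List names have already been copied into a list in "Last, First" form. The function names are hypothetical, and the results still need to be checked against the generated ScholarWorks spreadsheet as described above.

```python
def harvested_authors(mrk_text):
    """Return the $a value of every =100 line in a MarcEdit mnemonic (.mrk) file."""
    names = []
    for line in mrk_text.splitlines():
        if line.startswith("=100"):
            # Subfields are introduced by "$"; keep the one coded "a".
            for sub in line.split("$")[1:]:
                if sub.startswith("a"):
                    names.append(sub[1:].strip())
    return names


def normalize(name):
    """Lowercase and drop commas/periods so 'Smith, Jane A.' matches 'Smith, Jane A'."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def compare(mrk_text, packing_list_names):
    """Return (extra, missing): names only in the harvest, names only on the list."""
    harvested = {normalize(n): n for n in harvested_authors(mrk_text)}
    expected = {normalize(n): n for n in packing_list_names}
    extra = sorted(harvested[k] for k in harvested.keys() - expected.keys())
    missing = sorted(expected[k] for k in expected.keys() - harvested.keys())
    return extra, missing
```

Names in "extra" are candidates for removal from the MarcEdit file once the spreadsheet check confirms they fall outside the date range. Names in "missing" may mean the harvest dates need tweaking.
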
Edit the MarcEdit file of harvested records

  1. Run MarcEdit task
    • Change the date in the 008 to the new year, under Tools → Manage Tasks → Select desired task in Task Lists window → Manage Existing Tasks → Edit Selected Task List → Save.
    • Click on Tools → Assigned Tasks → Currently Available Tasks → OAI_Dissertations (or OAI_Masters, OAI_MFA, OAI_LARP, as appropriate).
  2. Miscellaneous Fixes (These fixes are easy to do in MarcEdit, through Find/Replace. Some are more important than others; some might be corrected through editing the task lists; after reviewing Regex rules, I can tackle this!)
    • IMPORTANT: Field 690 \\ needs to be changed to 657 \7 with $2local/mu appended at the end (edit task list?)
    • IMPORTANT: Check to be sure Field 049 \\$aAUMM is present in the records. If not, add it (edit task list?)
    • Replace $zLink to free resource with $zLink to resource. (NOTE: This will be only for uploading to OCLC, as some will have 1 or 5-year access restrictions. When the final records are downloaded into Aleph, we need to reinsert the “free” as only certain phrases are acceptable.)
    • Check for double periods (e.g., Doctoral..) and for missing dates in the 264 and 502. Replace Scholarworks with ScholarWorks (edit task list?). Since these records lack 504s, change obm-space in the 008 to om (edit task list?). Add a period to “advisor” (edit task list?).
    • Check to be sure the Summary (field 520), advisors (field 700) and keywords (field 653) are present in all records. If not, download the dissertation or thesis in ScholarWorks and check the abstract and advisor lists entered by the author. Enter the missing information into the metadata screen under Revise dissertation (or thesis) and MarcEdit record. (See How to fix errors in ScholarWorks.)
  3. Author and title adjustments
    • With the MarcEdit file open, click Edit → Edit Shortcuts → Change Case → Title Case (for 100$a and 700$a)
    • Click Edit → Edit Shortcuts → Change Case → Initial Case (for 245$a), then → lower case (for 245$b)
    • Author fixes: Find → =100 → Find All. Output to Excel. Examples of problems: “Dr.” (and other titles) should be removed, internal capitalization needs fixing (as in DeStefano, McCormick, LaPlante, O'Neill, etc.), period missing from initial, order of name. Fix in the MarcEdit file. (NOTE: Sometimes authors enter shortened versions of their names in the SW metadata description, e.g. without middle initials included in the title page of their work. If correct otherwise, let it go. If the metadata information is incorrect, e.g. misspelled, download the work in SW and check title page to be sure, fix in the MarcEdit file and also in SW with Revise dissertation/thesis.)
    • Advisor fixes: (NOTE: There will be many more advisor entries than author entries. Doing the following is helpful in revealing inconsistencies and other questionable problems, especially for longer Dissertation lists.) Find → =700 → Find All, output to Excel and Data/Sort the 700 names A-Z. (NOTE: I have compiled an Excel sheet of some alphabetized (controlled) advisors' names found in the Connexion authority file, Drive:\OAI\Authorities.xlsx, which can be useful for updating advisor entries.)
    • Title fixes: Find → =245 → Find All, and output to Excel. Screen titles for proper names (e.g. for people, countries, cities, scientific names, etc.) and for acronyms, and capitalize as required. Hint: Information in the 520 field (Summary/Abstract) can be helpful; otherwise verify in SW.
    • Make sure the non-filing character indicators are correct. (For example, a title beginning with a quote mark should be labeled 245 11.)
    • LAST STEP: MarcMake the MarcEdit file. This can be done by clicking File/Compile file into MARC with the .mrk document open in MarcEditor, or by closing the document and clicking the “Hammer & Wrench” MARC Tools icon in the MarcEdit home window. Hint: If relabeling the file with an extension (e.g. .mrc), be careful to include the extension when copying with “rename” in Services. I like to replace .mrk with mm, to avoid this problem. (Example): umdissertations_septmm
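
Several of the Miscellaneous Fixes in step 2 are plain text substitutions on the .mrk file, so they can be sketched as a small Python function. This is an illustration only, not a replacement for the MarcEdit task lists. It assumes the mnemonic layout MarcEdit saves (tag, two spaces, two indicator characters with \ for blank), and the function names are hypothetical. Results should be reviewed before saving.

```python
import re

OLD_PREFIX = "=690  \\\\"   # 690 with two blank indicators, as written in a .mrk file
NEW_PREFIX = "=657  \\7"    # 657 with second indicator 7


def fix_line(line):
    """Apply the routine Find/Replace fixes to one line of a .mrk file."""
    # Field 690 \\ becomes 657 \7 with $2local/mu appended at the end.
    if line.startswith(OLD_PREFIX):
        line = NEW_PREFIX + line[len(OLD_PREFIX):].rstrip() + "$2local/mu"
    # Wording used for the OCLC upload only; "free" is put back before the Aleph load.
    line = line.replace("$zLink to free resource", "$zLink to resource")
    line = line.replace("Scholarworks", "ScholarWorks")
    # Collapse a doubled period ("Doctoral..") but leave a real ellipsis ("...") alone.
    line = re.sub(r"(?<!\.)\.\.(?!\.)", ".", line)
    return line


def fix_mrk(text):
    """Apply fix_line to every line, keeping the blank lines between records."""
    return "\n".join(fix_line(line) for line in text.split("\n"))
```

For example, fix_line applied to a line reading =690  \\$aChemistry gives =657  \7$aChemistry$2local/mu. The 049 check, the 008 change and the missing-date checks are left out because they need a look at each record.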

Upload to Connexion

  1. Prepare Local Save File
    • Option 1: Go to File → Local File Manager → Create File. Name the file as follows: oai2019_dissertations, oai2019_theses, oai2019_thesesmfa, oai2019_theseslarp. (Connexion will add extension .bib.db) Highlight the file just created, and Set as Default. Close.
    • Option 2: If working in the same year, go to Cataloging → Search → Local Save File. Click the drop-down arrow at right end of Local File window, choose the file for dissertations, etc. under the correct year, and hit “OK” to open it up. This will automatically set it as the default file. Highlight all records currently in the file, and hit Action → Delete. Screen will become blank.
  2. Import records from MarcEdit
    • Go to File → Import Records… Browse for correct mm (or .mrc) file to enter into File to Import. Destination = Import to Local Save File. Bibliographic = appropriate Local Save File (set as default). (NOTE: Character Set under Record Characteristics/Bibliographic Records needs to be UTF-8 Unicode.) Hit OK; close Report window.
  3. Manage records in Connexion Local Save File
    • Go to Cataloging → Search → Local Save File. Correct Local Save File should appear in top field. Hit OK.
    • Validate save file records: Highlight all records and hit Edit → Validate. When finished, a report will be generated. Keep track of the record numbers reporting validation problems. (Hint: I copy the entire report to Word or Notepad and delete all “Validation Successful” entries; if the remaining list is long, it can be printed for easy reference.) Locate non-validated records by Save File number, open each one and fix the issue, then validate them singly. (NOTE: Most validation mistakes will be repetitive Field 653 keyword entries, though sometimes something else will pop up, such as Chinese characters; see the following step.)
    • To validate a record with Chinese characters, click Edit → MARC-8 Characters → Convert to MARC-8 CJK. Then Validate.
  4. Super- and subscript, and Greek letter fixes: OPTIONAL, if they can be done without too much hassle!!! Not all sups, subs and symbols can be fixed (e.g. if they are in the title); this is OK.
    • These fixes can be done on the records in the Connexion Local File after they are validated! A batch validation will not accept them, while they can be singly validated after the fixes are done. Most will be in the 520 Field (Summary/Abstract), though occasionally they will appear somewhere else, such as the title.
    • Open the MarcEdit .mrk file, click Edit → Find → enter <sup → Find All. Note the record numbers where this is found; repeat with <sub, then with the common Greek letters (spelled out): alpha, beta, gamma, lambda, epsilon, mu. These will appear in the harvested records in brackets: [alpha] etc.
    • Open the corresponding records in the Local Save File, and fix <sup>2</sup> (etc.) found there. Connexion will supply some sups and subs, found under Edit → Enter Diacritics. Replace the entire <sup>digit</sup> string with the correct character. Word will supply a few more: open Word, click Insert on the top bar, then Symbol/More Symbols, and with Font set to (normal text), scroll down to Superscripts and Subscripts under Subset. (Word sups and subs will properly transfer to Connexion, and will display in the Aleph OPAC with the correct Unicode.) Greek letters can be copied from Aleph, by pulling up a record and clicking the “brick wall” in the upper right corner. When the diacritic screen appears, choose Greek. Hint: Search for “Greek alphabet” in Google, and refer to the pictorial representations for identification of the various letters.
    • After fixing sups/subs and Greek letters (and any other symbols easily found in Word, e.g. infinity sign), validate each record.
  5. Update holdings/add OCLC#’s.
    • Be sure to log onto Connexion. Highlight records and click the green Update arrow in the top bar, under the word “Batch.” Wait for it to stop “ticking.”
    • Double-check OCLC# column for blanks (missed validation). Hint: Sort by clicking the heading, Control #, afterward returning the list to its original order by clicking Save #. Validate and update any blanks found.
  6. Set Connexion Export parameters.
    • Go to Tools → Options → Export.
    • Highlight File (Prompt for filename). Apply and Close.
  7. Export Local Save File
    • Highlight all records in the Local Save File. Go to Action → Export.
    • Designate path and name for Output file: (example) U:\OAI\Dissertations\2019\umdissertations_septoclc. Exports as a .dat file. (NOTE: The download will pause when non-AMA (term?) symbols are encountered. Note these numbers for fixing in Aleph later.)
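
For the optional super/subscript and Greek letter fixes in step 4, the replacement characters can be previewed with a short Python sketch before pasting them into the Connexion record. It is an illustration only: it handles digits, plus and minus inside <sup>/<sub> tags and the six bracketed Greek names listed above, and leaves everything else untouched. The function name is hypothetical.

```python
import re

# Unicode superscripts and subscripts for 0-9, plus and minus, written as escapes
# so the code points are unambiguous.
SUP = str.maketrans(
    "0123456789+-",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207a\u207b",
)
SUB = str.maketrans(
    "0123456789+-",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u208a\u208b",
)
# Greek small letters for the names that show up in brackets in harvested records.
GREEK = {
    "alpha": "\u03b1", "beta": "\u03b2", "gamma": "\u03b3",
    "lambda": "\u03bb", "epsilon": "\u03b5", "mu": "\u03bc",
}


def convert(text):
    """Replace <sup>2</sup>, <sub>2</sub> and [alpha]-style strings with Unicode."""
    text = re.sub(r"<sup>([0-9+-]+)</sup>", lambda m: m.group(1).translate(SUP), text)
    text = re.sub(r"<sub>([0-9+-]+)</sub>", lambda m: m.group(1).translate(SUB), text)
    pattern = r"\[(" + "|".join(GREEK) + r")\]"
    return re.sub(pattern, lambda m: GREEK[m.group(1)], text)


if __name__ == "__main__":
    print(convert("CO<sub>2</sub> uptake of 10<sup>-3</sup> mol in the [alpha]-helix"))
```

Tags containing letters (such as <sup>th</sup>) are deliberately left alone, in line with the note that not all sups and subs can be fixed. Each record still has to be validated singly in Connexion afterward.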

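Since most validation mistakes are repetitive 653 keyword entries, it can save time to look for them in the .mrk file before importing into Connexion. The sketch below assumes that "repetitive" means the same 653 line appearing more than once in a record, and that records in the .mrk file are separated by a blank line. It compares case-insensitively so that near-duplicates are also flagged for a human to look at. The function name is hypothetical.

```python
from collections import Counter


def duplicate_keywords(mrk_text):
    """Map record number (1-based, file order) to the 653 lines repeated in that record."""
    problems = {}
    # Normalize Windows line endings so blank-line splitting works either way.
    text = mrk_text.replace("\r\n", "\n")
    records = [r for r in text.split("\n\n") if r.strip()]
    for number, record in enumerate(records, start=1):
        counts = Counter(
            line.strip().lower()
            for line in record.splitlines()
            if line.startswith("=653")
        )
        repeated = sorted(key for key, n in counts.items() if n > 1)
        if repeated:
            problems[number] = repeated
    return problems
```

The record numbers returned follow file order, which should match the Save File numbers only if no records were removed or reordered between the check and the import.
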
Download into Aleph

  1. Preliminary MarcEdit Fixes
    • MarcBreak the file. Replace .dat with mb. Open in MarcEditor.
    • Change AUMM to AUMETD. Go to File → Edit → Replace → enter AUMM, AUMETD → Replace All, Save.
    • Delete 035s (not needed here; records already have 001s). Go to Tools → Add/Delete Field → enter 035 into Field (no need to supply data) → Delete Field. Save.
    • File → Edit → Replace → enter $zLink to resource, $zLink to free resource. Replace All, Save.
  2. MarcMake the file. Replace mb with mm. (NOTE: This file should be named differently from the mm file loaded into Connexion (examples): umdissertations_septoclcmm vs. umdissertations_septmm.) Save to the appropriate personal folder, and copy to FCL01/Scratch in WinSCP.
  3. Load records using Aleph Services. Hint: Much time can be saved by clicking View History and highlighting and opening the Service Form for the same jobs performed on earlier batches of materials.
    • Go to Services → Load Catalog Records → Advanced Generic Vendor Records Loader (File-90). Set Loader rules:
      1. Input File name (example): umdissertations_septoclcmm
      2. Default Holding: AUMETD
      3. Character Conversion: OCLC_UTF_TO_UTF
      4. Fix Routine: UMFIX
      5. Match Routine: OCLC
      6. Merge Routine: OCLC
      7. Update Database: Yes
      8. Produce Loading Report: Yes
      9. Report file name: (example) umdissertations_sept2019report
    • Add to History, Submit.
    • Check results. When done (per Batch Log [A] under Task Manager), click [J] File List. The file name will appear under several versions: .failure, .single, .new and .multi. Highlight .new version (which should have the largest size, unless something glitched), make sure “Print Configuration” is set to “View HTML,” and click “Print” (to right of top window) to view the loaded records. Check one or two by bib number in the Aleph GUI to make sure they loaded correctly. An item and HOL should also have been created.
  4. LAST DETAILS: Globally remove 856 and add 910 fields from/to the bib records
    • Go to WinSCP alephe/scratch to find the files for the newly-loaded records, under .adm, .bib, .hol, .items, and .orders. Use the .bib file, which will contain a Services-ready list of bib numbers. Hint: I renamed this: (example) umdissertations_sept2019bibnos, and copied it to my personal folder for my records.
    • Go to Services → Catalog Maintenance Procedures → Global Changes (manage-21):
      1. Input file name: (example) umdissertations_sept2019bibnos
      2. Output file name: (example) umdissertations_sept2019bibnos_del856
      3. Line in Record → Tag: 856; First Indicator: #, Second Indicator: #
      4. Delete Field – Yes.
      5. Add to History, Submit.
    • Repeat this process to add the 910: ABC 04/23/2020 BATCHN (ABC = your initials). This allows these records to be counted in the IRM monthly statistics.
      1. Input file name: (example) umdissertations_sept2019bibnos
      2. Output file name: (example) umdissertations_sept2019bibnos_add910
      3. Line in Record → Tag: 910; leave indicators blank.
      4. Delete Field – No.
      5. Add to History, Submit.
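
The three Preliminary MarcEdit Fixes in step 1 of this section (AUMM to AUMETD, deleting 035s, and putting “free” back into the link text) are simple enough to sketch in Python as a cross-check on the MarcBroken file. This assumes the mb file is read as plain text in MarcEdit's mnemonic layout. The function name is hypothetical, and the MarcEdit menus described above remain the documented route.

```python
def prepare_for_aleph(mrk_text):
    """Apply the pre-load fixes to the text of a MarcBroken (mnemonic) file."""
    kept = []
    for line in mrk_text.split("\n"):
        # 035s are not needed here; the records already have 001s.
        if line.startswith("=035"):
            continue
        # Same effect as Replace All of AUMM with AUMETD.
        line = line.replace("AUMM", "AUMETD")
        # Restore the wording Aleph expects in the 856 $z.
        line = line.replace("$zLink to resource", "$zLink to free resource")
        kept.append(line)
    return "\n".join(kept)
```

Running the function twice on the same text gives the same result as running it once, since neither replacement matches its own output.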

Job done!

Contact person: Lucy deGozzaldi

batch_uploading_oais_to_oclc_and_aleph.txt · Last modified: 2022/05/16 18:53 by jeustis