Batch Conversion of ETDs to e-records with Scholarworks URLs

Link to ETD Workflow Google doc:

https://docs.google.com/document/d/1Qs_uIWPjDfUIcQhxJiMlgvgN8XpexZn2CTs0eSGTOxo/edit

NOTE: Some older ETDs will be “OPEN” (remain accessible) via the Internet Archive. These will be processed according to general OCA digitization procedures. Most ETDs, however, will be “DARK” on the Internet Archive. Although the pick lists will include IA URLs, these will not be retained in the final records; they serve only as a hook for later replacement with Scholarworks URLs.

Preparation of pick list records for conversion:

  1. Create a “working copy” from the completed pick list of items which have been scanned.
    • Get rid of columns that do not have any information.
    • Look for any entries which did not get scanned. These may not have been found, or may have been rejected, etc. Sort them to the bottom or onto a new spreadsheet for entry into the “ETD Rejects Tracking sheet” (found on W:\ETD Digitization Project).
    • Sort the “search-id” column (Aleph bibnos) and copy it to a new spreadsheet, edit it into Services-ready form (with 0's in front to make nine figures and FCL01 appended in back) and run it through an Aleph services download for 035 and 502. Compile a list of OCLC numbers through Excel, in order to check for duplicates already processed. NOTE: TOTALS files are found on W:\Open Content Alliance\Pick lists\Completed picklists\lucyd: umoca[date]disstotals (bibnos and oclcnos). When DP records may also exist for the records on the pick list, use oclcnos when looking for duplicates. If no DP records exist, the match can be done on bibnos. Do a VLOOKUP for any matches, and eliminate them from the list.
    • IMPORTANT: Sort into Masters and PhDs by call# or 502 BEFORE putting into MarcEdit! (Label them mastersbibnos and phdbibnos.)
  2. Run lists of Services-ready bibnos (um_[date]bibnos) through Services to produce a MARC file, in preparation for using MarcEdit on them: Copy to alephe/scratch and download. In Field 1 + indicator, enter #####. In *Format, enter MARC instead of the usual Aleph Sequential. In *Fix Routine, enter LDR9A. Copy to personal folder.
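
The bibno formatting and duplicate check described above can also be sketched in Python, as a cross-check on the Excel work. This is only an illustration: the function names are hypothetical, and it assumes the bibnos and OCLC numbers have already been copied out of the spreadsheets as plain strings.

```python
def to_services_ready(bibno, library="FCL01"):
    """Zero-pad an Aleph bib number to nine digits and append the library code."""
    digits = str(bibno).strip()
    if not digits.isdigit() or len(digits) > 9:
        raise ValueError(f"unexpected bib number: {bibno!r}")
    return digits.zfill(9) + library


def drop_already_processed(picklist, totals_oclc):
    """Keep pick-list rows whose OCLC number is not in the totals list.

    picklist is a list of (bibno, oclc_number) pairs; totals_oclc is any
    iterable of OCLC numbers already processed. Leading zeros are ignored
    when comparing (an assumption, to tolerate padded and unpadded lists).
    """
    seen = {str(n).strip().lstrip("0") for n in totals_oclc}
    return [(b, o) for b, o in picklist
            if str(o).strip().lstrip("0") not in seen]
```

For example, to_services_ready("12345") gives 000012345FCL01. The same comparison can be run on bibnos instead of OCLC numbers when no DP records exist.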

Manipulate file in MarcEdit

  1. Go to MarcEdit/MARC Tools in body of window, and MarcBreak the file, adding mb to name (um_[date]bibnos_outmb). Open the file in MarcEditor. IMPORTANT: After every change to the mb file, SAVE!
  2. Check Reports to make sure everything is OK. Check for invalid fields and remove or fix them: 006, 007, 530, 533, 538, 740, 856, the gmd [electronic resource] in 245 (from the DP), 949, 999. Fix corresponding print records with old Proquest URLs.
  3. Go to Tools/Assigned Tasks.
    • Choose OCAdiss task lists (different from regular OCA task lists) for NON-DARK materials.
    • Choose OCAdissdark for DARK materials. Use RDA version for UM records, nonRDA version for DP records. The task list will remove OCA references for DARK materials. The task list will also add:
      • 500=$aAvailable online through ScholarWorks@UMass Amherst.$5AUM,
      • 530=$aPrint copy also available.
      • 533=$aElectronic reproduction.$bAmherst, Mass.$cUniversity of Massachusetts Amherst,$d[date].$nScanned as part of the Retrospective Dissertation/Thesis Digitization Project by the UMass Amherst Libraries.$nAvailable in PDF.$5AUM.
      • 538=$aMode of access: World Wide Web.$5AUM.
      • For UM (RDA): 710 2_$aUniversity of Massachusetts Amherst.$bLibraries,$eissuing body. Retain 090, 910 an
      • For UM (RDA): 337, 338 (for electronic records) are edited/added. 347 is added.
  4. Save file and check whether the tasks performed correctly. NOTE: These changes are for preparing the records for upload into Connexion. The Scholarworks versions will be much more brief.
  5. Manual checks/edits:
    • Find/replace RDA 26431 (entered in task list) to 264\1.
    • Find/replace 264\4 information to read: ©date (if task list didn’t add this correctly)!
    • 533: Update task list date for 533 when needed.
    • 008: Download to a spreadsheet and check for bad characters (at the end, byte 38 needs to be \ and 39 needs to be d). Bytes 23-25 should be obm when 504 is present in bib. Search for 504; if more records have it than not, enter obm and manually change to om\ when 504 is not present. Or vice-versa.
    • 300: Fix punctuation in 300 fields; remove “Append.”
    • Search and replace “leaf” and “leaves” with “page” and “pages” in 300s and 540s.
    • Search 500 Typescript and delete.
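
Two of the manual checks above (the 008 bytes and the leaf/leaves replacement) can be sketched in Python. This is a rough aid, not part of the MarcEdit procedure: the function names are hypothetical, and it assumes the 008 value is the 40-character string shown in the mb file, with blanks displayed as backslashes.

```python
import re


def check_008(field, has_504):
    """Return a list of problems found in one 008 value from the mb file."""
    problems = []
    if len(field) != 40:
        problems.append(f"length is {len(field)}, expected 40")
        return problems
    if field[38] != "\\":
        problems.append("byte 38 should be \\")
    if field[39] != "d":
        problems.append("byte 39 should be d")
    # obm when the record has a 504, om\ when it does not
    expected = "obm" if has_504 else "om\\"
    if field[23:26] != expected:
        problems.append(
            f"bytes 23-25 are {field[23:26]!r}, expected {expected!r}")
    return problems


def leaves_to_pages(text):
    """Replace whole-word 'leaves' and 'leaf' in a 300 or 540 field."""
    text = re.sub(r"\bleaves\b", "pages", text)
    return re.sub(r"\bleaf\b", "page", text)
```

An empty list from check_008 means the bytes named in the procedure look right; anything else should still be fixed by hand in MarcEditor.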

Insert the Internet Archive URLs from the pick list

  1. Add [old] OCLC#’s to the pick list, via download of 035’s from the bibnos file. NOTE RE duplicate Sys#'s: Shouldn’t be any journals here, thus no duplicate Sys#’s in original working copy; so one OCLC# per record; see duplicate check above. Occasionally there will be multiple volumes in the dissertations. CAUTION: ETDs with multiple volumes are quite often scanned as one piece. Check the Internet Archive (search by URL), and fix the record to reflect scan by replacing 2 v. with pages information. If scanned as two vols. with two URLs, use only the URL for the first volume! Keep track of the ETD and fix after ScholarWorks upload. See Batch Uploading ETDs to ScholarWorks, Combining 2-part ETDs in Scholarworks.
  2. IMPORTANT: SORT FIELDS NOW! With the “mb” document open in MarcEditor, click Tools/Sort by …/Sort All Fields. If the URL has already been added, it will get out of order.
  3. Inserting the URL: Merge method (for long lists): NOTE: MarcEdit updates will change details in the following, tweak as needed.
    • Create a um_[date]merge spreadsheet, using the “pipe” trick if necessary, with the headings: 776$w (the [old] OCLC# in the following format: (OCoLC)########), *856$u41 (the IA URL). Save as “tab delimited.” NOTE: Any OCLC number with fewer than eight digits will need 0's added in front, to add up to 8. Nine digits are fine.
    • Go to MarcEdit\Delimited Text Translator (in body of window). Browse for the correct merge.txt file, and open it to enter into the top window. Copy and replace .txt with .mrk for the output file. Hit Next.
    • In following window, remove checks from Sort Fields and Calculate common nofiling data. Click “Auto Generate.” This will pop the fields into the Arguments window. Hit Finish.
    • Add .mrk to the mb file we've been working on. Go to MarcEdit\Tools (in toolbar)\Merge records. Add to window: Source File=mb file with .mrk appended. Merge file=merge.mrk created above. Copy Source File into “Save File” window. NOTE: MarcEdit creates a handy backup (.bak file) in case something goofs. Record identifier: Type in the 776 over the 001 that appears there. (This is the “hook” which joins the information.) IMPORTANT: Add the subfield. Hit Next.
    • In following window, choose “Merge selected fields.” Hit Next.
    • In following window, either type or push 856 over into the Merge Fields box. (This is the field we want to insert into the mb document.)
  4. “Manual Method” (for a short list):
    • Construct the URLs in an Excel sheet, using the combine columns function (=A1&""&B1&""&C1), as follows: =856  41$uhttp://archive.org/details/[url]. NOTE 1: We don't need the subfield z linking note for the IA URLs! NOTE 2: If using the equal sign, precede it with a quote mark and eliminate the quote with the “Text to Columns” function. Save as um_[date]URL.txt.
    • Line up this um_[date]URL.txt document in the same order as the working document, by adding the column of Aleph bibnos and sorting it to match the working document. Then add URL by alt-tabbing between the URL document and the mb document in MarcEditor. Make sure the OCLC number matches the 776 (Original) field!
  5. Tweak the mb file:
    • Make sure 2 spaces are between all 856s and indicators 41; search by 856space41, and 856spacespacespace41.
    • Check 690s to be sure all have Departmental information.
    • URL subfield z: For OCA and most Scholarworks theses/dissertations, use: Link to free resource. For “Opt-Outs” (campus only), use: UMass: Link to resource.
  6. Create the XML file(s) for conversion to ScholarWorks database:
    • Copy the mb files and rename them to um_[date]scholmasters_outmb and um_[date]scholphd_outmb.
    • Run the task list SCHOLAR on them. This will produce much shorter versions of the records.
    • MarcMake the files, replacing mb with mm.
    • Convert to MARC21XML, replacing mm with xml. NOTE: Upload short versions to ScholarWorks here!
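
For long lists, the tab-delimited merge file described in the Merge method could be generated with a short script instead of by hand. The sketch below is an assumption-laden example: the function names are hypothetical, and the default headings are copied from the procedure without the asterisk, so adjust them to whatever the Delimited Text Translator expects.

```python
import csv
import io


def format_oclc(number):
    """Return (OCoLC) plus the number, zero-padded to at least 8 digits."""
    digits = str(number).strip()
    if not digits.isdigit():
        raise ValueError(f"unexpected OCLC number: {number!r}")
    return "(OCoLC)" + digits.zfill(8)


def merge_table(pairs, headings=("776$w", "856$u41")):
    """Build tab-delimited text from (oclc_number, ia_url) pairs."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(headings)
    for oclc, url in pairs:
        writer.writerow([format_oclc(oclc), url.strip()])
    return out.getvalue()
```

The returned text can be saved as the merge.txt file and fed to the Delimited Text Translator as usual. Nine-digit OCLC numbers pass through unchanged.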

Upload file of longer-version records into Connexion

  1. Complete this step after the upload to ScholarWorks has been completed: Batch Uploading ETDs to ScholarWorks.
  2. Use the spreadsheet into which the SW URLs have been aligned with the proper 776s, described in Generate spreadsheet including SW URLs for e-conversions, in Batch Uploading ETDs to ScholarWorks.
  3. Delete the Internet Archive URLs from the long-version mb file.
  4. Insert the ScholarWorks URLs into the mb file, using the Merge records function described above.
  5. Upload into Connexion Local Save File (set as default). Validate and Update the records.
  6. Go to Tools/Options/Export, Highlight File (prompt for file name), click Apply, and Close.
  7. Search for the default Local Save File, highlight all, and hit Action/Export. Connexion will ask where to put the output file (to personal folder) and what to name it. It will export as a .dat file, in MARC format. NOTE: Sometimes the .dat extension will interfere with subsequent processing; if this happens, delete the .dat or remove the period.
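
Deleting the Internet Archive URLs from the long-version mb file (step 3 above) can be sketched as a simple text filter. This is only an illustration, assuming the mb file has one field per line beginning with =856 and that every IA URL contains archive.org; the function name is hypothetical.

```python
def drop_ia_urls(mrk_text):
    """Remove 856 lines that point at the Internet Archive."""
    kept = []
    for line in mrk_text.splitlines():
        if line.startswith("=856") and "archive.org" in line:
            continue  # skip the IA link; all other fields are kept
        kept.append(line)
    ending = "\n" if mrk_text.endswith("\n") else ""
    return "\n".join(kept) + ending
```

Work on a copy of the file, and check the result in MarcEditor before inserting the ScholarWorks URLs.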

Download completed file into Aleph

  1. MarcBreak .dat file through MarcEdit, and append mb. Before we do anything else to this file, we need to delete the 035s, because these files also have the OCLC# in an 001 field. Go to MarcEditor/Tools, Add/Delete Field. When the Add/Delete Field Utility window appears, type in 035, and click “Delete Field.” Save the file.
  2. MarcMake the file. When renaming, get rid of the .dat, and call the resulting file: um_[date]oclcmm. Copy to FCL01\scratch.
  3. Work the Services loading procedure on the file: Load Catalog Records/Convert MARC Records Step 1 (file-01); Convert MARC Records Step 2 (file-02); Fix Catalog Records (manage-37) with *Input File type=ALEPH Sequential, *Fix Routine=UMFIX, *Update Database=NO; Check Input File Against Database (manage-36) with Match Section=UM35. NOTE: Since these are new records, there won't be any overlays, but Mike A. advised me that the best habit is to go through these procedures while loading. The 3 output files from Manage-36 should show the number of records and two 0's.
  4. Download original file (um_[date]oclcmm) using “the OCLC loader” (Load Catalog Records/Load OCLC Records (file-93)).
    • *Fix Routine=none; Match Section=OCLC; *Merge Routine=OCLC; Produce Loading Report=YES. This loading report will be used for the final tasks.
    • In Aleph, Go to Task Manager/File List. In bottom window, under Remote Name, highlight the name of the task just run, and after making sure Preview shows at the bottom under “Print Configuration,” click Print at top right. (It won't actually print if Preview is selected.) Copy the resulting table (using Control-A and Control-C) to an Excel spreadsheet, and create a Services-ready list of the newly created bibnos.
  5. Create Holdings and Items. This function uses the 949 entered into the records.
    • Go to Services/Load Catalog Records/Create Holdings and Item Records Using Bibliographic Data (Manage-50). ADM Library=UMA50; HOL Library=FCL60; Main Field=949#1; *Item/Holdings Creation table=tab_hol_item_create_umoca. NOTE: These creation tables are found in WinSCP: U19_1/FCL01/tab/import/ For Holdings/Item Creation Mode, use “Replace existing holdings and items records”; for Holdings/Item Creation Iteration, “Create holdings records based only on new items”; *ADM Cataloger=MANAGE-50; ADM Cataloging Level=20; HOL Cataloger=MANAGE-50; HOL Cataloging Level=20. Select Yes for *Update Database and *Check Item Records.
  6. Delete 856s from the bibs with Global Changes (manage-21).
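
Before MarcMaking the file in step 2 above, a quick scripted check can confirm that the 035s are gone and that every record still carries the 949 needed for Manage-50. The following is a sketch under assumptions: records in the mb file are separated by blank lines, each field line starts with = and a three-character tag, and the function names are hypothetical.

```python
def split_records(mrk_text):
    """Split mb-file text into records (lists of field lines)."""
    records, current = [], []
    for line in mrk_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def preload_problems(mrk_text):
    """Return (record_number, message) pairs for records needing attention."""
    problems = []
    for number, record in enumerate(split_records(mrk_text), start=1):
        tags = {line[1:4] for line in record if line.startswith("=")}
        if "035" in tags:
            problems.append((number, "035 still present"))
        if "949" not in tags:
            problems.append((number, "no 949"))
    return problems
```

An empty result means both conditions hold for every record; it does not replace the manage-36 check against the database.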

Add 530s into Print Records

  1. IMPORTANT: Because many backlog ETDs have duplicate copies in the Depository, construct a Services-ready list of bib. numbers obtained through a Ret-06 (Direct Index) search on the OCLC numbers, with OCL in the Search Index field. Using this list will enter the following notes into both UM and DP copies.
  2. For OPEN (non-DARK) theses/dissertations with both IA and Scholarworks URLs in the final versions, add: Also available online through Scholarworks@UMass Amherst and the Internet Archive.
  3. For DARK theses/dissertations with Scholarworks URLs, add: Also available online through Scholarworks@UMass Amherst.

Update ETD Project Tracking-revised spreadsheet

  1. Found in W:\ETD Digitization Project folder
  2. Best to do this in stages, during processing.

Primary contact: Lucy deGozzaldi.

batch_conversion_of_etds_to_e-records_with_scholarworks_urls.txt · Last modified: 2019/01/07 12:20 (external edit)