Batch Uploading ETDs to ScholarWorks (PAGE OUTDATED, ARCHIVED)

Transform a MARCXML file of short-version bib records with Internet Archive (IA) URLs to Bepress XML:

Preparation:

Masters theses should be in a separate file from doctoral dissertations.

Remove diacritics!
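One way to strip diacritics before conversion is sketched below in Python. This is only an illustration and not part of Aaron's program; the function name strip_diacritics is made up for this example. It decomposes each accented letter and drops the combining marks.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks removed (for example, 'é' becomes 'e')."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("\u00c9tude des p\u00e2turages \u00e0 Montr\u00e9al"))
# prints: Etude des paturages a Montreal
```

Letters that have no decomposed form (for example ø or ł) pass through unchanged, so they would still need to be replaced by hand.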

Short versions used for metadata export should include the following fields:

=LDR  01608nam a2200361K  4500
=100  1\$aSmith, Philip H.
=245  10$aTypes of corn suited to Massachusetts conditions
=260  \\$c1911.
=502  \\$aThesis (M.S.)--Massachusetts Agricultural College, 1911.
=650  \0$aCorn$zMassachusetts.
=657  \7$aTheses$x[Program]$xMasters$2local/mu
=776  $w(OCoLC)18237440
=856  41$uhttp://archive.org/details/typesofcornsuite00smit
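Before running the conversion, the field list above could be checked with a short Python sketch like the following. It is not part of etd2bepress; it assumes the MARCXML file uses the standard MARC21 "slim" namespace, and the names REQUIRED_TAGS, missing_fields and check_marcxml are invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: records are in the MARC21 "slim" namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"
# Data fields listed in the short-version sample record.
REQUIRED_TAGS = ["100", "245", "260", "502", "650", "657", "776", "856"]


def missing_fields(record):
    """Return the required tags that one <record> element lacks."""
    present = {df.get("tag") for df in record.findall(f"{{{MARC_NS}}}datafield")}
    return [tag for tag in REQUIRED_TAGS if tag not in present]


def check_marcxml(xml_text):
    """Map each incomplete record's position (1-based) to its missing tags."""
    root = ET.fromstring(xml_text)
    problems = {}
    # iter() also matches the root itself, so a lone <record> works too.
    for i, record in enumerate(root.iter(f"{{{MARC_NS}}}record"), start=1):
        missing = missing_fields(record)
        if missing:
            problems[i] = missing
    return problems


SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Smith, Philip H.</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Types of corn suited to Massachusetts conditions</subfield></datafield>
  </record>
</collection>"""

print(check_marcxml(SAMPLE))
```

For a real file, the text could be read with open("marc.xml", encoding="utf-8").read() and passed to check_marcxml; an empty result means every record has all of the listed fields.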

Creating the Bepress spreadsheet:

  1. Save the .xml file as “marc.xml” in the metadata 2 folder on my U: drive. NOTE: Only the etd2bepress.exe program and the current file (saved as marc.xml) should be in this folder. When finished, delete marc.xml along with the generated bpimport spreadsheet.
  2. Open W:\ETD Digitization Project\ETD\command line instructions for running Aaron’s program.txt. Move it to the 2nd computer screen for cutting and pasting.
  3. Go to Windows start button and type cmd into Search programs and files box. Hit Enter.
  4. On command line, type: u:
  5. Cut and paste: cd "metadata 2" from Aaron’s instructions. (NOTE: Use right-click instead of shortcut keys.)
  6. Cut and paste: etd2bepress marc.xml from Aaron’s instructions.

This should create a spreadsheet in the metadata 2 folder: bpimport_date. Rename it with the date and Masters/PhD, and save it to a personal folder.

Aaron's script is written in Python. (It is compiled as a Windows binary, so you can't edit that file. To change it, go back to the original source, edit, and recompile. Use ActivePython, which is packaged open source.)

Make adjustments to Bepress spreadsheet

  1. Fix spaces in URLs before .pdf and before / (Watch out for titles ending in /)
  2. Change double colons in the titles to single colons. :: = :
  3. Find all opt-outs pertaining to the list(s) being worked on, and enter “campus” in document-type column.
  4. Masters: All other entries will have “open” in this column
  5. PhDs: All other entries will have “dissertation” in this column.
  6. Masters: Correct the Department column heading to degree_prog
  7. Keyword fix: Open the scholmasters or scolphd file in MarcEdit and download the 600's, 610's, 650’s and 651’s, with all subheadings, to an Excel spreadsheet. Combine the lines for each record and remove periods, $x, $y, $z and parentheses. Remove the comma between Subject headings and Subdivisions; retain commas only between separate entries. Run Compare line differences on the 776 value to be sure the MarcEdit and bpimport rows match, then paste the new Keyword entries into the bpimport spreadsheet.
  8. NOTE: The program name in the cataloging record must be that which is provided by the Graduate Office on the spreadsheet. Do not use any variant form or department listed on the title page. A copy of the Degree Programs and their codes can be found in Graduate Degree Program Codes.
  9. NOTE: The final file needs to be saved (and uploaded) as “Excel 97-2003 Worksheet”!!!!
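The first few adjustments above (URL spaces, double colons, document-type values) could be sketched in Python as follows. This is only an illustration of the rules, not something the existing workflow uses; the function names and the sample URL are hypothetical.

```python
import re


def fix_url(url):
    """Remove stray spaces directly before '.pdf' or before '/'."""
    url = re.sub(r"\s+(?=\.pdf)", "", url)
    return re.sub(r"\s+(?=/)", "", url)


def fix_title(title):
    """Change a double colon ('::') to a single colon."""
    return title.replace("::", ":")


def document_type(is_opt_out, degree):
    """Return 'campus' for opt-outs, else 'open' (Masters) or 'dissertation' (PhD)."""
    if is_opt_out:
        return "campus"
    return "open" if degree == "Masters" else "dissertation"


# Hypothetical URL with the two kinds of stray space.
print(fix_url("http://archive.org/download/abc /abc .pdf"))
print(fix_title("Corn:: a study"))
print(document_type(False, "PhD"))
```

fix_url is meant for the URL column only. Applying it to titles would also remove a space before a trailing /, which is the case step 1 warns about.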

Upload Bepress spreadsheet to the appropriate series in Scholarworks:

Bepress Doctoral Dissertations 1911-2013 series

Bepress Masters Theses 1896 - February 2014 series

  1. Search Scholarworks site and click it open. Click My Account (in upper right corner). Log in with email address and password.
  2. Click Manage Theses or Manage Dissertations, “Batch upload Excel” (on left-side column).
  3. Click Browse and find the correct file. Open it. Hit Upload. Hit Update.

Wait for confirming email

  1. First message: Masters Theses (series)/Doctoral Dissertations (series) queued update complete.
  2. Second message: Import results for Masters Theses/Doctoral Dissertations. This is the important one! It lists the successfully imported records, or reports errors. If the upload succeeded, click the link sent in the email to access Scholarworks, then click “Update Site” (on left margin of screen). If there are errors, the problems should be displayed; look at the bottom to find the one or few records involved. A Unicode error means unaccepted diacritics or other “funky” characters.
  3. Third message: Masters Theses/Doctoral Dissertations queued update complete.
  4. Return to Scholarworks site and search/check a couple of the items to see if the PDF looks OK. Can also paste the URL into my browser.

Combining 2-part ETDs in Scholarworks

  1. Make note of any ETDs in the bpimport spreadsheets that have 2 volumes represented by 2 Internet Archive URLs. Do the following steps after upload to SW.
  2. (TIP: Since scanned items in Scholarworks are difficult to navigate from front to back, it's best to use the Internet Archive to determine how the 300 field should be constructed in the e-version of a 2-part ETD, before combining the two URLs.)
  3. Copy the two I.A. URLs to be combined (one at a time) and paste each into the address bar of the browser, to call up the scanned item from Internet Archive. When the item appears, go to the window in the bottom right, labeled DOWNLOAD OPTIONS.
  4. Scroll to PDF and click it. This will open the item in the browser. Go to little white “down arrow” at left top of dark field, and click.
  5. At resulting window, choose: Open with Adobe Acrobat (default). Click OK.
  6. In next window, go to File/Save As… Choose the place to save the file (my ETD folder) and save file as a .pdf. Repeat with the second file. (Easy way to bring up 2nd file: replace the 01’s with 02’s in the address bar URL.)
  7. Open Adobe Acrobat (in Programs). Click File (in top task bar)/Create/Combine files into a single PDF.
  8. Click Add Files/Add Files. Select files and Open. NOTE: Be sure to put the first part of the ETD on the left side. Also check to make sure that 01 is actually Part 1 of the Dissertation; in one of my examples, the parts were reversed.
  9. Click Combine Files. A green bar will appear. Go to File/Save As … Call it a new name as a .pdf.
  10. Edit ETD in ScholarWorks:
    • Log into “my account” in SW. Go to Manage Dissertations.
    • Call up dissertation in SW.
    • Click Edit Dissertation in top taskbar. Click Revise Dissertation at left side.
    • Scroll down to “Please upload the full text of your submission.”
    • Click Browse, go to my ETD folder and select the Combined file. Hit Open.
    • Hit Submit.
    • After it finishes Processing, hit Update Site.
    • Once confirming email is received, make sure it worked.

Generate spreadsheet including SW URLs for e-conversions

  1. From the Scholarworks My Account page, click Manage Theses/Dissertations under the appropriate series.
  2. Click Batch revise Excel (on left margin), then click “Generate” in the box under “To revise content via an Excel spreadsheet.“ This will take a while, but when done, will add the downloaded file under today’s date, in the list at the bottom. Click Download and it will automatically open in Excel.
  3. Save this generated spreadsheet in personal folder under a new name (include date, etc.), and delete all columns except for title, author, OCLC# and SW URL.
  4. NOTE: Scholarworks URLs are generated in order; check the bottom of the generated spreadsheet to find the matching entries to the bpimport spreadsheet-in-process. Copy the column of OCLC#'s from the bpimport spreadsheet into the generated spreadsheet and use compare row differences to verify accuracy. Delete all non-matching rows. This produces the list needed to proceed to: Batch Conversion of ETDs to e-records with Scholarworks URLs, Upload file of longer-version records into Connexion.
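The OCLC# comparison in step 4 can also be sketched in Python, as an alternative to comparing rows by eye in Excel. The dictionary keys 'oclc' and 'sw_url', the function name, and the example.org URLs are all hypothetical; only the OCLC number 18237440 comes from the sample record above.

```python
def compare_oclc(bpimport_numbers, generated_rows):
    """Split the generated rows into matches and list bpimport numbers not found.

    generated_rows is a list of dicts with (hypothetical) keys 'oclc' and 'sw_url'.
    """
    wanted = {n.strip() for n in bpimport_numbers}
    # Keep only rows whose OCLC# appears in the bpimport spreadsheet.
    matched = [row for row in generated_rows if row["oclc"].strip() in wanted]
    found = {row["oclc"].strip() for row in matched}
    # Numbers in bpimport with no matching generated row need a closer look.
    missing = sorted(wanted - found)
    return matched, missing


bpimport = ["18237440", "11111111"]
generated = [
    {"oclc": "18237440", "sw_url": "https://example.org/theses/1"},
    {"oclc": "99999999", "sw_url": "https://example.org/theses/2"},
]
matched, missing = compare_oclc(bpimport, generated)
print(matched)
print(missing)
```

Here matched plays the role of the rows kept after deleting non-matching rows, and anything in missing points to a record that did not upload or was mistyped.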

Notify Meghan, Erin Jerome and Sue Mellin

  1. Create an “upload report” from the bpimport spreadsheet with Title, URL, Authors, M/P, and Program included.
  2. Send an email notification with upload report spreadsheet attached.

Troubleshooting notes:

Check the upload date from an Administrator Report on the series: http://scholarworks.umass.edu/cgi/editor.cgi?window=report&context=dissertations_1

The error message will include a list of titles. The problem record is usually the one after the last title listed in the error.

Unicode errors: These can result from problem diacritics or symbols in the original records, carried over into MARCXML. Check the MarcEdit “mb” file for the character(s) associated with the titles in the error message and remove or replace them, then reconvert to MARC and MARCXML and re-run the Bepress conversion.
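To locate the offending characters, a small Python sketch like this one could be run over the suspect titles. It is an assumption that anything outside plain ASCII is a candidate; the function name find_non_ascii is invented for this example.

```python
import unicodedata


def find_non_ascii(text):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# \u00e2 is a lowercase a with a circumflex.
for hit in find_non_ascii("Etude des p\u00e2turages"):
    print(hit)
```

Positions are counted from 0, so the hit at position 11 is the twelfth character of the title.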

Internet Archive errors: These can result from problems with the Internet Archive links, for example if permissions are lacking, or if some entries are duplicates of previously scanned ETDs. Check the TOTALS folder under ETD Digitization Project on the W: Drive for duplicates; check the Internet Archive linkage; if necessary, contact Tim Bigelow of Internet Archive to resolve the error.

Adding “OPT-OUTS” TO Scholarworks Metadata Spreadsheets

NOTE: The Opt-in/Opt-out information will be found in the ETD Master File, saved on the W: Drive under ETD Digitization Project, in Sheet 2, “all titles,” Column R, “Authors Permissions Response: Opt-in, Opt-out, no response.”

NOTE 2: Because the number of Opt-outs is small compared to the total number of ETDs, don't use VLOOKUP to match to the bpimport file in process.

NOTE 3: Opt-outs are updated by Department, in response to letters received from the authors. We should be informed when these letters are received. If the new Opt-outs are not already sorted out for us, proceed as follows:

  1. Make a new copy of Sheet 2 (all titles) of the ETD Master File, retaining only columns R (Opt-ins/Opt-outs), A or J (Author), B (Department), C (Year), D (Masters/PhDs) and F (Title). Make sure all data is contiguous, with no blank columns between, and sort on column R.
  2. Retain only the rows with Opt-out in Column R.
  3. Transfer any new Opt-out information to ETD Opt-out Tracking sheet.xlsx, also present in the W: Drive in the same location as the Master File. (Hint: Sort by Department to isolate the new Opt-outs, as reported to us. Run a duplicate check on the ETD Opt-out Tracking sheet, to be sure we don't already have any of this added information.)
  4. When the ETD's on the Opt-out list are in process, enter this information on the ETD Opt-out Tracking sheet, together with date.
  5. When the ETD's have been uploaded to ScholarWorks and uploaded and downloaded to Connexion and Aleph, enter DONE on the ETD Opt-out Tracking sheet, together with date.
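The filtering and duplicate check in steps 2 and 3 can be sketched in Python as below. This is not how the sheets are currently maintained; the keys 'response', 'author' and 'title' are hypothetical stand-ins for Column R, the Author column and the Title column, and the sample rows are made up.

```python
def new_opt_outs(master_rows, tracked_rows):
    """Return master rows marked Opt-out that are not yet on the tracking sheet."""
    # Author plus title is used here as the duplicate-check key (an assumption).
    already_tracked = {(r["author"], r["title"]) for r in tracked_rows}
    return [
        row
        for row in master_rows
        if row["response"].strip().lower() == "opt-out"
        and (row["author"], row["title"]) not in already_tracked
    ]


master = [
    {"response": "Opt-in", "author": "Author A", "title": "Title A"},
    {"response": "Opt-out", "author": "Author B", "title": "Title B"},
    {"response": "Opt-out", "author": "Author C", "title": "Title C"},
]
tracking = [{"author": "Author B", "title": "Title B"}]

for row in new_opt_outs(master, tracking):
    print(row["author"], row["title"])
# prints: Author C Title C
```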

Primary contacts: Lucy deGozzaldi, Meghan Bergin.

batch_uploading_etds_to_scholarworks.txt · Last modified: 2022/05/16 18:53 by jeustis