Submitting Holdings to HathiTrust

As a condition of our membership in the HathiTrust (HT) Partnership, we are required to submit records for our print holdings annually. HT requires that the holdings be separated into three files: Single Part Monographs, Multi-Part Monographs, and Serials. Information on submitting holdings data to HT is available at https://www.hathitrust.org/print_holdings.

Rick Leveille has the specifications for generating the files that will be sent to HT.

Uploading the files

  1. Click on “Not a part of University of Michigan?”
  2. Click on “Sign In with SSO”
  3. Enter your password and click “SIGN IN”
  4. You will get the Shibboleth login screen.
  5. Create a new folder “[current year] holdings” inside the “UMass” folder.
  6. Upload files into the folder that you just created.

Submitting OCA digitized content

The Libraries contributes content digitized by the Open Content Alliance (Internet Archive) to the HathiTrust. HathiTrust (HT) already works with the Internet Archive (IA), which simplifies the process. We only need to provide HT with MARC records that carry the IA-specific data (the IA Identifier and the ARK Identifier) in the 955 tag. More information on submitting bibliographic records is available at https://www.hathitrust.org/bib_specifications

On the fcweb.library.umass.edu server, there is a PHP script (www/html/ht/get_IA_data2.php) that takes a file of comma-separated Aleph bib numbers and IA Identifiers and creates the appropriate MARCXML for submission to HathiTrust. It FTPs the file to HT and is meant to send the email notification that HT requires. It works fairly well, but occasionally encounters a problem that needs to be addressed.

The file of comma-separated values can be created from the Pick List that is returned from OCA. Talk to Lisa Persons about where the latest files are; typically they are located in W:\Open Content Alliance\Pick lists\Completed picklists. The first two columns of the Pick List should contain the bib number and the IA identifier. If there are errors, a column noting them will have been inserted between the first and second columns. Errors should be deleted from the Pick List, and the then-empty column should be deleted as well. Delete all other columns and the header, then save the file as “ialist_YYYYMMDD_[local identifying information].csv” (no spaces). Follow the steps below to complete the processing:

  1. Upload the file created from the Pick List to the www/html/ht/ directory on the fcweb.library.umass.edu server. [Talk to Steve Bischof if you need FTP access to the FCWeb server.]
  2. Edit the get_IA_data2.php file.
  3. Change the $fileId variable to reflect the date and local identifying information, e.g., “20160408_darktruck1”
  4. Update the $my_email variable as necessary. Separate multiple emails with a comma.
  5. From the command line run: php get_IA_data2.php
  6. Because sendmail is not installed on the fcweb server, the PHP script cannot send the notification email itself. Manually create a message to cdl-zphr-l@ucop.edu giving the current file name, its size in bytes, and the number of records (use a previously sent message as a template).
  7. The email address(es) entered in $my_email should receive an email after the file has been uploaded to HT.
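
The three figures needed for the manual message in step 6 (file name, size in bytes, number of records) can be gathered with a short script rather than by hand. The following is a minimal Python sketch, not part of the existing PHP workflow. It assumes the submitted file is MARCXML in which each record is a `record` element; the function name `submission_summary` is hypothetical, and the path of the file that get_IA_data2.php produces must be supplied by whoever runs it.

```python
import os
import xml.etree.ElementTree as ET


def submission_summary(marcxml_path):
    """Return (file name, size in bytes, number of MARC records).

    Assumption: the file is MARCXML and every record is a <record>
    element, with or without the MARC21 slim namespace.
    """
    size_bytes = os.path.getsize(marcxml_path)
    record_count = 0
    # iterparse streams the file, so a large submission is not held in memory.
    for _event, elem in ET.iterparse(marcxml_path, events=("end",)):
        # Drop any namespace prefix, e.g. "{http://www.loc.gov/MARC21/slim}record".
        if elem.tag.rsplit("}", 1)[-1] == "record":
            record_count += 1
            elem.clear()  # release the finished record's children
    return os.path.basename(marcxml_path), size_bytes, record_count
```

Calling `submission_summary` with the path of the MARCXML file that was sent to HT returns the three values to paste into the message template.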

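The Pick List cleanup described before the numbered steps can also be scripted. The sketch below is illustrative only and rests on several assumptions: the Pick List has been exported from the spreadsheet as rows of cell values, the first row is the header, and "deleting errors" means dropping every row that has a note in the error column. The function names `clean_pick_list`, `output_name`, and `write_pairs` are hypothetical.

```python
import csv


def clean_pick_list(rows, has_error_column=False):
    """Reduce Pick List rows to (bib number, IA identifier) pairs.

    rows: iterable of lists of cell values, header row first.
    has_error_column: True when an error column sits between the
    bib number and the IA identifier (assumed layout).
    """
    ia_index = 2 if has_error_column else 1
    pairs = []
    for i, row in enumerate(rows):
        if i == 0:
            continue  # drop the header row
        cells = [cell.strip() for cell in row]
        if len(cells) <= ia_index:
            continue  # row too short to hold an IA identifier
        if has_error_column and cells[1]:
            continue  # an error was noted for this item: drop the row
        bib, ia_id = cells[0], cells[ia_index]
        if bib and ia_id:
            pairs.append((bib, ia_id))
    return pairs


def output_name(date_yyyymmdd, local_id):
    """Build the ialist_YYYYMMDD_[local id].csv file name, without spaces."""
    return f"ialist_{date_yyyymmdd}_{local_id}.csv".replace(" ", "")


def write_pairs(pairs, path):
    """Write the two-column CSV with no header row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(pairs)
```

For example, `write_pairs(clean_pick_list(rows, has_error_column=True), output_name("20160408", "darktruck1"))` would produce ialist_20160408_darktruck1.csv in the current directory, ready to upload in step 1.
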
For digitized content other than digitized theses and dissertations, you can also create the file of comma-separated values by printing the 856 field from Aleph in Aleph sequential format for the records that you want to upload to HT. This will provide you with the bib number and IA identifier, but you will need to massage the data to get it into the proper format. This has been useful for Massachusetts Documents records, because many of the print records never had an OCLC number.
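
The "massaging" of the Aleph sequential output can be sketched in Python as follows. This is an assumption-laden illustration, not a tested part of the workflow: it assumes each line starts with a 9-digit system number, followed by the field tag in columns 11-13 and the field content from column 19 with `$$` subfield delimiters, and that the IA identifier is the last path segment of an archive.org/details/ URL in subfield u of the 856. The function name `pairs_from_aleph_seq` is hypothetical, and whether the PHP script expects the bib number with its leading zeros should be checked against a previously submitted file.

```python
import re

# Assumed shape of the IA link stored in 856 $u.
IA_URL = re.compile(r"archive\.org/details/([^/?#\s]+)")


def pairs_from_aleph_seq(lines):
    """Extract (bib number, IA identifier) pairs from Aleph sequential 856 lines."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Columns 11-13 (0-based 10:13) hold the field tag.
        if len(line) < 18 or line[10:13] != "856":
            continue
        bib = line[0:9]
        # Field content starts at column 19; the piece before the first "$$" is empty.
        for subfield in line[18:].split("$$")[1:]:
            if subfield[:1] != "u":
                continue
            match = IA_URL.search(subfield[1:])
            if match:
                pairs.append((bib, match.group(1)))
                break  # one IA identifier per 856 field is enough
    return pairs
```

The resulting pairs can then be written out as a two-column CSV with no header and named following the ialist_YYYYMMDD_[local identifying information].csv pattern.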

hathitrust.txt · Last modified: 2019/01/07 12:20 (external edit)