This shows you the differences between two versions of the page.

batch_conversion_of_etds_to_e-records_with_scholarworks_urls [2016/07/22 17:17]
ldegozzaldi [Download completed file into Aleph]
batch_conversion_of_etds_to_e-records_with_scholarworks_urls [2019/01/07 12:20]
Line 1: Line 1:
-====== Batch Conversion of ETDs to e-records with Scholarworks URLs ====== 
-//NOTE: Some older ETDs will be "OPEN" (remain accessible) via the Internet Archive.  These will be processed according to general OCA digitization procedures.  Most ETDs, however, will be "DARK" on the Internet Archive.  Although the pick lists will include IA URLs, these will not be retained in the final records, but serve as a hook for swapping in the Scholarworks URLs.//
-===== Preparation of pick list records for conversion: ===== 
-  - Create a “working copy” from the completed pick list of items which have been scanned. 
-    * Get rid of columns that do not have any information.  ​ 
-    * Look for any entries which did **not** get scanned.  These may not have been found, or may have been rejected, etc.  Sort to bottom/new spreadsheet for entry into "ETD Rejects Tracking sheet" (found on W:\ETD Digitization Project).
-    * Sort the "search-id" column (Aleph bibnos) and copy it to a new spreadsheet, edit it into Services-ready form (with 0's in front to make nine figures and FCL01 appended in back) and run it through an Aleph services download for 035 and 502.  Compile a list of OCLC numbers through Excel, in order to check for duplicates already processed.  //NOTE:  TOTALS files are found on W:\Open Content Alliance\Pick lists\Completed picklists\lucyd:  umoca[date]disstotals (bibnos and oclcnos).//  When DP records may also exist for the records on the pick list, use oclcnos when looking for duplicates.  If no DP records exist, the match can be done on bibnos.  Do a VLOOKUP for any matches, and eliminate them from the list.
-    * **IMPORTANT:​** ​ Sort into Masters and PhDs by call# or 502 BEFORE putting into MarcEdit! ​ (Label them mastersbibnos and phdbibnos.) 
-  - Run lists of Services-ready bibnos (um_[date]bibnos) through Services to produce a MARC file, in preparation for using MarcEdit on them:  Copy to alephe/​scratch and download. ​ In Field 1 + indicator, enter #####​. ​ In *Format, enter MARC instead of the usual Aleph Sequential. ​ In *Fix Routine, enter LDR9A. ​ Copy to personal folder. 
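The bibno formatting and the duplicate check described in this section could also be scripted instead of done by hand in Excel. The following is a minimal Python sketch, not part of the documented procedure: the function names and the sample numbers are hypothetical, and it assumes bibnos and OCLC numbers are plain digit strings.

```python
def to_services_ready(bibno, library="FCL01"):
    """Pad an Aleph bibno with leading zeros to nine digits and append the library code."""
    digits = str(bibno).strip()
    if not digits.isdigit() or len(digits) > 9:
        raise ValueError(f"unexpected bibno: {bibno!r}")
    return digits.zfill(9) + library


def drop_processed(rows, processed_oclcnos):
    """Keep only (bibno, oclcno) rows whose OCLC number is not already in the TOTALS list.

    This plays the role of the VLOOKUP duplicate check; leading zeros are
    ignored when comparing OCLC numbers.
    """
    seen = {str(n).strip().lstrip("0") for n in processed_oclcnos}
    return [(b, o) for b, o in rows if str(o).strip().lstrip("0") not in seen]


# Hypothetical sample data, for illustration only.
pick_list = [("1234567", "5551212"), ("89012", "00440099")]
already_done = ["440099"]

remaining = drop_processed(pick_list, already_done)
services_ready = [to_services_ready(bibno) for bibno, _ in remaining]
print(services_ready)
```

When DP records may exist, match on OCLC numbers as shown; when they do not, the same comparison could be run on bibnos instead.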
-===== Manipulate file in MarcEdit ===== 
-  - Go to MarcEdit/​MARC Tools in body of window, and MarcBreak the file, adding mb to name (um_[date]bibnos_outmb). Open the file in MarcEditor. **IMPORTANT: ​ After every change to the mb file, SAVE!** ​ 
-  - Check Reports to make sure everything is OK.  Check for (and remove or fix) invalid fields:  006, 007, 530, 533, 538, 740, 856, the gmd [electronic resource] in 245 (from the DP), 949, 999.  Fix corresponding print records with old Proquest URLs.
-  - Go to Tools/​Assigned Tasks. 
-    * Choose OCAdiss task lists (different from regular OCA task lists) for NON-DARK materials.
-    * Choose OCAdissdark for DARK materials. ​ Use RDA version for UM records, nonRDA version for DP records. The task list will remove OCA references for DARK materials. ​ The task list will also add:  
-      * 500=$aAvailable online through ScholarWorks@UMass Amherst.$5AUM, ​ 
-      * 530=$aPrint copy also available. ​ 
-      * 533=$aElectronic reproduction.$bAmherst,​ Mass.$cUniversity of Massachusetts Amherst,​$d[date].$nScanned as part of the Retrospective Dissertation/​Thesis Digitization Project by the UMass Amherst Libraries.$nAvailable in PDF.$5AUM. ​ 
-      * 538=$aMode of access: World Wide Web.$5AUM.
-      * For UM (RDA): ​ 710 2_$aUniversity of Massachusetts Amherst.$bLibraries,​$eissuing body. Retain 090, 910 an 
-      * For UM (RDA): ​ 337, 338 (for electronic records) are edited/​added. ​ 347 is added. ​   
-  - Save file and check whether the tasks performed correctly. ​ //​NOTE: ​ These changes are for preparing the records for upload into Connexion. ​ The Scholarworks versions will be much more brief.// 
-  - Manual checks/​edits:  ​ 
-    * Find/replace RDA 26431 (entered in task list) with 264\1.
-    * Find/​replace 264\4 information to read:  ©date (if task list didn’t add this correctly)!  ​ 
-    * 533:  Update task list date for 533 when needed.  ​ 
-    * 008:  Download to a spreadsheet and check for bad characters (at the end, byte 38 needs to be \ and 39 needs to be d).  Bytes 23-25 should be obm when 504 is present in bib.  Search for 504; if more records have it than not, enter obm and manually change to om\ when 504 is not present. ​ Or vice-versa. ​ 
-    * 300:  Fix punctuation in 300 fields; remove “Append.”  ​ 
-    * Search and replace “leaf and leaves” with “page and pages,” in 300’s and 540’s.  ​ 
-    * Search 500 Typescript and delete. 
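Two of the manual checks above (the 008 byte check and the leaf/leaves replacement) lend themselves to a small script. This is a hedged Python sketch only: the function names and the sample 008 are hypothetical, and it assumes the 008 is given as the 40-character string shown in MarcEditor, where blanks appear as backslashes.

```python
import re

BLANK = "\\"  # MarcEditor displays a blank byte as a backslash


def check_008(field_008, has_504):
    """Return a list of problems found in a 40-character 008 string."""
    if len(field_008) != 40:
        return [f"008 is {len(field_008)} characters long, expected 40"]
    problems = []
    # Bytes 23-25: obm when a 504 is present, om\ when it is not.
    expected = "obm" if has_504 else "om" + BLANK
    if field_008[23:26] != expected:
        problems.append(f"bytes 23-25 are {field_008[23:26]!r}, expected {expected!r}")
    if field_008[38] != BLANK:
        problems.append("byte 38 should be a blank (backslash)")
    if field_008[39] != "d":
        problems.append("byte 39 should be d")
    return problems


def leaves_to_pages(mrk_line):
    """Replace leaf/leaves with page/pages, but only in 300 and 540 lines."""
    if not mrk_line.startswith(("=300", "=540")):
        return mrk_line
    mrk_line = re.sub(r"\bleaves\b", "pages", mrk_line)
    return re.sub(r"\bleaf\b", "page", mrk_line)


# Hypothetical 008, built piece by piece so the byte positions are easy to count.
sample_008 = (
    "160722" + "s" + "1975" + BLANK * 4 + "mau" + BLANK * 4 + BLANK
    + "obm" + BLANK * 3 + "000" + BLANK + "0" + BLANK + "eng" + BLANK + "d"
)

print(check_008(sample_008, has_504=True))
print(leaves_to_pages("=300  " + BLANK * 2 + "$aix, 212 leaves ;$c28 cm"))
```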
-===== Insert the Internet Archive URLs from the pick list ===== 
-  - Add [old] OCLC#’s to the pick list, via download of 035’s from the bibnos file.  **NOTE RE duplicate Sys#'​s:​** ​ Shouldn’t be any journals here, thus no duplicate Sys#’s in original working copy; so one OCLC# per record; see duplicate check above. ​ Occasionally there will be multiple volumes in the dissertations. ​ **CAUTION:​** ​ ETDs with multiple volumes are quite often scanned as one piece. ​ Check the Internet Archive (search by URL), and fix the record to reflect scan by replacing 2 v. with pages information. ​ If scanned as two vols. with two URLs, **use only the URL for the first volume!** ​ Keep track of the ETD and fix after ScholarWorks upload. ​ See [[Batch Uploading ETDs to ScholarWorks]],​ __Combining 2-part ETDs in Scholarworks__. 
-  - **IMPORTANT:​** ​ SORT FIELDS NOW!  With the “mb” document open in MarcEditor, click Tools/Sort by …/Sort All Fields. ​ If the URL has already been added, it will get out of order. 
-  - Inserting the URL:  Merge method (for long lists): ​ **NOTE: MarcEdit updates will change details in the following, tweak as needed.** 
-    * Create a um_[date]merge spreadsheet, using the "pipe" trick if necessary, with the headings:  776$w (the [old] OCLC# in the following format: (OCoLC)########), *856$u41 (the IA URL).  Save as "tab delimited."  //NOTE:  Any OCLC number with fewer than eight digits will need 0's added in front, to bring it up to 8.  Nine digits are fine.//
-    * Go to MarcEdit\Delimited Text Translator (in body of window). ​ Browse for the correct merge.txt file, and open it to enter into the top window. ​ Copy and replace .txt with .mrk for the output file.  Hit Next. 
-    * In following window, remove checks from Sort Fields and Calculate common nofiling data. Click "Auto Generate."​ This will pop the fields into the Arguments window. Hit Finish. 
-    * Add .mrk to the mb file we've been working on.  Go to MarcEdit\Tools (in toolbar)\Merge records. Add to window:  Source File=mb file with .mrk appended.  Merge file=merge.mrk created above. Copy Source File into "Save File" window. //NOTE:  MarcEdit creates a handy backup (.bak file) in case something goofs.//  Record identifier:  Type in the 776 over the 001 that appears there. (This is the "hook" which joins the information.)  **IMPORTANT:**  Add the subfield.  Hit Next.
-    * In following window, choose "Merge selected fields." ​ Hit Next. 
-    * In following window, either type or push 856 over into the Merge Fields box.  (This is the field we want to insert into the mb document.) 
-  - "​Manual Method"​ (for a short list):  ​ 
-    * Construct the URLs in an Excel sheet, using the combine columns function (=A1&""&B1&""&C1), as follows:  =85641   $uhttp://archive.org/details/[url].  //NOTE 1:  We don't need the subfield z linking note for the IA URLs! NOTE 2:  If using the equal sign, precede it with a quote mark and eliminate quote with "Text to Columns" function.// Save as um_[date]URL.txt.
-    * Line up this um_[date]URL.txt document in the same order as the working document, by adding the column of Aleph bibnos and sorting it to match the working document. ​ Then add URL by alt-tabbing between the URL document and the mb document in MarcEditor. ​ Make sure the OCLC number matches the 776 (Original) field!  ​ 
-  - Tweak the mb file:  ​ 
-    * Make sure 2 spaces are between all 856s and indicators 41; search by 856space41, and 856spacespacespace41.  ​ 
-    * Check 690s to be sure all have Departmental information.  ​ 
-    * URL subfield z:  For OCA and most Scholarworks theses/​dissertations,​ use: Link to free resource. For “Opt-Outs” (campus only), use: UMass: Link to resource. 
-  - Create the XML file(s) for conversion to ScholarWorks database:  ​ 
-    * Copy the mb files and rename them to um_[date] scholmasters_outmb, and scholphd_outmb.
-    * Run the task list SCHOLAR on them.  This will produce much shorter versions of the records. ​   
-    * MarcMake the files, replacing mb with mm.  ​ 
-    * Convert to MARC21XML, replacing mm with xml.  **NOTE: Upload short versions to ScholarWorks here!** 
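The tab-delimited merge file used in the merge method above (OCLC numbers in (OCoLC) form, padded to eight digits, paired with the IA URL) could be generated with a short script rather than by spreadsheet formulas. This is a minimal Python sketch under stated assumptions: the function names, the sample identifier, and the exact header strings are illustrative, so adjust the headers to whatever the Delimited Text Translator expects.

```python
import csv
import io


def format_oclc(oclcno):
    """Return the number as (OCoLC)########, zero-padded to at least eight digits."""
    digits = str(oclcno).strip()
    if not digits.isdigit():
        raise ValueError(f"unexpected OCLC number: {oclcno!r}")
    return "(OCoLC)" + digits.zfill(8)


def build_merge_table(rows, headers=("776$w", "856$u41")):
    """Build tab-delimited text with one (OCLC number, URL) pair per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    for oclcno, url in rows:
        writer.writerow([format_oclc(oclcno), url])
    return out.getvalue()


# Hypothetical sample rows; "exampleid" stands in for a real IA identifier.
rows = [
    ("123456", "http://archive.org/details/exampleid"),
    ("123456789", "http://archive.org/details/exampleid2"),
]
print(build_merge_table(rows))
```

Writing the returned text to um_[date]merge.txt would give a file in the same shape as the spreadsheet saved as "tab delimited."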
-===== Upload file of longer-version records into Connexion =====
-  - Complete this step after the upload to ScholarWorks has been completed: [[Batch Uploading ETDs to ScholarWorks]]. 
-  - Use the spreadsheet into which the SW URLs have been aligned with the proper 776s, described in __Generate spreadsheet including SW URLs for e-conversions__,​ in //Batch Uploading ETDs to ScholarWorks//​. 
-  - Delete the Internet Archive URLs from the long-version mb file. 
-  - Insert the ScholarWorks URLs into the mb file, using the Merge records function described above. 
-  - Upload into Connexion Local Save File (set as default). Validate and Update the records. 
-  - Go to Tools/​Options/​Export,​ Highlight File (prompt for file name), click Apply, and Close. 
-  - Search for the default Local Save File, highlight all, and hit Action/Export. Connexion will ask where to put the output file (to personal folder) and what to name it.  It will export as a .dat file, in MARC format. **NOTE: Sometimes the .dat extension will interfere with subsequent processing; if this happens, delete the .dat or remove the period.**
-===== Download completed file into Aleph ===== 
-  - MarcBreak .dat file through MarcEdit, and append mb.  Before we do anything else to this file, we need to delete the 035s, because these files also have the OCLC# in an 001 field. ​ Go to MarcEditor/​Tools,​ Add/Delete Field. ​ When the Add/Delete Field Utility window appears, type in 035, and click “Delete Field.” ​ Save the file. 
-  - MarcMake the file.  When renaming, get rid of the .dat, and call the resulting file:  um_[date]oclcmm. ​ Copy to FCL01\scratch. 
-  - Work the Services loading procedure on the file: Load Catalog Records/Convert MARC Records Step 1 (file-01); Convert MARC Records Step 2 (file-02); Fix Catalog Records (manage-37) with *Input File type=ALEPH Sequential, *Fix Routine=UMFIX, *Update Database=NO; Check Input File Against Database (manage-36) with Match Section=UM35.  //NOTE: Since these are new records, there won't be any overlays, but Mike A. advised me that the best habit is to go through these procedures while loading.//  The 3 output files from Manage-36 should show the number of records and two 0's.
-  - Download original file (um_[date]oclcmm) using "the OCLC loader"​ (Load Catalog Records/​Load OCLC Records (file-93)). 
-    * *Fix Routine=none;​ Match Section=OCLC;​ *Merge Routine=OCLC;​ Produce Loading Report=YES. ​ This loading report will be used for the final tasks. 
-    * In Aleph, Go to Task Manager/​File List. In bottom window, under Remote Name, highlight the name of the task just run, and after making sure Preview shows at the bottom under "Print Configuration,"​ click Print at top right. ​ (It won't actually print if Preview is selected.) ​ Copy the resulting table (using Control-A and Control-C) to an Excel spreadsheet,​ and create a Services-ready list of the newly created bibnos. 
-  - Create Holdings and Items. ​ This function uses the 949 entered into the records. 
-    * Go to Services/Load Catalog Records/Create Holdings and Item Records Using Bibliographic Data (Manage-50).  ADM Library=UMA50; HOL Library=FCL60; Main Field=949#1; *Item/Holdings Creation table=tab_hol_item_create_umoca.  //NOTE:  These creation tables are found in WinSCP: U19_1/FCL01/tab/import/ //  For Holdings/Item Creation Mode, use "Replace existing holdings and items records"; for Holdings/Item Creation Iteration, "Create holdings records based only on new items"; *ADM Cataloger=MANAGE-50; ADM Cataloging Level=20; HOL Cataloger=MANAGE-50; HOL Cataloging Level=20.  Select Yes for *Update Database and *Check Item Records.
-  - Delete 856s from the bibs with Global Changes (manage-21). ​ 
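The 035 deletion in the first step of this section is done with MarcEditor's Add/Delete Field utility. For anyone working outside MarcEdit, the same effect on a MarcBreak'd (mnemonic) file can be sketched in Python. This is an illustration only: the function name and the sample record are hypothetical, and it assumes each field sits on a single line beginning with = and its tag.

```python
def delete_field(mrk_text, tag="035"):
    """Drop every line for the given tag from mnemonic-format MARC text."""
    prefix = "=" + tag
    kept = [line for line in mrk_text.splitlines() if not line.startswith(prefix)]
    return "\n".join(kept) + "\n"


# Hypothetical record in mnemonic form; a blank line ends the record.
sample = (
    "=LDR  00000nam a2200000 i 4500\n"
    "=001  123456789\n"
    "=035  \\\\$a(OCoLC)123456789\n"
    "=245  10$aExample title.\n"
    "\n"
)

print(delete_field(sample))
```

The OCLC number survives in the 001, which is why the 035s can be removed before loading.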
-===== Add 530s into Print Records ===== 
-  - **IMPORTANT:​** ​ Because many backlog ETDs have duplicate copies in the Depository, construct a Services-ready list of bib. numbers obtained through a Ret-06 (Direct Index) search on the OCLC numbers, with OCL in the Search Index field. Using this list will enter the following notes into both UM and DP copies. 
-  - For OPEN (non-DARK) theses/​dissertations with both IA and Scholarworks URLs in the final versions, add:  **Also available online through Scholarworks@UMass Amherst and the Internet Archive.** 
-  - For DARK theses/​dissertations with Scholarworks URLs, add: **Also available online through Scholarworks@UMass Amherst.** 
-===== Update ETD Project Tracking-revised spreadsheet ===== 
-  - Found in W:\ETD Digitization Project folder 
-  - Best to do this in stages, during processing. 
-//​[[lucyd@library.umass.edu|Primary contact: Lucy deGozzaldi]]//​. 
batch_conversion_of_etds_to_e-records_with_scholarworks_urls.txt · Last modified: 2019/01/07 12:20 (external edit)