Differences

This shows you the differences between two versions of the page.

Previous revision: batch_uploading_etds_to_scholarworks [2017/02/01 17:46] by ldegozzaldi
Current revision: batch_uploading_etds_to_scholarworks [2022/05/16 18:53] by jeustis
Line 1 (previous) / Line 1 (current):
-====== Batch Uploading ETDs to ScholarWorks ======
+====== PAGE OUTDATED, ARCHIVED: Batch Uploading ETDs to ScholarWorks ======
  
 **Link to ETD Workflow Google doc:**
  
-https://docs.google.com/document/d/1Qs_uIWPjDfUIcQhxJiMlgvgN8XpexZn2CTs0eSGTOxo/edit
+https://docs.google.com/document/d/1MWz76tt3SzTEvXZGQ1nke7QSm2erd1WuPLWb1_79r44/edit?usp=sharing
  
 ===== Transform MARC.XML file of short-version bibs + IA URLs to Bepress XML: =====
Line 21 (previous) / Line 21 (current):
   =502  \\$aThesis (M.S.)--Massachusetts Agricultural College, 1911.
   =650  \0$aCorn$zMassachusetts.
-  =690  \\$aTheses$xHorticulture$xMasters.
+  =657  \7$aTheses$x[Program]$xMasters$2local/mu
   =776  $w(OCoLC)18237440
   =856  41 %%$uhttp://archive.org/details/typesofcornsuite00smit%%
Line 35 (previous) / Line 35 (current):
  
 This should create a spreadsheet named bpimport_date in the metadata 2 folder. Rename it with the date and Masters/PhD, and save it to your personal folder.
 +
+Aaron's script is written in Python, but it is compiled into a Windows binary, so that file cannot be edited directly. To change it, go back to the original source, edit it, and recompile. Use ActivePython (a packaged open-source Python distribution).
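For orientation, the kind of transformation this script performs can be sketched in Python. This is a hypothetical illustration only, not Aaron's script, whose source is not reproduced on this page. It assumes the input is MARC21 slim XML and pulls the title (245 $a), author (100 $a), OCLC number (776 $w), and Internet Archive URL (856 $u); the output column names "title", "author", "oclc", and "ia_url" are placeholders, not the actual Bepress spreadsheet headings.

```python
# Hypothetical sketch; NOT Aaron's script. Column names are placeholders.
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by MARC21 slim (MARCXML) files.
MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}
# Placeholder column names; the real Bepress headings may differ.
FIELDS = ["title", "author", "oclc", "ia_url"]


def first_subfield(record, tag, code):
    """Return the first $code value from the first matching field, or ''."""
    for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        sub = field.find(f"marc:subfield[@code='{code}']", NS)
        if sub is not None and sub.text:
            return sub.text.strip()
    return ""


def marcxml_to_rows(xml_text):
    """Build one placeholder spreadsheet row per MARC record."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(f"{{{MARC_NS}}}record"):
        rows.append({
            "title": first_subfield(record, "245", "a").rstrip(" /:."),
            "author": first_subfield(record, "100", "a").rstrip(","),
            # 776 $w looks like "(OCoLC)18237440"; keep only the number.
            "oclc": first_subfield(record, "776", "w").replace("(OCoLC)", ""),
            "ia_url": first_subfield(record, "856", "u"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text (a stand-in for the real spreadsheet)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The CSV output is only for inspection; the file actually uploaded to ScholarWorks still has to be saved from Excel in the required format.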
  
 ===== Make adjustments to Bepress spreadsheet =====
Line 44 (previous) / Line 46 (current):
   - PhDs: All other entries will have “dissertation” in this column.
   - Masters: Correct the Department column heading to degree_prog.
   - Keyword fix: Go to the scholmasters or scolphd file in MarcEdit and export the 600, 610, 650, and 651 fields, with all subfields, to an Excel spreadsheet. Combine the lines for each record, and remove periods and the $x, $y, and $z codes. Remove the commas between subject headings and their subdivisions, retaining commas only between separate entries. Remove parentheses. Run //compare row differences// on the 776 values to make sure the MarcEdit export and the bpimport spreadsheet match, then paste the new Keyword entries into the bpimport spreadsheet.
+  - NOTE: The program name in the cataloging record must be the one provided by the Graduate Office on its spreadsheet. Do not use any variant form, or the department listed on the title page. A list of the degree programs and their codes is at [[Graduate Degree Program Codes]].
   - **NOTE: The final file must be saved (and uploaded) as “Excel 97-2003 Worksheet”!**
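The keyword clean-up can be sketched in Python. This is a hypothetical helper, not part of the existing workflow: it reads MarcEdit mnemonic lines like the 650 example earlier on this page and takes the rules above literally, removing every period, comma, and parenthesis inside a heading and dropping the subfield codes. Joining the subfields with a single space is an assumption, since the page does not say what replaces the codes; separate headings are joined with commas.

```python
# Hypothetical helper; the space between subfields is an assumption.
# Subject fields to keep, as MarcEdit mnemonic prefixes (600/610/650/651).
SUBJECT_TAGS = ("=600", "=610", "=650", "=651")
# Characters removed inside a heading: periods, commas, parentheses.
STRIP_CHARS = str.maketrans("", "", ".,()")


def heading_to_keyword(line):
    r"""'=650  \0$aCorn$zMassachusetts.' -> 'Corn Massachusetts'."""
    body = line[line.index("$"):]                  # drop tag and indicators
    values = [p[1:] for p in body.split("$")[1:]]  # drop $a, $x, $y, $z ... codes
    words = [v.translate(STRIP_CHARS).strip() for v in values]
    # Assumption: subfields are joined with one space.
    return " ".join(w for w in words if w)


def record_keywords(lines):
    """Combine one record's subject lines into a single comma-separated entry."""
    return ", ".join(
        heading_to_keyword(line) for line in lines if line.startswith(SUBJECT_TAGS)
    )
```

Fed the 502 and 650 lines from the sample record, record_keywords skips the 502 and returns only the subject keywords.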
  
Line 89 (previous) / Line 92 (current):
 ===== Generate spreadsheet including SW URLs for e-conversions =====
  
-  - From the Scholarworks My Account page, click Manage Theses/Dissertations under the appropriate series.
+  - From the Scholarworks __My Account__ page, click __Manage Theses/Dissertations__ under the appropriate series.
   - Click Batch revise Excel (in the left margin), then click “Generate” in the box under “To revise content via an Excel spreadsheet.” Generating the file takes a while; when it is done, the file appears under today’s date in the list at the bottom. Click Download, and the file will open automatically in Excel.
-  - Save this generated spreadsheet in your personal folder under a new name (e.g., [date, etc.]phd (or masters)URLs776), and delete all columns except title, author, OCLC#, and URL. **NOTE: Scholarworks URLs are generated in order; check the bottom of the generated spreadsheet to find the entries that match the bpimport spreadsheet in process.** Copy the column of OCLC#s from the bpimport spreadsheet into the generated spreadsheet and use //compare row differences// to verify accuracy. Delete all non-matching rows. This produces the list needed to proceed to [[Batch Conversion of ETDs to e-records with Scholarworks URLs]], __Upload file of longer-version records into Connexion__.
+  - Save this generated spreadsheet in your personal folder under a new name (include the date, etc.), and delete all columns except title, author, OCLC#, and SW URL.
+  - **NOTE: Scholarworks URLs are generated in order; check the bottom of the generated spreadsheet to find the entries that match the bpimport spreadsheet in process.** Copy the column of OCLC#s from the bpimport spreadsheet into the generated spreadsheet and use //compare row differences// to verify accuracy. Delete all non-matching rows. This produces the list needed to proceed to [[Batch Conversion of ETDs to e-records with Scholarworks URLs]], __Upload file of longer-version records into Connexion__.
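The row check in this step can be sketched in Python. This is a hypothetical helper working on rows already read from the two spreadsheets; the keys "oclc" and "url" are placeholder names. Because the URLs are generated in order, it lines up the bottom rows of the generated spreadsheet with the bpimport OCLC numbers position by position, much like //compare row differences//, and keeps only the rows that match.

```python
# Hypothetical helper; "oclc"/"url" keys are placeholders for the real columns.
def normalize_oclc(value):
    """Compare OCLC numbers as trimmed text, without an '(OCoLC)' prefix."""
    return str(value).strip().replace("(OCoLC)", "")


def match_tail(generated_rows, bpimport_oclcs):
    """Align the last len(bpimport_oclcs) generated rows with the bpimport list.

    Returns (matched_rows, mismatches); each mismatch is
    (position, generated_oclc, bpimport_oclc) so it can be checked by hand.
    """
    if not bpimport_oclcs:
        return [], []
    tail = generated_rows[-len(bpimport_oclcs):]
    matched, mismatches = [], []
    for pos, (row, oclc) in enumerate(zip(tail, bpimport_oclcs)):
        if normalize_oclc(row["oclc"]) == normalize_oclc(oclc):
            matched.append(row)
        else:
            mismatches.append((pos, row["oclc"], oclc))
    return matched, mismatches
```

A non-empty mismatch list means the bottom of the generated spreadsheet does not line up with the bpimport file, which is worth checking before any rows are deleted.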
  
-===== Notify Meghan, Lisa and Jessica Adamick =====
+===== Notify Meghan, Erin Jerome and Sue Mellin =====
  
-  - Create an "upload report" from the bpimport spreadsheet that includes Title, URL (pasted in from the previous URLs776 spreadsheet), Authors, M/P, and Program.
+  - Create an "upload report" from the bpimport spreadsheet that includes Title, URL, Authors, M/P, and Program.
   - Send an email notification with the upload report spreadsheet attached.
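Assembling the upload report can be sketched in Python as well. This is a hypothetical helper: the input keys ("title", "author", "m_p", "program", "oclc") and the OCLC-to-URL mapping are placeholders for however the spreadsheets are actually read; only the five report columns come from the step above.

```python
# Hypothetical helper; input keys are placeholders, report columns are from the step above.
import csv
import io

REPORT_COLUMNS = ["Title", "URL", "Authors", "M/P", "Program"]


def build_upload_report(bpimport_rows, url_by_oclc):
    """Return report rows; URL is '' when no SW URL is known for the OCLC#."""
    return [
        {
            "Title": row["title"],
            "URL": url_by_oclc.get(str(row["oclc"]).strip(), ""),
            "Authors": row["author"],
            "M/P": row["m_p"],
            "Program": row["program"],
        }
        for row in bpimport_rows
    ]


def report_to_csv(report):
    """Serialize the report as CSV text that Excel can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    writer.writerows(report)
    return buf.getvalue()
```

An empty URL cell in the result flags a record whose ScholarWorks URL was not found in the generated spreadsheet.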
 ===== Troubleshooting notes: =====
Line 109 (previous) / Line 113 (current):
 ====== Adding “OPT-OUTS” to Scholarworks Metadata Spreadsheets ======
  
-**NOTE: The Opt-in/Opt-out information is in the ETD Master File, saved on the W: Drive under ETD Digitization Project, in Sheet 2 ("all titles"), Column R ("Authors Permissions Response: Opt-in, Opt-out, no response").**
+**NOTE:** The Opt-in/Opt-out information is in the ETD Master File, saved on the W: Drive under ETD Digitization Project, in Sheet 2 ("all titles"), Column R ("Authors Permissions Response: Opt-in, Opt-out, no response").
 + 
+**NOTE 2:** Because the number of Opt-outs is small compared with the total number of ETDs, don't use VLOOKUP to match them against the bpimport file in process.
  
-  - Make a new copy of Sheet 2, retaining only columns R (Opt-ins/Opt-outs) and I (Advance ID). Make sure the data is connected, with no blank columns between.
+**NOTE 3:** Opt-outs are updated by the Department in response to letters received from the authors. We should be informed when these letters are received. If the new Opt-outs have not already been sorted out for us, proceed as follows:
-  - Make a new copy of Sheet 1, entitled "all items," retaining only columns E (Bib Nbr), F (OCLC Nbr), and P (Advance ID), making sure the data is connected.
-  - Combine the two copies with blank columns between. Sort by Advance ID and compare row differences. When a match is established, delete one of the Advance ID columns and connect the rest of the columns.
-  - Sort by R, and delete all rows that do not contain "Opt-out". **NOTE: To prevent Excel from freezing up, delete as much extraneous information as possible before running any function that requires Paste Values.**
-  - Produce a list of 776 values from the bpimport spreadsheets for Masters and PhDs. Copy it to the right side of the worksheet containing the Opt-outs, leaving a few blank columns between. This will be the short list. Enter "match" all the way down the column to the right of the short list, and run a VLOOKUP against the long OCLC list.
-  - VLOOKUP breakdown: Place the cursor in the cell just to the right of the block of information on the right side of the spreadsheet, and enter the formula as follows: =VLOOKUP(f2:f274,$i$2:$j$70,2,false), in either upper or lower case. Explanation: f2 = the cell of the first entry in the large list (identified by column and row; the column could be AD, E, M, or any other), and f274 = the cell of the last entry in the list. This is the target list being scanned for matches. $i$2 = the first cell of the small list, and $j$70 = the last cell of the "match" column to the right of the small list. "2" tells Excel to return the value from the second column of that range (the "match" column), and "false" requires an exact match. The word "match" will then appear in place of the formula in each row with an OCLC match; if no match is found, #N/A will appear.
-  - Highlight the column in which the VLOOKUP was entered, press Control-C (copy), and on the Home tab click Paste > Paste Values (the 123 icon). This removes the formula from the data. Sort by that column. Any matches should appear at the top, showing which OCLC numbers in the current bpimport spreadsheets have Opt-outs associated with them.
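The VLOOKUP step above can be sketched in Python to show what the formula computes. This hypothetical helper mimics =VLOOKUP(value, range, 2, FALSE): for each value in the long list it returns the cell from the given column of the first row whose first cell matches exactly, or "#N/A". It also shows why the columns must share a format: the text "18237440" and the number 18237440 do not match.

```python
# Hypothetical illustration of VLOOKUP(..., FALSE); not an Excel API.
def vlookup_exact(lookup_values, table, col_index):
    """Mimic Excel's VLOOKUP with range_lookup=FALSE (exact match).

    table: list of rows (lists); col_index: 1-based, as in Excel.
    Returns one result per lookup value, '#N/A' when nothing matches.
    Unlike Excel, text comparison here is case-sensitive (no effect on OCLC#s).
    """
    index = {}
    for row in table:
        # Keep the first matching row, as VLOOKUP does.
        index.setdefault(row[0], row[col_index - 1])
    return [index.get(value, "#N/A") for value in lookup_values]


def matches_first(lookup_values, results):
    """Pair values with results and put 'match' rows on top (like the sort step)."""
    pairs = list(zip(lookup_values, results))
    return sorted(pairs, key=lambda pair: pair[1] != "match")
```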
  
-**NOTE: Format the columns to be matched in the VLOOKUP as “General”, or it won't work.**
+  - Make a new copy of Sheet 2 ("all titles") of the ETD Master File, retaining only columns R (Opt-ins/Opt-outs), A or J (Author), B (Department), C (Year), D (Masters/PhDs), and F (Title). Make sure all the data is connected, with no blank columns between, and sort on column R.
+  - Retain only the rows with "Opt-out" in column R.
+  - After sorting on R, delete all rows that do not contain "Opt-out." **Transfer any __new__ Opt-out information to ETD Opt-out Tracking sheet.xlsx, which is on the W: Drive in the same location as the Master File.** (Hint: Sort by Department to isolate the new Opt-outs reported to us. Run a duplicate check on the ETD Opt-out Tracking sheet to be sure we don't already have any of this information.)
+  - When the ETDs on the Opt-out list are in process, enter this information on the ETD Opt-out Tracking sheet, together with the date.
+  - When the ETDs have been uploaded to ScholarWorks and uploaded and downloaded to Connexion and Aleph, enter DONE on the ETD Opt-out Tracking sheet, together with the date.
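The duplicate check against the ETD Opt-out Tracking sheet can be sketched in Python. This is a hypothetical helper: it assumes both sheets have been read into rows with "author" and "title" keys (plus a "response" key for column R of the Master File copy), and it treats two rows as the same ETD when author and title match after ignoring case and extra spaces. That matching rule is an assumption, not part of the documented workflow.

```python
# Hypothetical helper; keys and the author+title matching rule are assumptions.
def norm(text):
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(str(text).split()).casefold()


def new_opt_outs(master_rows, tracking_rows):
    """Return Master File Opt-out rows that are not yet on the tracking sheet."""
    seen = {(norm(r["author"]), norm(r["title"])) for r in tracking_rows}
    fresh = []
    for row in master_rows:
        if norm(row["response"]) != "opt-out":  # column R: Opt-in, Opt-out, no response
            continue
        key = (norm(row["author"]), norm(row["title"]))
        if key not in seen:
            seen.add(key)  # also skips repeats within this batch
            fresh.append(row)
    return fresh
```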
  
 + 
  
 //[[lucyd@library.umass.edu|Primary contacts: Lucy deGozzaldi]]//,
 //[[mbergin@library.umass.edu|Meghan Bergin]]//.
  
batch_uploading_etds_to_scholarworks.1485971168.txt.gz · Last modified: 2019/01/07 17:20 (external edit)