...

  • Develop jobs to keep the data store updated via the Sierra API  (https://github.com/thisismattmiller/shadowcat-data)
    • Jobs for Bib and Item records (Complete)
    • Job to look for missing Item records and update ad hoc (TODO)
    • Job to flag a resource as a "research" holding based on item location and other logic (Complete)
  • Linked Data Jobs (https://github.com/thisismattmiller/shadowcat-lod)
    • Bulk ingest job to associate the 34GB of Classify data I scraped in June with the correct Bib records in shadowcat (Complete)
    • Update job that will spawn workers to keep up with new records and recheck old records for their Classify results (Complete)
    • Normalize Classify LCC numbers (Complete)
    • Update job that ingests OCLC linked data services for anything not in Classify (Complete)
  • API
  • Dashboard (nice to have) (TODO) 

...

| Field | Description / Subfield | Data Type | Notes |
|---|---|---|---|
| sc:oclc | OCLC WorldCat IDs extracted from various MARC locations | Array of Integers | Indexed |
| sc:research | A flag to mark whether a bib record is considered a research resource | Boolean | Indexed. Process outlined below. |
| sc:classifyCheck | Date when the record was last checked in OCLC Classify | String YYYY-MM-DD | |
| sc:worldcatCheck | Date when the record was last checked in OCLC WorldCat | String YYYY-MM-DD | |
| sc:isbn | ISBNs extracted from the MARC | Array of Strings | Indexed |
| sc:issn | ISSNs extracted from the MARC | Array of Strings | Indexed |
| sc:hathi | The Hathi volume IDs linked to this resource | Array of Strings | Indexed |
| sc:lccCoarse | The result of a process that converts an LCC number to a broader-level classification. https://github.com/thisismattmiller/lcc-pdf-to-json | String or False | Indexed. Values: lcc_outlines.json |
| sc:usageCount | Tallied number of checkouts and renewals from the Items table | Integer | -1 means there was no item record, so the usage count is unknown. Current data is from Summer 2015. |
| sc:publishYear | If the publishYear on the bib record is not a 4-digit year, the job tries to populate this field with the likely publish date, following the logic to the right. | Integer | 1. Pull MARC fields 362 $a (Dates of Publication), 260 $c (Publication, Distribution, etc.), and 852 $z (Call number). 2. Look in each field for 19/20/21 century years that match /\D19[0-9]{2}\D/, /^19[0-9]{2}\D/, /^.*\D19[0-9]{2}$/, or /^19[0-9]{2}$/. 3. Take the lowest year and use it as the sc:publishYear. 4. If no years were found, look in the item records for varFields with the fieldTag 'a' and repeat the same process. |
| sc:classmark | The base (minus cutter, etc.) of any classmark found in the 852 field. The classmark is lowercased and periods (only periods) are removed. | Array of Strings | Indexed |
| sc:callnumber | The full call number, including those from item records. Lowercased, with periods (only) removed. | Array of Strings | Indexed |
| sc:local | Set on records that are local resources: bib records previously flagged as not for the OCLC reclamation project, or ones that have a clippings call number such as *T-CLP | Boolean | |
| sc:agents | An array of objects that combines any Classify, WorldCat, and MARC name fields | Array of Objects | |
| sc:terms | An array of objects that represents the combined FAST and local subject terms mapped together | Array of Objects | |
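
The sc:publishYear logic in the field table can be sketched roughly as follows. This is a minimal illustration, not the code from the shadowcat repositories. It assumes the four listed patterns can be collapsed into one expression ("a 19xx year with no digit on either side"), and the function names `find_years` and `publish_year` are hypothetical. Only the 19xx patterns are spelled out in the table, so the `prefixes` parameter for other centuries is also an assumption.

```python
import re
from typing import Iterable, List, Optional


def find_years(text: str, prefixes: Iterable[str] = ("19",)) -> List[int]:
    """Return every 4-digit year in `text` that starts with one of `prefixes`.

    The lookarounds require that the year is not touching another digit,
    which covers the four cases in the table: non-digit on both sides,
    start of string, end of string, or the whole string.
    """
    alternation = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(rf"(?<![0-9])((?:{alternation})[0-9]{{2}})(?![0-9])")
    return [int(match) for match in pattern.findall(text)]


def publish_year(
    bib_fields: Iterable[str],
    item_fields: Iterable[str] = (),
    prefixes: Iterable[str] = ("19",),
) -> Optional[int]:
    """Pick the lowest year from the bib fields, else from the item fields.

    `bib_fields` stands for the 362 $a, 260 $c and 852 $z values;
    `item_fields` stands for item varFields with fieldTag 'a'.
    """
    prefixes = tuple(prefixes)
    for source in (bib_fields, item_fields):
        years = [y for text in source for y in find_years(text, prefixes)]
        if years:
            return min(years)
    return None


print(publish_year(["1999, c1985."]))            # 1985
print(publish_year(["[n.d.]"], ["v. 3 1978"]))   # 1978
print(publish_year(["no date"]))                 # None
```

Returning `None` when nothing matches is a choice made for this sketch; the table does not say what the real job stores in that case.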

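The normalization described for sc:classmark and sc:callnumber (lowercase, remove periods and nothing else) is small enough to show directly. The helper name `normalize_callnumber` is hypothetical, and the sketch deliberately does not try to strip the cutter to get the classmark base, since those rules are not spelled out here.

```python
from typing import Iterable, List


def normalize_callnumber(value: str) -> str:
    """Lowercase the value and remove periods; all other punctuation stays."""
    return value.lower().replace(".", "")


def normalize_all(values: Iterable[str]) -> List[str]:
    """Normalize each call number or classmark found on a record."""
    return [normalize_callnumber(v) for v in values]


print(normalize_all(["QA76.9 .D3", "*T-CLP"]))  # ['qa769 d3', '*t-clp']
```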

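For sc:usageCount, the tally and the -1 sentinel might look like the sketch below. The item shape and the key names `checkouts` and `renewals` are assumptions made for illustration; the real item records may name these counters differently.

```python
from typing import Iterable, Mapping


def usage_count(items: Iterable[Mapping[str, int]]) -> int:
    """Sum checkouts and renewals across a bib's items.

    Returns -1 when there are no item records at all, so that
    "unknown usage" can be told apart from "zero usage".
    """
    items = list(items)
    if not items:
        return -1
    return sum(i.get("checkouts", 0) + i.get("renewals", 0) for i in items)


print(usage_count([]))                                               # -1
print(usage_count([{"checkouts": 3, "renewals": 1}, {"checkouts": 2}]))  # 6
```
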
Library of Congress BIB Record Fields:

...