Shadowcat

Shadowcat is a delayed mirror of our III Sierra catalog, facilitating programmatic access to core bibliographic data: Bib and Item records.

The service provides three functions:

  • Ingest source for catalog data into the registry.
  • Storage system for linked data about our catalog records (via OCLC Classify, VIAF, etc.)
  • API access for catalog data.

The service mirrors the catalog through the Sierra API v1.0 (https://sandbox.iii.com/docs/v1/Default.htm). Every night the API is queried for updated bib records and new item records, and the results are stored in a MongoDB instance.
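A minimal sketch of how one nightly pass could scope its Sierra API requests. It assumes the v1 date-range filter syntax `[start,end]` on `updatedDate` (for bibs) and `createdDate` (for new items). The function names, the 24-hour window, and the `limit`/`offset` paging values are illustrative assumptions, not taken from the shadowcat-data jobs.

```javascript
// Hypothetical helper: the 24-hour window ending at runDate, in the
// [start,end] form the Sierra API v1 accepts for date-range filters.
function nightlyWindow(runDate) {
  const end = new Date(runDate);
  const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  // Sierra dates look like 2015-05-08T12:22:41Z (no milliseconds).
  const fmt = (d) => d.toISOString().replace(/\.\d{3}Z$/, "Z");
  return `[${fmt(start)},${fmt(end)}]`;
}

// Hypothetical helper: one page of the nightly query for a resource.
// ASSUMPTION: paging with limit/offset; page size 50 is arbitrary.
function nightlyQuery(resource, dateField, runDate, offset = 0, limit = 50) {
  const params = new URLSearchParams({
    [dateField]: nightlyWindow(runDate),
    limit: String(limit),
    offset: String(offset),
  });
  return `/v1/${resource}/?${params.toString()}`;
}

// Bibs are filtered on updatedDate, items on createdDate (new items only).
console.log(decodeURIComponent(nightlyQuery("bibs", "updatedDate", "2015-05-09T00:00:00Z")));
// /v1/bibs/?updatedDate=[2015-05-08T00:00:00Z,2015-05-09T00:00:00Z]&limit=50&offset=0
```

The job would then request successive pages (increasing `offset`) until a page comes back with fewer than `limit` entries, upserting each record into MongoDB by its ID.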

Work Plan:

  • Develop jobs to keep the data store updated via the Sierra API  (https://github.com/thisismattmiller/shadowcat-data)
    • Jobs for Bib and Item records (Complete)
    • Job to look for missing Item records and update ad hoc (TODO)
    • Job to flag a resource as a "research" holding based on item location and other logic (Complete)
  • Linked Data Jobs (https://github.com/thisismattmiller/shadowcat-lod)
    • Bulk ingest job to load the 34GB of Classify data I scraped in June and associate it with the correct Bib records in Shadowcat (Complete)
    • Update job that spawns workers to check new records, and re-check old records, for their Classify results (Complete)
    • Normalize Classify LCC numbers (Complete)
    • Update job that ingests OCLC linked data services for anything not in Classify (Complete)
  • API
  • Dashboard (nice to have) (TODO) 

 

Data Model:

Data arrives from multiple sources (Sierra API, OCLC services, computational sources). To track the provenance of each field, field names are prefixed with their source (sc:, classify:, worldcat:, lc:). The exception is the Sierra API fields, which keep their original key names so the nightly update from the API can write to them directly.
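Because provenance is encoded in the key itself, a consumer can recover the source of any field by splitting on the colon. The helper below is hypothetical (not part of Shadowcat). It assumes every prefixed key uses a single colon, as in the field tables that follow, and treats every unprefixed key as a Sierra API field.

```javascript
// Hypothetical helper: which source a bib document key came from.
// Prefixed keys such as "sc:oclc" or "classify:owi" name their source before
// the colon; unprefixed keys are treated as raw Sierra API fields
// (MongoDB's own "_id" would also land in that group).
function provenanceOf(key) {
  const i = key.indexOf(":");
  return i === -1 ? "sierra" : key.slice(0, i);
}

// Group every key of a document by its provenance.
function groupByProvenance(doc) {
  const groups = {};
  for (const key of Object.keys(doc)) {
    const source = provenanceOf(key);
    if (!groups[source]) groups[source] = [];
    groups[source].push(key);
  }
  return groups;
}

// Made-up sample document.
const sampleBib = { id: 10000001, title: "Example", "sc:oclc": [1], "classify:owi": 2 };
console.log(groupByProvenance(sampleBib));
```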

Sierra Bib Record Fields:

Field
Description / Subfield
Data Type 
ID
The bnumber, an 8-digit ID (the 9th digit is a check digit)
Only the first 8 digits are needed to find a record
Integer (8 positions)
No "b" prefix, Indexed
updatedDate
Date the record was last changed in Sierra
Full date string
2015-05-08T12:22:41Z
 
createdDate
Date the record was created in Sierra
Full date string
2015-05-08T12:22:41Z
 
deleted
Flag indicating whether the record was deleted; always false.
Boolean
 
suppressed
Flag; if true, the record is hidden from
public view (missing or damaged materials are often suppressed)
Boolean
 
lang
Language of the material, based on MARC fields
(logic unknown: it is not clear which
fields/subfields are checked or combined to produce this)
String
 
title
Title of the material based on MARC fields (logic unknown)
String
 
author
Author of the material based on MARC fields (logic unknown)
String
 
materialType
Type of the material
Object
List of possible codes:
material_types.json
 
code  - the alphanumeric material type code 
String
 
 
value - the display label
String
 
bibLevel
The bib level of the material
Object
List of possible codes:
bib_level.json
 
code  - the alphanumeric material bib level code 
String
 
 
value - the display label
String
 
publishYear
Year published (logic unknown)
Integer (4-digit year)
 
catalogDate
Date cataloged (logic unknown)
YYYY-MM-DD String
 
country
Country or state of the material (logic unknown)
String
 
fixedFields
Sierra-specific fields, keyed by fixed-field number. Subfields are listed below
Object
Example: fixed_fields.json
 
24 - Language
Object 
(label, value, display)
 
 
25 - Skip
Object (label, value)
Purpose unknown
 
26 - Location
Object
(label, value, display)
 
The location of the material.
If it is held in multiple locations,
the value is just "multi". The linked
Item records are a better source
for location.
 
27 - Copies 
Object (label, value)
Number of copies
 
28 - Cat Date
Object (label, value)
Date of cataloging; the value is a
YYYY-MM-DD string
 
29 - Bib Level
Object 
(label, value, display)
Same as root level
 
30 - Material Type 
Object 
(label, value, display)
Same as root level
 
31 - Bib Code 3
Object (label, value)
Possible Codes: bib_code_3.json
 
80 - Record Type
Object (label, value)
Meaning of the codes unknown
 
81 - Record Number
Object (label, value)
bnumber
 
83 - Created Date
Object (label, value)
Same as root level
 
84 - Updated Date
Object (label, value)
Same as root level
 
85 - No. of Revisions
Object (label, value)
Presumably the number of revisions to the record
 
86 - Agency
Object (label, value)
Meaning of the codes unknown
 
89 - Country
Object 
(label, value, display)
Same as root level, but also
includes the code (value)
 
98 - PDATE
Object (label, value)
Meaning unclear; possibly the purchase date
Full date string
2015-07-11T05:36:48Z 
 
107 - MARC Type
Object (label, value)
Not sure
varFields
This holds a JSON representation of the record's MARC fields
http://www.loc.gov/marc/bibliographic/ 
Complex Object
The fields, subfields, and
indicators present vary widely.
Look at the examples below for
more info. 
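The two nested structures above are easiest to read through small accessors. This sketch assumes the Sierra API v1 shapes: `fixedFields` is keyed by the field number as a string, with each entry holding `label`/`value`, and each `varFields` entry carries a `marcTag` plus a `subfields` array of `{ tag, content }`. The function names and the sample record are hypothetical.

```javascript
// Hypothetical accessors for the nested Sierra structures.
// ASSUMPTION: fixedFields is an object keyed by field number as a string
// ("27"), each entry shaped { label, value, display? }; varFields entries are
// shaped { fieldTag, marcTag, ind1, ind2, subfields: [{ tag, content }] }.
function fixedFieldValue(record, number) {
  const entry = (record.fixedFields || {})[String(number)];
  return entry === undefined ? undefined : entry.value;
}

// Every value of subfield `code` in MARC field `marcTag` (e.g. 260 $c).
function subfieldValues(record, marcTag, code) {
  const out = [];
  for (const field of record.varFields || []) {
    if (field.marcTag !== marcTag) continue;
    for (const sf of field.subfields || []) {
      if (sf.tag === code) out.push(sf.content);
    }
  }
  return out;
}

// Made-up sample record.
const sample = {
  fixedFields: { "27": { label: "Copies", value: 1 } },
  varFields: [
    { fieldTag: "t", marcTag: "245", ind1: "1", ind2: "0",
      subfields: [{ tag: "a", content: "Example title /" }] },
  ],
};
console.log(fixedFieldValue(sample, 27)); // 1
console.log(subfieldValues(sample, "245", "a")); // [ 'Example title /' ]
```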
 
Complete examples of Bib record fields:
Archives example: bib_example_archives.json 
Research Book example: bib_example_research_book.json 
Branch Book example: bib_example_branch_book.json 
Other examples: ? 
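The ID row (8 digits, no "b" prefix, 9th digit a check digit) suggests a small normalizer for bnumbers copied from the catalog. The check-digit routine below assumes the commonly documented III scheme: weights 2, 3, 4, ... are applied from the rightmost digit, the products are summed, the sum is taken modulo 11, and a remainder of 10 is written as "x". The source does not specify the algorithm, so treat it as unverified. All names are hypothetical.

```javascript
// Hypothetical helpers for bnumbers.
// ASSUMPTION: III check digit = weights 2, 3, 4, ... applied from the rightmost
// digit, summed, taken modulo 11, with a remainder of 10 written as "x".
function checkDigit(recordNumber) {
  const digits = String(recordNumber).split("").reverse();
  let sum = 0;
  digits.forEach((d, i) => {
    sum += Number(d) * (i + 2);
  });
  const r = sum % 11;
  return r === 10 ? "x" : String(r);
}

// Accepts ".b123456782", "b123456782", "12345678", ...; keeps only the first
// 8 digits and returns them as an integer, matching the stored ID (no "b").
function normalizeBnumber(raw) {
  const m = /^\.?b?(\d{8})[\dx]?$/i.exec(String(raw).trim());
  if (m === null) throw new Error(`Not a bnumber: ${raw}`);
  return Number(m[1]);
}

// Rebuild the display form: "b" prefix + 8 digits + check digit.
// Assumes the ID has exactly 8 digits (no leading zeros to restore).
function displayBnumber(id) {
  return `b${id}${checkDigit(id)}`;
}

console.log(displayBnumber(normalizeBnumber(".b123456782"))); // b123456782
```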
  


Shadowcat Bib Record Fields:

These fields were generated by processing existing catalog data into new fields.

Field
Description / Subfield
Data Type 
sc:oclc
OCLC Worldcat IDs extracted from various MARC locations
Array of Integers
Indexed
sc:research
A flag to mark if a bib record is considered a research resource
Boolean
Indexed. Set based on item location and other logic (see Work Plan)
sc:classifyCheck
Date when the record was last checked in OCLC Classify
String YYYY-MM-DD
 
sc:worldcatCheck
Date when the record was last checked in OCLC Worldcat
String YYYY-MM-DD
 
sc:isbn
ISBNs extracted from the MARC record
Array of Strings
Indexed
sc:issn
ISSNs extracted from the MARC record
Array of Strings
Indexed
sc:hathi
The HathiTrust volume IDs linked to this resource
Array of Strings
Indexed
sc:lccCoarse
The result of a process that converts an
LCC number to a broader classification.
https://github.com/thisismattmiller/lcc-pdf-to-json 
String or False
Indexed.
Values:
lcc_outlines.json 
sc:usageCount
Tallied number of Checkouts and Renewals from the Items table
Integer
-1 means there was no item record, so the usage count is unknown.
Current data from Summer 2015
sc:publishYear
If the publishYear on the bib record is not a 4-digit year, this field is populated
with the likely publication year, found using the logic described next.
Integer
Pull MARC Fields:
362 $a (Dates of Publication)
260 $c (Publication, Distribution, etc.)
852 $z (Callnumber)
Look in each field for years matching the patterns below (as written, they only match 1900-1999):
/\D19[0-9]{2}\D/
/^19[0-9]{2}\D/
/^.*\D19[0-9]{2}$/
/^19[0-9]{2}$/
Take the lowest year and use it as sc:publishYear.
If no years are found, look in the item
records' varFields with the fieldTag 'a'
and repeat the process.
sc:classmark
The base (minus Cutter number, etc.) of any classmarks found in the 852 field.
The classmark is lowercased and periods (only periods) are removed.
Array of strings
Indexed
sc:callnumber
The full call numbers, including those from item records.
Lowercased, with periods (only) removed.
Array of strings
Indexed
sc:local
Present on some local-resource records: bib records previously flagged as not for the OCLC reclamation project, or ones with a clippings call number such as *T-CLP
Boolean
sc:agents
An array of objects combining any Classify, Worldcat, and MARC name fields
Array of Objects
sc:terms
An array of objects representing the FAST and local subject terms mapped together
Array of Objects
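A minimal sketch of the sc:publishYear fallback described in the table. It assumes Sierra API v1 varField shapes. It also assumes that item varFields with fieldTag "a" carry either plain `content` or subfields. The four listed patterns are folded into one equivalent regular expression so that each matching year can be captured. All names are hypothetical.

```javascript
// The four listed patterns together accept a 19xx year bounded by a non-digit
// or the start/end of the string; YEAR_RE is an equivalent single pattern that
// also captures each year. As written, only 1900-1999 can match.
const YEAR_RE = /(?:^|\D)(19[0-9]{2})(?=\D|$)/g;

// MARC fields checked on the bib, in the order listed in the table.
const BIB_SOURCES = [["362", "a"], ["260", "c"], ["852", "z"]];

function yearsIn(text) {
  return [...String(text).matchAll(YEAR_RE)].map((m) => Number(m[1]));
}

function lowestYear(texts) {
  const years = texts.flatMap(yearsIn);
  return years.length > 0 ? Math.min(...years) : null;
}

// ASSUMPTION: Sierra API v1 varField shape { marcTag, subfields: [{ tag, content }] }.
function subfieldValues(varFields, marcTag, code) {
  const out = [];
  for (const field of varFields || []) {
    if (field.marcTag !== marcTag) continue;
    for (const sf of field.subfields || []) {
      if (sf.tag === code) out.push(sf.content);
    }
  }
  return out;
}

// Returns the value to store in sc:publishYear, or null if nothing should be stored.
function scPublishYear(bib, items = []) {
  // The Sierra publishYear is already a 4-digit year: no fallback needed.
  if (/^\d{4}$/.test(String(bib.publishYear))) return null;

  const bibTexts = BIB_SOURCES.flatMap(([tag, code]) => subfieldValues(bib.varFields, tag, code));
  const fromBib = lowestYear(bibTexts);
  if (fromBib !== null) return fromBib;

  // Fallback: item varFields with fieldTag "a". ASSUMPTION: these may carry
  // either plain `content` or MARC-style subfields.
  const itemTexts = items
    .flatMap((item) => item.varFields || [])
    .filter((field) => field.fieldTag === "a")
    .flatMap((field) => (field.subfields ? field.subfields.map((sf) => sf.content) : [field.content]))
    .filter((text) => text != null);
  return lowestYear(itemTexts);
}
```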

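The sc:classmark and sc:callnumber rows share the same text normalization: lowercase, then remove periods only. The sketch below covers that step plus deduplication. The source does not specify how the base classmark is separated from the Cutter number, so that step is left out. The "|" plus one-character subfield delimiter assumed for item callNumber values, and all function names, are assumptions.

```javascript
// ASSUMPTION: item callNumber values may contain Sierra subfield delimiters
// of the form "|h" (a pipe plus one subfield code); they are replaced by a space.
function stripDelimiters(callNumber) {
  return callNumber.replace(/\|[a-z0-9]/gi, " ").replace(/\s+/g, " ").trim();
}

// Lowercase and remove periods only; all other punctuation is kept.
function normalizeCallNumber(callNumber) {
  return stripDelimiters(callNumber).toLowerCase().replace(/\./g, "");
}

// Hypothetical: unique normalized call numbers from a bib's call number
// strings plus the callNumber of each of its items.
function scCallnumbers(bibCallNumbers, items) {
  const all = [...bibCallNumbers, ...items.map((item) => item.callNumber).filter(Boolean)];
  return [...new Set(all.map(normalizeCallNumber))];
}

console.log(normalizeCallNumber("QA76.73.J38")); // qa7673j38
```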

Library of Congress Bib Record Fields:

These are fields that were added via the LC APIs

Field
Description
Data Type 
lc:lcc
Library of Congress Classification number taken from the LCCN endpoint
String
lc:dcc
Dewey Decimal Classification number taken from the LCCN endpoint
Float
lc:oclc
OCLC number taken from the LCCN endpoint
Array of Integers

 


OCLC Bib Record Fields:

These are fields that were added via the OCLC Classify API and OCLC Worldcat LOD services

Field
Description / Subfield
Data Type 
classify:oclc 
OCLC Worldcat IDs returned for this resource by Classify API
Array of Integers 
Indexed 
classify:editions
The number of editions for this work
Integer
 
classify:eholdings
The number of electronic holdings from aggregated OCLC data
Integer
 
classify:format
OCLC's Classify format type
String
 
classify:holdings
The holdings count from aggregated OCLC data
Integer
 
classify:itemtype
OCLC's Classify item type
String
 
classify:owi
The OCLC Work identifier (OWI)
Integer
Indexed
classify:creatorLC
The LC identifiers for the resource creator
Array of Strings
Indexed
classify:creatorVIAF
The VIAF identifiers for the resource creator
Array of Integers
Indexed
classify:fast
The FAST subject headings for this resource
Array of Objects
Indexed
 
id - The FAST id
Integer
 
 
label - The text label for the FAST subject
String
 
classify:dcc
The Dewey classification for this resource
Float
Indexed
classify:lcc
The LCC classification for this resource
String
Indexed
classify:closeMatch
Objects containing results from the
Classify API that might match this resource.
Used for disambiguation.
Array of Objects
 
worldcat:contributors
VIAF IDs of contributors to the resource
Array of Integers
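OCLC numbers reach a bib from three places, each stored in its own prefixed field: sc:oclc (extracted from MARC), classify:oclc (Classify API), and lc:oclc (LCCN endpoint). A consumer that wants a single list can union them while keeping track of which sources agree. This is a hypothetical helper, not part of the Shadowcat API.

```javascript
// Hypothetical helper: union the OCLC numbers stored under each source prefix.
// Returns a Map from OCLC number to the list of fields that reported it,
// so disagreements between sources stay visible.
const OCLC_FIELDS = ["sc:oclc", "classify:oclc", "lc:oclc"];

function allOclcNumbers(bib) {
  const seen = new Map();
  for (const field of OCLC_FIELDS) {
    for (const n of bib[field] || []) {
      if (!seen.has(n)) seen.set(n, []);
      // Record each field at most once per number.
      if (!seen.get(n).includes(field)) seen.get(n).push(field);
    }
  }
  return seen;
}

// Made-up sample: Classify agrees with MARC on one number, LC adds another.
console.log(allOclcNumbers({ "sc:oclc": [123, 456], "classify:oclc": [123], "lc:oclc": [789] }));
```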
 

 

 

Sierra Item Record Fields:

Item records are linked to their bib record via the bnumbers stored in the item record's bibIds field.
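Because the link is stored on the item side, finding a bib's items means indexing items by the values in bibIds, which is what the bibIds MongoDB index supports. A minimal in-memory sketch, assuming item documents expose `id` and `bibIds` keys. The function name and the records are made up.

```javascript
// Hypothetical helper: map each bnumber to the IDs of the items that list it.
function indexItemsByBib(items) {
  const byBib = new Map();
  for (const item of items) {
    for (const bnumber of item.bibIds || []) {
      if (!byBib.has(bnumber)) byBib.set(bnumber, []);
      byBib.get(bnumber).push(item.id);
    }
  }
  return byBib;
}

// Made-up items; the second one is attached to two bibs.
const items = [
  { id: 20000001, bibIds: [10000001] },
  { id: 20000002, bibIds: [10000001, 10000002] },
];
console.log(indexItemsByBib(items).get(10000001)); // [ 20000001, 20000002 ]
```

Against the database itself, `db.item.find({ bibIds: 10000001 })` performs the same lookup, because a MongoDB equality match on an array field matches any element of the array.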

Field
Description / Subfield
Data Type 
ID
Item number
Integer (8 positions)
 
updatedDate
Date the record was last changed in Sierra
Full date string 
2015-05-08T12:22:41Z
 
createdDate
Date the record was created in Sierra
Full date string 
2015-05-08T12:22:41Z
 
deleted
Flag indicating whether the record was deleted; always false.
Boolean
 
bibIds
The bnumbers of the bib record(s) this item belongs to
Array of Integers
Indexed
location
The location of the item. Refer to this document:
Live: Google Doc
Archived July 2015:  Location Codes.csv
Object
 
 
code  - the alphanumeric location code 
String
 
 
name - the display label
String
 
status
The status of the item
Object
 
 
code  - the alphanumeric status code 
String
 
 
display - the display label
String
 
barcode
The barcode of the item
14 Digit Integer (always 14?)
Indexed
callNumber
Call number/classmark of this item; the field contains pipe (|) delimiters
String
 
fixedFields
Various Sierra fields.
Object
 
 
57 - BIB HOLD
Object (label, value)
 
 
58 - Copy No.
Object (label, value)
 
 
59 - Item Code 1
Object (label, value)
 
 
60 - Item Code 2
Object (label, value)
 
 
61 - Item Type
Lookup table: HERE
Object (label, value)
 
 
62 - Price
Object (label, value)
 
 
64 - Checkout Location
Object 
Integer
 
 
68 - Last Checkin
Object 
String Full date string
 
 
69 - Inventory Date
Object 
String Full date string
 
 
70 - Checkin Location
Object Integer
 
 
74 - Item Use 3
Object / Integer
 
 
76 - Total Checkouts
Object / Integer
 
 
77 - Total Renewals
Object / Integer
 
 
78 - Last Checkout Date
Object
String Full date string
 
 
79 - Location
Object / String
 
 
80 - Record Type
Object / String
 
 
81 - Record Number
Object / String
 
 
83 - Created Date
Object 
String Full date string
 
 
84 - Updated Date
Object
String Full date string
 
 
85 - No. of Revisions
Object / String
 
 
86 - Agency
Object / String
 
 
88 - Status
Object / String
 
 
93 - Internal Use
Object / Integer
 
 
94 - Copy Use
Object / Integer
 
 
97 - Item Message
Object / String
 
 
98 - PDATE
Object
String Full date string
 
 
108 - OPAC Message
Object / String
 
 
109 - Year-to-Date Circ
Object / Integer
 
 
110 - Last Year Circ
Object / Integer
 
 
127 - Item Agency
Object / String
 
 
161 - VI Central
Object / Integer
 
 
162 - IR Dist Learn Same Site
Object / String
 
 
264 - Holdings Item Tag
Object / String
 
 
265 - Inherit Location
Object / Boolean
 
varFields
JSON representation of the MARC item data.
Various fields/subfields/indicators.
Examples:
Branch Book:  item_branch.json
Research Book: item_research.json
Archives: item_archives.json 
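A sketch of how sc:usageCount could be tallied from item records. It assumes the Summer 2015 tally summed item fixed fields 76 (Total Checkouts) and 77 (Total Renewals). The source only says "Checkouts and Renewals from the Items table", so the exact inputs are an assumption. As documented, a bib with no items yields -1. Function names are hypothetical.

```javascript
// Read an item fixed field as an integer; missing or non-numeric values count as 0.
// ASSUMPTION: fixedFields keyed by field number as a string, entries { label, value }.
function fixedInt(item, number) {
  const entry = (item.fixedFields || {})[String(number)];
  const n = entry ? Number(entry.value) : 0;
  return Number.isFinite(n) ? n : 0;
}

// sc:usageCount sketch: Total Checkouts (76) + Total Renewals (77) over all
// of a bib's items, or -1 when the bib has no item records.
function usageCount(items) {
  if (!items || items.length === 0) return -1;
  return items.reduce((sum, item) => sum + fixedInt(item, 76) + fixedInt(item, 77), 0);
}

console.log(usageCount([])); // -1
```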
  

 

API Documentation:

TODO

Server Config Documentation:

TODO

 

MongoDB Indexes:

// Item collection: find a bib's items (bibIds) and look up items by barcode
db.item.createIndex({ "bibIds" : 1 }, {background: true} )
db.item.createIndex({ "barcode" : 1 }, {background: true} )
// Bib collection: identifier, classification, and research-flag lookups.
// Note: MongoDB 4.2+ ignores the background option; it only matters on older servers.
db.bib.createIndex({ "sc:oclc" : 1 }, {background: true} )
db.bib.createIndex({ "sc:isbn" : 1 }, {background: true} )
db.bib.createIndex({ "sc:issn" : 1 }, {background: true} )
db.bib.createIndex({ "sc:hathi" : 1 }, {background: true} )
db.bib.createIndex({ "classify:oclc" : 1 }, {background: true} )
db.bib.createIndex({ "classify:owi" : 1 }, {background: true} )
db.bib.createIndex({ "classify:creatorLC" : 1 }, {background: true} )
db.bib.createIndex({ "sc:research" : 1 }, {background: true} )
db.bib.createIndex({ "classify:creatorVIAF" : 1 }, {background: true} )
db.bib.createIndex({ "classify:dcc" : 1 }, {background: true} )
db.bib.createIndex({ "classify:lcc" : 1 }, {background: true} )
db.bib.createIndex({ "sc:lccCoarse" : 1 }, {background: true} )
db.bib.createIndex({ "sc:callnumber" : 1 }, {background: true} )