Shadowcat
Shadowcat is a delayed mirror of our III Sierra catalog, facilitating programmatic access to core bibliographic data: Bib and Item records.
The service provides three functions:
- Ingest source for catalog data into the registry.
- Storage system for linked data about our catalog records (via OCLC Classify, VIAF, etc.)
- API access for catalog data.
The service mirrors the catalog through the Sierra API v1.0 (https://sandbox.iii.com/docs/v1/Default.htm). Every night, the API is queried for updated bib records and new item records, which are stored in a MongoDB instance.
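As a rough illustration of the nightly update, the sketch below builds the request URL for bib records changed in the previous 24 hours. The base URL, the `nightlyBibQuery` helper, and the page size are hypothetical. The `updatedDate` range syntax and the `limit`/`offset` parameters reflect one reading of the Sierra API v1 documentation and should be checked against it before use.

```javascript
// Hypothetical base URL; substitute the real Sierra API host.
const SIERRA_BASE = 'https://sierra.example.org/iii/sierra-api/v1';

// Format a Date as an ISO 8601 UTC timestamp without milliseconds.
function toSierraDate(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

// Build the URL for bibs updated in the 24 hours before `now`.
// Assumption: the API accepts updatedDate=[start,end] plus limit/offset paging.
function nightlyBibQuery(now, offset = 0, limit = 50) {
  const end = new Date(now.getTime());
  const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  const params = new URLSearchParams({
    updatedDate: `[${toSierraDate(start)},${toSierraDate(end)}]`,
    limit: String(limit),
    offset: String(offset),
  });
  return `${SIERRA_BASE}/bibs?${params.toString()}`;
}

console.log(nightlyBibQuery(new Date('2015-07-02T04:00:00Z')));
```

A real job would page through the results by increasing `offset` until fewer than `limit` records come back, then upsert each record into the data store.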
Work Plan:
- Develop jobs to keep the data store updated via the Sierra API (https://github.com/thisismattmiller/shadowcat-data)
  - Jobs for Bib and Item records (Complete)
  - Job to look for missing Item records and update ad hoc (TODO)
  - Job to flag a resource as a "research" holding based on item location and other logic (Complete)
- Linked Data Jobs (https://github.com/thisismattmiller/shadowcat-lod)
  - Bulk ingest job to associate the 34GB of Classify data I scraped in June with the correct Bib records in Shadowcat (Complete)
  - Update job that spawns workers to process new records and re-check old records for their Classify results (Complete)
  - Normalize Classify LCC numbers (Complete)
  - Update job that ingests OCLC linked data services for anything not in Classify (Complete)
- API
  - Build API access endpoints for the following (In Progress)
    - bNumber (Complete) (http://45.55.205.23/api/bnumber/b20480002)
    - OCLC
    - ISBN
    - ISSN
    - Hathi
    - VIAF
    - Item Barcode
- Dashboard (nice to have) (TODO)
Data Model:
Data arrives from multiple sources (Sierra API, OCLC services, computational sources). To keep track of provenance, each field is prefixed with its source, except for the Sierra API fields, which retain their original key names to facilitate the nightly update from the API.
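A minimal sketch of why this naming scheme helps: if enrichment fields always carry a `source:` prefix, a nightly Sierra refresh can replace the un-prefixed keys wholesale without touching anything else. The helper names `prefixFields` and `applySierraUpdate` and the sample values are hypothetical, and the sketch assumes Sierra key names never contain a colon.

```javascript
// Prefix every key of `fields` with its source, e.g. "classify:owi".
function prefixFields(source, fields) {
  const out = {};
  for (const [key, value] of Object.entries(fields)) {
    out[`${source}:${key}`] = value;
  }
  return out;
}

// Apply a fresh Sierra record on top of a stored document.
// Un-prefixed (Sierra) keys come only from `fresh`; prefixed keys are kept.
// Assumption: a colon in the key name marks a non-Sierra field.
function applySierraUpdate(stored, fresh) {
  const kept = {};
  for (const [key, value] of Object.entries(stored)) {
    if (key.includes(':')) kept[key] = value;
  }
  return { ...fresh, ...kept };
}

// Illustrative values only.
const stored = {
  id: 20480002,
  title: 'Old title',
  ...prefixFields('classify', { owi: 12345 }),
};
const updated = applySierraUpdate(stored, { id: 20480002, title: 'New title' });
console.log(updated);
```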
Sierra Bib Record Fields:
Field | Description / Subfield | Data Type | Notes |
---|---|---|---|
ID | Bnumber, an 8-digit ID (the 9th digit is a check digit) | Integer (8 positions) | No "b" prefix, Indexed |
updatedDate | Date the record was changed in Sierra | Full date string | |
createdDate | Date the record was created in Sierra | Full date string | |
deleted | Flag indicating whether this record was deleted; will always be false. | Boolean | |
suppressed | Flag, if true will disable the record from | Boolean | |
lang | Language of material based on MARC fields | String | |
title | Title of the material based on MARC fields (logic unknown) | String | |
author | Author of the material based on MARC fields (logic unknown) | String | |
materialType | Type of the material | Object | List of possible codes: |
| | code - the alphanumeric material type code | String | |
| | value - the display label | String | |
bibLevel | The bib level of the material | Object | List of possible codes: |
| | code - the alphanumeric material bib level code | String | |
| | value - the display label | String | |
publishYear | Year published (logic unknown) | Integer (4 number year) | |
catalogDate | Date cataloged (logic unknown) | YYYY-MM-DD String | |
country | Country or state of the material (logic unknown) | String | |
fixedFields | Sierra specific fields. Subfields listed below | Object | Example: fixed_fields.json |
| | 24 - Language | Object | |
| | 25 - Skip | Object (label, value) | No idea |
| | 26 - Location | Object (label, value, display) | The location of the material |
| | 27 - Copies | Object (label, value) | # of copies |
| | 28 - Cat Date | Object (label, value) | Date of cataloging value is in |
| | 29 - Bib Level | Object | Same as root level |
| | 30 - Material Type | Object | Same as root level |
| | 31 - Bib Code 3 | Object (label, value) | Possible Codes: bib_code_3.json |
| | 80 - Record Type | Object (label, value) | Not sure of the codes meaning |
| | 81 - Record Number | Object (label, value) | bnumber |
| | 83 - Created Date | Object (label, value) | Same as root level |
| | 84 - Updated Date | Object (label, value) | Same as root level |
| | 85 - No. of Revisions | Object (label, value) | Revisions to the record? |
| | 86 - Agency | Object (label, value) | Not sure of the codes meaning |
| | 89 - Country | Object | Same as root, but has |
| | 98 - PDATE | Object (label, value) | Not sure, purchased date? full date string |
| | 107 - MARC Type | Object (label, value) | Not sure |
varFields | This holds a JSON representation of the record's MARC fields | Complex Object | Lots of variation of what |
Complete examples of Bib record fields:
- Archives example: bib_example_archives.json
- Research Book example: bib_example_research_book.json
- Branch Book example: bib_example_branch_book.json
- Other examples: ?
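Because the ID field is stored as an 8-digit integer with no "b" prefix, callers that start from a catalog-style bnumber need to normalize it first. The sketch below shows one way to do that. The `normalizeBnumber` helper is hypothetical, and it assumes the optional 9th check character is either a digit or the letter "x".

```javascript
// Convert a catalog-style bnumber into the integer ID used in the data store.
// Accepts "b20480002", "20480002", or a 9-character form whose final
// character is the check digit (assumed here to be a digit or "x").
function normalizeBnumber(input) {
  const match = /^b?(\d{8})([\dx])?$/i.exec(String(input).trim());
  if (!match) {
    throw new Error(`Not a recognizable bnumber: ${input}`);
  }
  // Keep only the 8-digit record number; the check character is dropped.
  return parseInt(match[1], 10);
}

console.log(normalizeBnumber('b20480002')); // 20480002
```

The function validates the shape of the input only; it does not verify the check character.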
Shadowcat Bib Record Fields:
These fields were generated by operations on existing catalog data.
Field | Description / Subfield | Data Type | Notes |
---|---|---|---|
sc:oclc | OCLC Worldcat IDs extracted from various MARC locations | Array of Integers | Indexed |
sc:research | A flag to mark if a bib record is considered a research resource | Boolean | Indexed, Process outlined below: |
sc:classifyCheck | Date when the record was last checked in OCLC Classify | String YYYY-MM-DD | |
sc:worldcatCheck | Date when the record was last checked in OCLC Worldcat | String YYYY-MM-DD | |
sc:isbn | ISBNs extracted from the MARC | Array of Strings | Indexed |
sc:issn | ISSNs extracted from the MARC | Array of Strings | Indexed |
sc:hathi | The Hathi volume IDs linked to this resource | Array of Strings | Indexed |
sc:lccCoarse | The results of a process to convert a | String or False | Indexed. Values: lcc_outlines.json |
sc:usageCount | Tallied number of Checkouts and Renewals from the Items table | Integer | -1 means there was no item record, so the usage count is unknown. Current data from Summer 2015 |
sc:publishYear | If the publishYear on the bib record is not a 4-digit year, this field is populated with the likely publish date following the logic to the right. | Integer | Pull MARC Fields: Look in each field for 19/20/21 century years that match: Take the lowest year and use that as the sc:publishYear. If no years were found, look into the items |
sc:classmark | The base (minus cutter, etc.) of any of the classmarks found in the 852 field. | Array of Strings | Indexed |
sc:callnumber | The full call number including item records. | Array of Strings | Indexed |
sc:local | Marks records that are local resources: bib records previously flagged as not for the OCLC reclamation project, or ones that have a clippings call number, such as *T-CLP | Boolean | |
sc:agents | An array of objects that combines any Classify, Worldcat, and MARC name fields. | Array of Objects | |
sc:terms | An array of objects that represents the combined FAST and local subject terms mapped together | Array of Objects | |
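The sc:publishYear fallback described in the table can be sketched as follows. The actual MARC fields and year pattern are not given in this document, so the sketch takes plain strings as input and assumes that "19/20/21 century years" means 1800 through 2099. The function names are hypothetical, and the final step of looking into the item records is omitted.

```javascript
// True when the root publishYear is already a usable 4-digit year.
function isFourDigitYear(value) {
  return Number.isInteger(value) && value >= 1000 && value <= 9999;
}

// Collect every standalone 4-digit year from 1800 to 2099 found in the
// given strings and return the lowest one, or null if none were found.
// Assumption: the real job uses its own field list and pattern.
function lowestPlausibleYear(values) {
  const years = [];
  for (const value of values) {
    const matches = String(value).match(/(?<!\d)(?:18|19|20)\d{2}(?!\d)/g) || [];
    for (const m of matches) years.push(parseInt(m, 10));
  }
  return years.length > 0 ? Math.min(...years) : null;
}

// Return a derived year only when the root publishYear is unusable.
function fallbackPublishYear(publishYear, marcValues) {
  if (isFourDigitYear(publishYear)) return null; // nothing to derive
  return lowestPlausibleYear(marcValues);
}

console.log(fallbackPublishYear(undefined, ['c1985.', 'Reprinted 1992, 2001'])); // 1985
```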
Library of Congress Bib Record Fields:
These fields were added via the LC APIs.
Field | Description | Data Type | Notes |
---|---|---|---|
lc:lcc | Library of Congress classification number taken via the LCCN endpoint | String | |
lc:dcc | Dewey Decimal classification taken via the LCCN endpoint | Float | |
lc:oclc | OCLC number taken via the LCCN endpoint | Array of Integers | |
OCLC Bib Record Fields:
These fields were added via the OCLC Classify API and OCLC Worldcat LOD services.
Field | Description / Subfield | Data Type | Notes |
---|---|---|---|
classify:oclc | OCLC Worldcat IDs returned for this resource by Classify API | Array of Integers | Indexed |
classify:editions | The number of editions for this work | Integer | |
classify:eholdings | The number of electronic holdings from aggregated OCLC data | Integer | |
classify:format | OCLC's Classify format type | String | |
classify:holdings | The holdings count from aggregated OCLC data | Integer | |
classify:itemtype | OCLC's Classify item type | String | |
classify:owi | The Work identifier | Integer | Indexed |
classify:creatorLC | The LC identifiers for the resource creator | Array of Strings | Indexed |
classify:creatorVIAF | The VIAF identifiers for the resource creator | Array of Integers | Indexed |
classify:fast | The FAST subject headings for this resource | Array of Objects | Indexed |
| | id - The FAST id | Integer | |
| | label - The text label for the FAST subject | String | |
classify:dcc | The Dewey classification for this resource | Float | Indexed |
classify:lcc | The LCC classification for this resource | String | Indexed |
classify:closeMatch | An object containing results from the Used for disambiguation. | Array of Objects | |
worldcat:contributors | VIAF IDs of contributors to the resource | Array of Integers | |
Sierra Item Record Fields:
Item records are connected conceptually to the bib record via the bnumber stored in the item record.
Field | Description / Subfield | Data Type | Notes |
---|---|---|---|
ID | Item number | Integer (8 positions) | |
updatedDate | Date the record was changed in Sierra | Full date string | |
createdDate | Date the record was created in Sierra | Full date string | |
deleted | Flag indicating whether this record was deleted; will always be false. | Boolean | |
bibIds | The bnumbers of the Bib records this item belongs to | Array of Integers | Indexed |
location | The location of the item. Refer to this document: Live: Google Doc; Archived July 2015: Location Codes.csv | Object | |
| | code - the alphanumeric location code | String | |
| | name - the display label | String | |
status | The status of the item | Object | |
| | code - the alphanumeric status code | String | |
| | display - the display label | String | |
barcode | The barcode of the item | 14 Digit Integer (always 14?) | Indexed |
callNumber | Call number / Classmark of this item, field has piped delimiters | String | |
fixedFields | Various Sierra fields. | Object | |
| | 57 - BIB HOLD | Object (label, value) | |
| | 58 - Copy No. | Object (label, value) | |
| | 59 - Item Code 1 | Object (label, value) | |
| | 60 - Item Code 2 | Object (label, value) | |
| | 61 - Item Type Lookup table: HERE | Object (label, value) | |
| | 62 - Price | Object (label, value) | |
| | 64 - Checkout Location | Object | |
| | 68 - Last Checkin | Object | |
| | 69 - Inventory Date | Object | |
| | 70 - Checkin Location | Object / Integer | |
| | 74 - Item Use 3 | Object / Integer | |
| | 76 - Total Checkouts | Object / Integer | |
| | 77 - Total Renewals | Object / Integer | |
| | 78 - Last Checkout Date | Object | |
| | 79 - Location | Object / String | |
| | 80 - Record Type | Object / String | |
| | 81 - Record Number | Object / String | |
| | 83 - Created Date | Object | |
| | 84 - Updated Date | Object | |
| | 85 - No. of Revisions | Object / String | |
| | 86 - Agency | Object / String | |
| | 88 - Status | Object / String | |
| | 93 - Internal Use | Object / Integer | |
| | 94 - Copy Use | Object / Integer | |
| | 97 - Item Message | Object / String | |
| | 98 - PDATE | Object | |
| | 108 - OPAC Message | Object / String | |
| | 109 - Year-to-Date Circ | Object / Integer | |
| | 110 - Last Year Circ | Object / Integer | |
| | 127 - Item Agency | Object / String | |
| | 161 - VI Central | Object / Integer | |
| | 162 - IR Dist Learn Same Site | Object / String | |
| | 264 - Holdings Item Tag | Object / String | |
| | 265 - Inherit Location | Object / Boolean | |
varFields | JSON representation of the MARC item data. Examples: Branch Book: item_branch.json; Research Book: item_research.json; Archives: item_archives.json | | |
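To show how item records feed back into the bib record, here is a minimal sketch of the sc:usageCount tally: sum fixed fields 76 (Total Checkouts) and 77 (Total Renewals) across all items whose bibIds contain the bnumber, and return -1 when there are none. The function names and the exact object shape (`fixedFields['76'].value`) are assumptions based on the table above, not the production job.

```javascript
// Read a numeric fixed-field value, treating missing or non-numeric
// values as 0. Assumes each fixed field is an object with a `value` key.
function fixedFieldNumber(item, fieldNumber) {
  const field = (item.fixedFields || {})[fieldNumber];
  const n = field ? Number(field.value) : 0;
  return Number.isFinite(n) ? n : 0;
}

// Sum checkouts (76) and renewals (77) across a bib's items.
// Returns -1 when the bib has no item records, so usage is unknown.
function usageCount(items) {
  if (!items || items.length === 0) return -1;
  return items.reduce(
    (total, item) =>
      total + fixedFieldNumber(item, '76') + fixedFieldNumber(item, '77'),
    0
  );
}

// Illustrative items for one bib; in practice these would come from a
// query on the item collection by bibIds.
const sampleItems = [
  { bibIds: [20480002], fixedFields: { 76: { label: 'Total Checkouts', value: '3' }, 77: { label: 'Total Renewals', value: 1 } } },
  { bibIds: [20480002], fixedFields: {} },
];
console.log(usageCount(sampleItems)); // 4
```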
API Documentation:
TODO
Server Config Documentation:
TODO
MongoDB Indexes:
db.item.createIndex({ "bibIds" : 1 }, {background: true} )
db.item.createIndex({ "barcode" : 1 }, {background: true} )
db.bib.createIndex({ "sc:oclc" : 1 }, {background: true} )
db.bib.createIndex({ "sc:isbn" : 1 }, {background: true} )
db.bib.createIndex({ "sc:issn" : 1 }, {background: true} )
db.bib.createIndex({ "sc:hathi" : 1 }, {background: true} )
db.bib.createIndex({ "classify:oclc" : 1 }, {background: true} )
db.bib.createIndex({ "classify:owi" : 1 }, {background: true} )
db.bib.createIndex({ "classify:creatorLC" : 1 }, {background: true} )
db.bib.createIndex({ "sc:research" : 1 }, {background: true} )
db.bib.createIndex({ "classify:creatorVIAF" : 1 }, {background: true} )
db.bib.createIndex({ "classify:dcc" : 1 }, {background: true} )
db.bib.createIndex({ "classify:lcc" : 1 }, {background: true} )
db.bib.createIndex({ "sc:lccCoarse" : 1 }, {background: true} )
db.bib.createIndex({ "sc:callnumber" : 1 }, {background: true} )
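These indexes line up with the identifier lookups planned for the API. As a sketch of how an endpoint might translate an identifier into an indexed query, the helper below maps an identifier type to a collection name and filter object. The `LOOKUPS` table and `buildLookup` function are hypothetical; the field names, collection names, and data types come from the tables and index commands above.

```javascript
// Identifier type -> collection, indexed field, and whether the stored
// values are integers (per the data model tables).
const LOOKUPS = {
  oclc: { collection: 'bib', field: 'sc:oclc', numeric: true },
  isbn: { collection: 'bib', field: 'sc:isbn', numeric: false },
  issn: { collection: 'bib', field: 'sc:issn', numeric: false },
  hathi: { collection: 'bib', field: 'sc:hathi', numeric: false },
  viaf: { collection: 'bib', field: 'classify:creatorVIAF', numeric: true },
  barcode: { collection: 'item', field: 'barcode', numeric: true },
};

// Build { collection, filter } for a lookup, coercing the value to the
// stored type so the query can use the index.
function buildLookup(type, value) {
  if (!Object.prototype.hasOwnProperty.call(LOOKUPS, type)) {
    throw new Error(`Unsupported identifier type: ${type}`);
  }
  const spec = LOOKUPS[type];
  const parsed = spec.numeric ? Number(value) : String(value);
  if (spec.numeric && !Number.isInteger(parsed)) {
    throw new Error(`Expected an integer for ${type}: ${value}`);
  }
  return { collection: spec.collection, filter: { [spec.field]: parsed } };
}

console.log(buildLookup('oclc', '12345'));
```

The resulting filter can be passed to `find` on the named collection. MongoDB matches a scalar value against array fields such as sc:oclc, so no extra operator is needed.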