...
- Develop jobs to keep the data store updated via the Sierra API (https://github.com/thisismattmiller/shadowcat-data)
- Jobs for Bib and Item records (Complete)
- Job to look for missing Item records and update ad hoc (TODO)
- Job to flag a resource as a "research" holding based on item location and other logic (Complete)
- Linked Data Jobs (https://github.com/thisismattmiller/shadowcat-lod)
- Bulk ingest job to associate the 34GB of Classify data I scraped in June with the correct Bib records in shadowcat (Complete)
- Update job that will spawn workers to keep new records and check old records for their Classify results (Complete)
- Normalize Classify LCC numbers (Complete)
- Update job that ingests OCLC linked data services for anything not in Classify (Complete)
- API
- Build API access endpoints for the following (In Progress)
- bNumber (Complete) (http://45.55.205.23/api/bnumber/b20480002)
- OCLC
- ISBN
- ISSN
- Hathi
- VIAF
- Item Barcode
- Dashboard (nice to have) (TODO)
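
As a rough illustration of the one endpoint marked complete above, here is a minimal Python sketch of a bNumber lookup. Only the `/api/bnumber/<bNumber>` path comes from the example URL in the list; the helper names `bnumber_url` and `fetch_bnumber` are hypothetical, and the assumption that the server answers with JSON is mine, not something this document confirms.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the example link in the list above.
API_BASE = "http://45.55.205.23/api"


def bnumber_url(bnumber):
    """Build the lookup URL for a bNumber (hypothetical helper)."""
    return f"{API_BASE}/bnumber/{urllib.parse.quote(bnumber, safe='')}"


def fetch_bnumber(bnumber, timeout=10):
    """Fetch one record by bNumber.

    Assumption: the endpoint returns a JSON document. The response
    format is not described in this README.
    """
    with urllib.request.urlopen(bnumber_url(bnumber), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(bnumber_url("b20480002"))
```

The other identifiers in the list (OCLC, ISBN, ISSN, Hathi, VIAF, Item Barcode) are not shown with URLs here, so the sketch does not guess at their paths.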
...
Field | Description / Subfield | Data Type | Notes |
---|---|---|---|
sc:oclc | OCLC Worldcat IDs extracted from various MARC locations | Array of Integers | Indexed |
sc:research | A flag to mark if a bib record is considered a research resource | Boolean | Indexed, Process outlined below: |
sc:classifyCheck | Date when the record was last checked in OCLC Classify | String YYYY-MM-DD | |
sc:worldcatCheck | Date when the record was last checked in OCLC Worldcat | String YYYY-MM-DD | |
sc:isbn | ISBN Ids extracted from the MARC | Array of Strings | Indexed |
sc:issn | ISSN Ids extracted from the MARC | Array of Strings | Indexed |
sc:hathi | The hathi volume ids linked to this resource | Array of strings | Indexed |
sc:lccCoarse | The results of a process to convert a | String or False | Indexed. Values: lcc_outlines.json |
sc:usageCount | Tallied number of Checkouts and Renewals from the Items table | Integer | -1 means there was no item record, so the usage count is unknown. Current data is from Summer 2015 |
sc:publishYear | If the publishYear on the bib record is not a 4-digit year, the process tries to populate this field with the likely publish date, following the logic to the right. | Integer | Pull MARC Fields: Look in each field for 19/20/21 century years that match: Take the lowest year and use that as the sc:publishYear. If no years were found look into the items |
sc:classmark | The base (minus cutter, etc.) of any of the classmarks found in the 852 field. | Array of strings | Indexed |
sc:callnumber | The full call number including item records. | Array of strings | Indexed |
sc:local | Set on records that are local resources: bib records previously flagged as not for the OCLC reclamation project, or ones that have a clippings call number, such as *T-CLP | Boolean | |
sc:agents | An array of objects that combines any Classify, Worldcat, and MARC name fields. | Array of Obj | |
sc:terms | An array of objects that represents the combined FAST and local subject terms mapped together | Array of Obj | |
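
The derivation rules for `sc:publishYear` and `sc:usageCount` in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the year pattern (four-digit years from 1800 to 2099) is my reading of "19/20/21 century years", the exact MARC fields and the item lookup are not specified in the table and are passed in as plain strings, and the item keys `checkouts` and `renewals` are hypothetical names.

```python
import re

# Assumption: "19/20/21 century years" means four-digit years 1800-2099.
# The actual pattern used by the project is not given in the table.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:18|19|20)\d{2}(?!\d)")


def lowest_year(values):
    """Return the lowest matching year found in a list of strings, or None."""
    years = [int(y) for value in values for y in YEAR_PATTERN.findall(value)]
    return min(years) if years else None


def likely_publish_year(marc_values, item_values=()):
    """Take the lowest year from the MARC field values; if none is found,
    fall back to strings taken from the items (which strings is an assumption)."""
    year = lowest_year(marc_values)
    if year is None:
        year = lowest_year(item_values)
    return year


def usage_count(items):
    """Tally checkouts and renewals; -1 means there was no item record.

    Assumption: each item is a dict with hypothetical keys
    'checkouts' and 'renewals'.
    """
    if not items:
        return -1
    return sum(i.get("checkouts", 0) + i.get("renewals", 0) for i in items)


if __name__ == "__main__":
    print(likely_publish_year(["c1987.", "New York : Knopf, 1992"]))
    print(usage_count([]))
```

The lookarounds in the pattern keep longer digit runs, such as barcodes or record numbers, from being mistaken for years.
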
Library of Congress BIB Fields Record Fields:
...