Registry Ingest
Data is serialized from TMS, MMS and Archives system extracts and stored in a Mongo instance to facilitate record matching and subject/agent extraction in preparation for RDF modeling.
Archives
The collection responses and component responses are used to populate the archives data. These are JSON representations of the EAD finding aids.
Collection Object example: archivesCollection.json
Component Object example: archivesComponent.json and archivesComponentSeries.json (series are not always labeled as such, but the hierarchy via parentDb is consistent)
MMS
JSON representation of the MODS XML. The identifiers come from the desc_xml blob, so they do not include inherited identifiers.
The data currently contains no inherited values; this can change depending on which fields we want to inherit.
Collection data example: mmsCollectionObject.json
Container data example: mmsContainerObject.json
Item data example: mssItemObject.json
Capture data example: mmsCaptureObject.json (capture "name" will be fixed in the next export of data; the current "name" will be moved to "itemName")
TMS
The TMS data comes from a number of tables combined into a single object with agents, alt title, division, material type etc.
Data sample: tmsObject.json
Catalog
Refer to the shadowcat docs to see the available fields.
Property mappings
MMS, Archives, TMS, and Catalog property mappings are documented in this spreadsheet.
Resource Reconciliation Strategies
MMS <-> Archives
Strategy: Collections - Common Identifiers.
Loop through all MMS collections:
Match on the mss id and bNumber present in both Archives and MMS.
If there is a match, use it.
If there is no match, match on the call number present in both.
If there is more than one match, check the overlap of the collection titles; if the title overlap is > 75%, use that one.
Prerequisites:
None
Warnings/Checks:
None
Functions:
mapCollectionsToArchivesCollections
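The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual mapCollectionsToArchivesCollections code: the field names (mssId, bNumber, callNumber, title) and the helper names are hypothetical, a match on either identifier is assumed to be sufficient, and title overlap is assumed to mean the share of the shorter title's words found in the other title.

```python
def title_overlap(a, b):
    """Share of the shorter title's words that also appear in the other title (assumed definition)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def shares_identifier(mms, arc):
    """True if both records carry the same mss id or the same bNumber (assumption: either is enough)."""
    return any(mms.get(k) and mms.get(k) == arc.get(k) for k in ("mssId", "bNumber"))


def match_collection(mms, archives_collections, threshold=0.75):
    """Return the archives collection matched to one MMS collection, or None."""
    # Step 1: common identifiers.
    for arc in archives_collections:
        if shares_identifier(mms, arc):
            return arc

    # Step 2: fall back to the call number.
    call_number = mms.get("callNumber")
    if not call_number:
        return None
    candidates = [a for a in archives_collections if a.get("callNumber") == call_number]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None

    # Step 3: several call number matches, so keep the best title only if it overlaps > threshold.
    title = mms.get("title", "")
    best = max(candidates, key=lambda a: title_overlap(title, a.get("title", "")))
    return best if title_overlap(title, best.get("title", "")) > threshold else None
```

When several archives collections share a call number and none of the titles clears the threshold, the sketch returns no match rather than guessing.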
——————
Strategy: Collections - Titles.
Find all collections in MMS that could be in Archives (by division code), are not yet matched, and do not have a bNumber (the idea being that a collection with a bNumber would already be matched, since almost all collections in Archives have a bNumber).
Loop through all the Archives collection titles and check whether the overlap of title words is > 75%.
If it is, connect those two collections.
Prerequisites:
The "Collections - Common Identifiers" strategy has been run.
Warnings/Checks:
Also run a cosine similarity check (0.5 fuzziness) on the two titles; if the score is < 0.2, flag the match as potentially wrong.
Functions:
mapCollectionsToArchivesCollectionsTitleMatch
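A rough sketch of the title match and its sanity check is shown below. It is not the actual mapCollectionsToArchivesCollectionsTitleMatch code: the function names are hypothetical, word overlap is assumed to be measured against the shorter title, and the cosine similarity is a plain bag-of-words version that does not model the 0.5 fuzziness setting.

```python
import math
import re
from collections import Counter


def tokens(title):
    """Lowercased alphanumeric words of a title."""
    return re.findall(r"[a-z0-9]+", title.lower())


def word_overlap(a, b):
    """Share of the shorter title's distinct words found in the other title (assumed definition)."""
    wa, wb = set(tokens(a)), set(tokens(b))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two titles."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def match_by_title(mms_title, archives_titles, overlap_threshold=0.75, cosine_floor=0.2):
    """Return (archives_title, flagged) for the best title match, or None if nothing overlaps enough."""
    best = max(archives_titles, key=lambda t: word_overlap(mms_title, t), default=None)
    if best is None or word_overlap(mms_title, best) <= overlap_threshold:
        return None
    # A high word overlap with a low cosine score suggests a very short title matched a long one.
    flagged = cosine_similarity(mms_title, best) < cosine_floor
    return best, flagged
```

The two measures are complementary in this sketch: the overlap test decides whether to connect the collections, while the cosine score only marks a connection for review.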
——————
Strategy: Item Collections - Identifiers.
Find all Items in MMS that do not have a collection
Match bNumber and mss id to archive collections
Prerequisites:
None
Warnings/Checks:
None
Functions:
mapItemCollectionsToArchivesCollectionsIdentifiers
——————
Strategy: Items & Containers - Identifiers.
Match MMS items and MMS containers to archives components where they have the same mss id or bNumber.
Prerequisites:
None
Warnings/Checks:
None
Functions:
mapContainersToArchivesComponentsByIdentifiers
mapItemsToArchivesComponentsByIdentifiers
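One way to picture this strategy is an identifier index over the archives components that is then probed with each MMS item or container. The sketch below is illustrative only: the record shape (id, mssId, bNumber) and the function names are assumptions, not the fields or functions used by the ingest code.

```python
from collections import defaultdict

ID_FIELDS = ("mssId", "bNumber")  # hypothetical field names


def index_by_identifier(components):
    """Build a lookup of (field, value) -> list of archives component ids."""
    index = defaultdict(list)
    for comp in components:
        for field in ID_FIELDS:
            value = comp.get(field)
            if value:
                index[(field, value)].append(comp["id"])
    return index


def match_to_components(records, components):
    """Map each MMS record id to the archives component ids sharing its mss id or bNumber."""
    index = index_by_identifier(components)
    matches = {}
    for rec in records:
        found = []
        for field in ID_FIELDS:
            value = rec.get(field)
            if not value:
                continue
            for comp_id in index.get((field, value), []):
                if comp_id not in found:  # avoid duplicates when both identifiers hit
                    found.append(comp_id)
        if found:
            matches[rec["id"]] = found
    return matches
```

The same function can be called once with the MMS containers and once with the MMS items, since both are matched on the same identifiers.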
——————
Strategy: Items by Archives Repo Id.
Match MMS items to archives components using the UUIDs stored in the archives database.
If the UUID stored in Archives points to a container, map the component to the container and the container to the component, and also map all the items that live in that container to the component.
Prerequisites:
None
Warnings/Checks:
None
Functions:
mapItemsToComponentsByRepoObjectTable
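The container expansion described above might look like the following. This is a sketch under stated assumptions rather than the mapItemsToComponentsByRepoObjectTable implementation: the input shapes and the function name are hypothetical, and each returned pair is meant to be read as a two-way link between an archives component and an MMS object.

```python
def map_by_repo_uuid(component_uuids, mms_items, mms_containers):
    """
    component_uuids: dict of archives component id -> MMS UUID stored in the archives database
    mms_items:       dict of MMS item UUID -> UUID of its container (or None)
    mms_containers:  set of MMS container UUIDs
    Returns a list of (component_id, mms_uuid) links.
    """
    # Group items by the container they live in.
    items_by_container = {}
    for item_uuid, container_uuid in mms_items.items():
        if container_uuid is not None:
            items_by_container.setdefault(container_uuid, []).append(item_uuid)

    links = []
    for component_id, uuid in component_uuids.items():
        if uuid in mms_containers:
            # Link the container itself, then every item inside it.
            links.append((component_id, uuid))
            for item_uuid in items_by_container.get(uuid, []):
                links.append((component_id, item_uuid))
        elif uuid in mms_items:
            links.append((component_id, uuid))
        # UUIDs unknown to MMS are skipped.
    return links
```

For example, a component whose stored UUID is a container holding two items yields three links: one to the container and one to each item.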
MMS Ingest Logic