Registry Ingest

Data is serialized from TMS, MMS and Archives system extracts and stored in a Mongo instance to facilitate record matching and subject/agent extraction in preparation for RDF modeling.

Archives

The collection and component responses are used to populate the Archives data. These are JSON representations of the EAD finding aids.

Collection Object example: archivesCollection.json

Component Object examples: archivesComponent.json and archivesComponentSeries.json (series are not always labeled as such, but the hierarchy via parentDb is consistent).
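Because series are not reliably labeled, the hierarchy can be rebuilt from parentDb alone. A minimal Python sketch, assuming (hypothetically) that each component's own database id is stored under `dbId` and that top-level components have `parentDb` set to None; the actual field layout is in the example files above.

```python
from collections import defaultdict


def children_by_parent(components):
    """Group component ids under the parentDb value that points at their parent."""
    children = defaultdict(list)
    for component in components:
        children[component.get("parentDb")].append(component["dbId"])
    return dict(children)


def ancestor_path(component_id, parent_of):
    """Walk parentDb links upward and return ancestor ids, nearest first."""
    path = []
    current = parent_of.get(component_id)
    while current is not None and current not in path:  # stop on a cycle
        path.append(current)
        current = parent_of.get(current)
    return path


# Hypothetical components: 1 is top level, 2 and 4 sit under it, 3 under 2.
components = [
    {"dbId": 1, "parentDb": None},
    {"dbId": 2, "parentDb": 1},
    {"dbId": 3, "parentDb": 2},
    {"dbId": 4, "parentDb": 1},
]
parent_of = {c["dbId"]: c["parentDb"] for c in components}
print(children_by_parent(components))  # {None: [1], 1: [2, 4], 2: [3]}
print(ancestor_path(3, parent_of))     # [2, 1]
```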


MMS

JSON representation of the MODS XML. The identifiers come from the desc_xml blob, so they do not include inherited identifiers.

The data currently includes only non-inherited values; this may change depending on which fields we decide to inherit.

Collection data example: mmsCollectionObject.json

Container data example: mmsContainerObject.json

Item data example: mssItemObject.json

Capture data example: mmsCaptureObject.json (the capture "name" is fixed in the next data export; the current "name" will be moved to "itemName").

 

TMS

The TMS data comes from several tables combined into a single object containing agents, alt title, division, material type, etc.

Data sample: tmsObject.json

 

Catalog

Refer to the shadowcat docs to see the available fields.

 


Property mappings

MMS, Archives, TMS, and Catalog property mappings are documented in this spreadsheet.

 


Resource Reconciliation Strategies

 

MMS <-> Archives

Strategy: Collections - Common Identifiers.

Loop through all MMS collections:
Match on the mss id and bNumber present in both Archives and MMS.
If there is a match, use it.

If there is no match, match on the call number present in both.
If there is more than one call number match, check the overlap of the collection titles; if the title overlap is > 75%, use that one.
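The matching order for a single MMS collection can be sketched as follows. The field names (`mssId`, `bNumber`, `callNumber`, `title`) are hypothetical, the sketch treats a shared value for either identifier as a match (the text does not say whether both must agree), and "title overlap" is assumed to mean the share of the MMS title's distinct words that also appear in the Archives title. This is an illustration, not the code of mapCollectionsToArchivesCollections.

```python
import re


def title_overlap(mms_title, archives_title):
    """Share of the MMS title's distinct words found in the Archives title (assumed metric)."""
    mms_words = set(re.findall(r"\w+", mms_title.lower()))
    archives_words = set(re.findall(r"\w+", archives_title.lower()))
    if not mms_words:
        return 0.0
    return len(mms_words & archives_words) / len(mms_words)


def match_collection(mms, archives_collections):
    """Return the Archives collection matched to one MMS collection, or None."""
    # 1. Common identifiers: a shared mss id or bNumber wins outright.
    for arc in archives_collections:
        for key in ("mssId", "bNumber"):
            if mms.get(key) and mms.get(key) == arc.get(key):
                return arc
    # 2. Fall back to the call number.
    call_number = mms.get("callNumber")
    candidates = [arc for arc in archives_collections
                  if call_number and arc.get("callNumber") == call_number]
    if len(candidates) == 1:
        return candidates[0]
    # 3. Several call number hits: keep the best title, but only above 75% overlap.
    if candidates:
        best = max(candidates, key=lambda arc: title_overlap(mms["title"], arc["title"]))
        if title_overlap(mms["title"], best["title"]) > 0.75:
            return best
    return None
```

Returning None leaves the collection unmatched so that a later strategy (such as the title pass) can pick it up.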

Prerequisites:
None

Warnings/Checks:
None

Functions:
mapCollectionsToArchivesCollections


——————

Strategy: Collections - Titles.

Find all MMS collections that could be in Archives (by division code), are not yet matched, and do not have a bNumber. (The idea is that a collection with a bNumber would already have been matched, since almost all Archives collections have a bNumber.)

Loop through all the Archives collection titles and check whether the title-word overlap is > 75%.

If it is, link the two collections.
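The title pass can be sketched the same way. The field names (`uuid`, `division`, `bNumber`, `title`, `id`), the set of Archives division codes, and the overlap metric are all assumptions for illustration; following the text, each eligible MMS collection is linked to the first Archives collection above the 75% threshold.

```python
import re


def title_overlap(mms_title, archives_title):
    """Share of the MMS title's distinct words found in the Archives title (assumed metric)."""
    mms_words = set(re.findall(r"\w+", mms_title.lower()))
    archives_words = set(re.findall(r"\w+", archives_title.lower()))
    return len(mms_words & archives_words) / len(mms_words) if mms_words else 0.0


def match_by_title(mms_collections, archives_collections, archives_divisions, matched_uuids):
    """Return {MMS collection uuid: Archives collection id} for title-only matches."""
    links = {}
    for mms in mms_collections:
        if mms.get("division") not in archives_divisions:
            continue  # division code says this collection is not in Archives
        if mms["uuid"] in matched_uuids or mms.get("bNumber"):
            continue  # already matched, or would have matched on bNumber
        for arc in archives_collections:
            if title_overlap(mms["title"], arc["title"]) > 0.75:
                links[mms["uuid"]] = arc["id"]
                break
    return links
```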

Prerequisites:
The "Collections - Common Identifiers" strategy has been run.

Warnings/Checks:
Also run a cosine similarity check (0.5 fuzziness) on the two titles; if the score is < 0.2, flag the match as potentially wrong.
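A sketch of the cosine check on word-count vectors. What "0.5 fuzziness" refers to is not specified here, so this sketch does not model it; tokenizing titles into lowercase words is also an assumption.

```python
import math
import re
from collections import Counter


def cosine_similarity(title_a, title_b):
    """Cosine similarity between the word-count vectors of two titles."""
    va = Counter(re.findall(r"\w+", title_a.lower()))
    vb = Counter(re.findall(r"\w+", title_b.lower()))
    dot = sum(va[w] * vb[w] for w in va)  # Counter returns 0 for absent words
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def flag_if_suspicious(title_a, title_b, threshold=0.2):
    """True when two linked titles are dissimilar enough to need manual review."""
    return cosine_similarity(title_a, title_b) < threshold
```

For example, "Jane Doe papers" against "Papers of Jane Doe" scores about 0.87 and is not flagged, while "Smith papers" against "Annual report" scores 0 and is.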

Functions:
mapCollectionsToArchivesCollectionsTitleMatch


——————

Strategy: Item Collections - Identifiers.

Find all MMS items that do not have a collection.
Match their bNumber and mss id against Archives collections.
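A minimal sketch of this pass, indexing Archives collections by identifier so each orphan item needs only dictionary lookups. The field names (`collection`, `uuid`, `id`, `bNumber`, `mssId`) and the choice to check bNumber before mss id are assumptions; the actual logic lives in mapItemCollectionsToArchivesCollectionsIdentifiers.

```python
def link_orphan_items(items, archives_collections):
    """Return {MMS item uuid: Archives collection id} for items lacking a collection."""
    # Index Archives collections by (identifier type, value).
    by_key = {}
    for arc in archives_collections:
        for key in ("bNumber", "mssId"):
            if arc.get(key):
                by_key[(key, arc[key])] = arc["id"]
    links = {}
    for item in items:
        if item.get("collection"):
            continue  # already attached to a collection
        for key in ("bNumber", "mssId"):
            arc_id = by_key.get((key, item.get(key)))
            if arc_id:
                links[item["uuid"]] = arc_id
                break  # bNumber is checked first, then mss id
    return links
```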

Prerequisites:
None

Warnings/Checks:
None

Functions:
mapItemCollectionsToArchivesCollectionsIdentifiers

——————

Strategy: Items & Containers - Identifiers.

Match MMS items and MMS containers to Archives components that share the same mss id or bNumber.
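Because items and containers are matched the same way, one index over the Archives components can serve both passes: call match_to_components once with containers and once with items. This is a sketch with hypothetical field names (`id`, `uuid`, `mssId`, `bNumber`); it keeps every component that shares an identifier rather than stopping at the first.

```python
from collections import defaultdict


def index_components(components):
    """Index Archives components by every mss id and bNumber they carry."""
    index = defaultdict(set)
    for comp in components:
        for key in ("mssId", "bNumber"):
            if comp.get(key):
                index[(key, comp[key])].add(comp["id"])
    return index


def match_to_components(records, index):
    """Return {MMS uuid: sorted component ids} for MMS items or containers."""
    matches = {}
    for rec in records:
        found = set()
        for key in ("mssId", "bNumber"):
            if rec.get(key):
                found |= index.get((key, rec[key]), set())
        if found:
            matches[rec["uuid"]] = sorted(found)
    return matches
```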

Prerequisites:
None

Warnings/Checks:
None

Functions:
mapContainersToArchivesComponentsByIdentifiers
mapItemsToArchivesComponentsByIdentifiers

——————

Strategy: Items by Archives Repo Id.

Match MMS items to Archives components using the UUIDs stored in the Archives database.
If a UUID stored in Archives points to a container, map the component to the container and the container to the component, and also map every item in that container to the component.
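A sketch of the container expansion, assuming the Archives database yields hypothetical `(component_id, mms_uuid)` pairs and that container membership is already known from MMS. Recording plain item matches in both directions is an assumption; the container case follows the description above.

```python
from collections import defaultdict


def map_by_repo_uuid(repo_rows, container_uuids, items_by_container):
    """Link Archives components and MMS records via UUIDs stored in Archives.

    repo_rows: (component_id, mms_uuid) pairs read from the Archives database.
    container_uuids: set of UUIDs that belong to MMS containers.
    items_by_container: container UUID -> UUIDs of the items it holds.
    """
    component_to_mms = defaultdict(set)
    mms_to_component = defaultdict(set)
    for component_id, mms_uuid in repo_rows:
        # Direct link in both directions (item or container).
        component_to_mms[component_id].add(mms_uuid)
        mms_to_component[mms_uuid].add(component_id)
        if mms_uuid in container_uuids:
            # Container hit: every item inside it also points to the component.
            for item_uuid in items_by_container.get(mms_uuid, []):
                mms_to_component[item_uuid].add(component_id)
    return dict(component_to_mms), dict(mms_to_component)
```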

Prerequisites:
None

Warnings/Checks:
None

Functions:
mapItemsToComponentsByRepoObjectTable

 


MMS Ingest Logic