The Refinery currently uses Redis for caching.

Seeding and refreshing

When the Jenkins job to START or RESTART the Refinery queue processes is run, the Refinery cache is seeded and then continually refreshed with the URLs specified in the cache configuration YAML file from the web-refinery-cache repo:

https://bitbucket.org/NYPL/web-refinery-cache/raw/master/cache_config.yaml

The file contains a list of URLs to cache and how often (in minutes) each URL should be refreshed.
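As a rough illustration of the seed-and-refresh cycle, the sketch below picks the URLs that are due for a refresh. It is not the Refinery's implementation: the `CacheEntry` class, the `due_for_refresh` function, and the field names are hypothetical, and the entries are written inline rather than parsed from cache_config.yaml, whose exact schema is not shown here.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """One configured URL (hypothetical shape, not the real YAML schema)."""
    url: str
    refresh_minutes: int  # how often the URL should be refreshed, in minutes


def due_for_refresh(entries, last_refreshed, now_minutes):
    """Return the URLs whose refresh interval has elapsed.

    last_refreshed maps a URL to the minute it was last refreshed.
    A URL with no record has never been cached, so it is always due;
    this is what seeds the cache on the first pass.
    """
    due = []
    for entry in entries:
        last = last_refreshed.get(entry.url)
        if last is None or now_minutes - last >= entry.refresh_minutes:
            due.append(entry.url)
    return due
```

On the first pass every configured URL is due, which corresponds to seeding; later passes only pick up URLs whose interval has elapsed.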

Forcing a cache refresh

A client may manually request a cache refresh of a URL at any time by doing a "hard refresh" from their browser (typically by holding down the SHIFT key during a standard refresh).

This will queue a "priority refresh" of the current request URL by adding refresh messages to the queue with a higher priority than normal messages.

To prevent duplicate refreshes of the same URL (and a possible denial of service), refresh requests are tracked with a QueueRecord cache item.
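The priority and de-duplication behavior described above can be sketched as follows. This is a minimal in-memory model, assuming a single process: the `RefreshQueue` class and its method names are hypothetical, and a Python set stands in for the QueueRecord cache items that the Refinery keeps in its cache.

```python
import heapq
import itertools

# Lower numbers are served first, so priority refreshes jump the queue.
PRIORITY, NORMAL = 0, 1


class RefreshQueue:
    """Toy model of a refresh queue with de-duplicated priority requests."""

    def __init__(self):
        self._heap = []
        # Tie-breaker that keeps first-in, first-out order within a priority.
        self._counter = itertools.count()
        # Stand-in for QueueRecord items: URLs with a pending priority refresh.
        self._records = set()

    def enqueue(self, url, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def request_priority_refresh(self, url):
        """Queue a priority refresh unless one is already pending for url."""
        if url in self._records:
            return False  # duplicate hard refresh, ignored
        self._records.add(url)
        self.enqueue(url, PRIORITY)
        return True

    def dequeue(self):
        priority, _, url = heapq.heappop(self._heap)
        if priority == PRIORITY:
            # The refresh is being handled, so new requests are allowed again.
            self._records.discard(url)
        return url
```

Repeated hard refreshes of the same URL add only one message to the queue until that message has been processed, which is the property the QueueRecord check is meant to provide.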

Cache levels

See the CacheData directory for a listing of Cache objects.

To optimize page load speed and the cache hit rate, there are three levels of caching:

Response (CachedResponse): The final response output presented to the client. Indexed by the request URL with query parameters.
NDO (CachedNDO): Separate NDO objects used to build a response. Indexed by the NDO type and ID.
Raw Data (RawData): The raw data read from the provider. Indexed by the URL requested from the data provider.

If a request requires data that is not available in the cache, a request is made to the data provider.
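The fall-through between the three levels might look like the sketch below. It is an assumption-laden model rather than the Refinery's code: `LayeredCache`, its methods, and the dictionary shapes are hypothetical, plain dictionaries stand in for Redis, and each response is assumed to be built from a single NDO for brevity.

```python
class LayeredCache:
    """Toy three-level cache: response, NDO, raw data (all hypothetical)."""

    def __init__(self, fetch_from_provider):
        self.responses = {}  # request URL with query parameters -> response
        self.ndos = {}       # (NDO type, NDO ID) -> NDO
        self.raw = {}        # provider URL -> raw data
        self._fetch = fetch_from_provider

    def get_raw(self, provider_url):
        # Lowest level: only a miss here reaches the data provider.
        if provider_url not in self.raw:
            self.raw[provider_url] = self._fetch(provider_url)
        return self.raw[provider_url]

    def get_ndo(self, ndo_type, ndo_id, provider_url):
        key = (ndo_type, ndo_id)
        if key not in self.ndos:
            data = self.get_raw(provider_url)
            self.ndos[key] = {"type": ndo_type, "id": ndo_id, "data": data}
        return self.ndos[key]

    def get_response(self, request_url, ndo_type, ndo_id, provider_url):
        # Highest level: a hit returns the finished response immediately.
        if request_url not in self.responses:
            ndo = self.get_ndo(ndo_type, ndo_id, provider_url)
            self.responses[request_url] = {"url": request_url, "ndo": ndo}
        return self.responses[request_url]
```

In this model, two request URLs that differ only in their query parameters miss at the response level but share the same cached NDO, so the provider is contacted once.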
