Static URL Store

This module contains StaticURLStore, a store that communicates with a remote HTTP server which provides the actual data storage. It is a simple read-only store that can be run against a static HTTP server which provides a JSON file with all metadata and serves the data from URLs under another path. The metadata URL is polled periodically for updates.

A typical static server might be laid out as:

base_directory/
    index.json
    data/
        key1
        key2
        ...
class encore.storage.static_url_store.StaticURLStore(root_url, data_path, query_path, poll=300)

A read-only key-value store that is a front end for data served via URLs

All data is assumed to be served from some root URL. In addition, the store requires knowledge of two paths: a data prefix URL, which is a partial URL to which the keys will be appended when requesting data, and a query URL, which is a single URL that provides all metadata as a JSON-encoded file.

For example, an HTTP server may store data at URLs of the form:

http://www.example.com/data/<key>

and may store the metadata at:

http://www.example.com/index.json

These would have a root url of “http://www.example.com/”, a data path of “data/” and a query path of “index.json”.
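
The relationship between the three values can be sketched as plain string concatenation. This is only an illustration of the example above, not the store's actual code: the helper name build_urls is hypothetical, and the real implementation may join or quote URLs differently.

```python
def build_urls(root_url, data_path, query_path, key):
    """Return (data_url, query_url) for a key.

    Hypothetical helper: assumes the store appends the paths and the key
    to the root URL by simple concatenation.
    """
    data_url = root_url + data_path + key
    query_url = root_url + query_path
    return data_url, query_url


data_url, query_url = build_urls(
    "http://www.example.com/", "data/", "index.json", "key1"
)
```

With these inputs, data_url points at the key under the data prefix and query_url points at the metadata file.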

All queries are performed using urllib.urlopen, so this store can be backed by an HTTP, FTP or file server which serves static files. If appropriate credentials are supplied, HTTP authentication will be used when connecting to the remote server.

Warning

Since we use urllib without any further modifications, HTTPS requests do not validate the server’s certificate.

Because of the limited nature of the interface, this store implementation is read only, and handles updates via periodic polling of the query URL. This guarantees that the viewed data is always consistent, although it may not be current. Most of the work of querying is done on the client side using the cached metadata.

Parameters:
  • event_manager – An event_manager which implements the BaseEventManager API.
  • root_url (str) – The base url that data is served from.
  • data_path (str) – The URL prefix that the data is served from.
  • query_path (str) – The URL that the metadata is served from.
  • poll (float) – The polling interval, in seconds, for the polling thread. Polls every 5 minutes (300 seconds) by default.
connect(credentials=None, proxy_handler=None, auth_handler_factory=None)

Connect to the key-value store, optionally with authentication

This method creates appropriate urllib openers for the store.

Parameters:
  • credentials (dict) – A dictionary which has at least keys ‘username’ and ‘password’ and optional keys ‘uri’ and ‘realm’. The ‘uri’ will default to the root url of the store, and ‘realm’ will default to ‘encore.storage’.
  • proxy_handler (urllib.ProxyHandler) – An optional urllib.ProxyHandler instance. If none is provided then urllib will create a proxy handler from the user’s environment if needed.
  • auth_handler_factory – An optional factory to build urllib authenticators. The credentials will be passed as keyword arguments to this handler’s add_password method.
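
As a rough sketch of how credentials could be turned into a urllib opener, the following uses Python 3's urllib.request. The helper name make_opener is hypothetical, and the mapping from the 'username' and 'password' keys to the standard library's add_password arguments is an assumption rather than the store's confirmed behaviour.

```python
import urllib.request


def make_opener(root_url, credentials, auth_handler_factory=None):
    """Build an opener with HTTP authentication (illustrative only)."""
    # Apply the documented defaults for the optional 'uri' and 'realm' keys.
    args = {"uri": root_url, "realm": "encore.storage"}
    args.update(credentials)

    # Assumption: fall back to basic authentication if no factory is given.
    factory = auth_handler_factory or urllib.request.HTTPBasicAuthHandler
    handler = factory()
    handler.add_password(
        realm=args["realm"],
        uri=args["uri"],
        user=args["username"],
        passwd=args["password"],
    )
    return urllib.request.build_opener(handler), handler


opener, handler = make_opener(
    "http://www.example.com/",
    {"username": "alice", "password": "secret"},
)
```

No request is made when the opener is built; the handler only supplies the stored credentials if the server later responds with an authentication challenge for a matching realm and URL.
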
disconnect()

Disconnect from the key-value store

This method disposes of or disconnects from any long-lived resources that the store requires.

exists(key)

Test whether or not a key exists in the key-value store

Parameters:key (string) – The key for the resource in the key-value store. The key is a unique identifier for the resource within the key-value store.
Returns:exists (bool) - Whether or not the key exists in the key-value store.
get(key)

Retrieve a stream of data and metadata from a given key in the key-value store.

Parameters:key (string) – The key for the resource in the key-value store. The key is a unique identifier for the resource within the key-value store.
Returns:
  • data (file-like) - A readable file-like object that provides a stream of data from the key-value store. This is the same type of file-like object returned by urllib’s urlopen function.
  • metadata (dictionary) - A dictionary of metadata for the key.
Raises:KeyError - If the key is not found in the store, a KeyError is raised.
get_data(key)

Retrieve a stream from a given key in the key-value store.

Parameters:key (string) – The key for the resource in the key-value store. The key is a unique identifier for the resource within the key-value store.
Returns:data (file-like) - A readable file-like object that provides a stream of data from the key-value store. This is the same type of file-like object returned by urllib’s urlopen function.
Raises:KeyError - This will raise a key error if the key is not present in the store.
get_metadata(key, select=None)

Retrieve the metadata for a given key in the key-value store.

Parameters:
  • key (string) – The key for the resource in the key-value store. The key is a unique identifier for the resource within the key-value store.
  • select (iterable of strings or None) – Which metadata keys to populate in the result. If unspecified, then return the entire metadata dictionary.
Returns:

metadata (dict) - A dictionary of metadata associated with the key. The dictionary has keys as specified by the select argument. If a key specified in select is not present in the metadata, then it will not be present in the returned value.

Raises:

KeyError - This will raise a key error if the key is not present in the store.
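
The effect of the select argument described above can be sketched with a small standalone function. The name select_metadata is hypothetical and this is not the store's own code; it only mirrors the documented behaviour that missing metadata keys are silently omitted.

```python
def select_metadata(metadata, select=None):
    """Return a copy of metadata restricted to the names in select."""
    if select is None:
        # No selection: return the entire metadata dictionary.
        return dict(metadata)
    # Names that are absent from the metadata are skipped, not filled in.
    return {name: metadata[name] for name in select if name in metadata}


full = {"author": "alice", "size": 42}
partial = select_metadata(full, select=["author", "mimetype"])
```

Here partial contains only the 'author' entry, because 'mimetype' is not present in the source metadata.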

glob(pattern)

Return keys which match glob-style patterns

Parameters:pattern (string) – Glob-style pattern to match keys with.
Returns:result (iterable) - An iterable of keys which match the glob pattern.
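
Glob-style matching over a set of cached keys can be approximated with the standard library's fnmatch module. This is a sketch under the assumption that the usual shell wildcards (*, ? and [...]) are meant; the helper name glob_keys is hypothetical and the store may implement matching differently.

```python
from fnmatch import fnmatchcase


def glob_keys(keys, pattern):
    """Return the keys that match a glob-style pattern, case-sensitively."""
    return [key for key in keys if fnmatchcase(key, pattern)]


matches = glob_keys(["key1", "key2", "other"], "key*")
```
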
is_connected()

Whether or not the store is currently connected

Returns:connected (bool) - Whether or not the store is currently connected.
multiget(keys)

Retrieve the data and metadata for a collection of keys.

Parameters:keys (iterable of strings) – The keys for the resources in the key-value store. Each key is a unique identifier for a resource within the key-value store.
Returns:result (iterator of (file-like, dict) tuples) - An iterator of (data, metadata) pairs.
Raises:KeyError - This will raise a key error if the key is not present in the store.
multiget_data(keys)

Retrieve the data for a collection of keys.

Parameters:keys (iterable of strings) – The keys for the resources in the key-value store. Each key is a unique identifier for a resource within the key-value store.
Returns:result (iterator of file-like) - An iterator of file-like data objects corresponding to the keys.
Raises:KeyError - This will raise a key error if the key is not present in the store.
multiget_metadata(keys, select=None)

Retrieve the metadata for a collection of keys in the key-value store.

Parameters:
  • keys (iterable of strings) – The keys for the resources in the key-value store. Each key is a unique identifier for a resource within the key-value store.
  • select (iterable of strings or None) – Which metadata keys to populate in the results. If unspecified, then return the entire metadata dictionary.
Returns:

metadatas (iterator of dicts) - An iterator of dictionaries of metadata associated with the key. The dictionaries have keys as specified by the select argument. If a key specified in select is not present in the metadata, then it will not be present in the returned value.

Raises:

KeyError - This will raise a key error if the key is not present in the store.

query(select=None, **kwargs)

Query for keys and metadata matching metadata provided as keyword arguments

This provides a very simple querying interface that returns precise matches with the metadata. If no arguments are supplied, the query will return the complete set of metadata for the key-value store.

Parameters:
  • select (iterable of strings or None) – An optional list of metadata keys to return. If this is not None, then the metadata dictionaries will only have values for the specified keys populated.
  • kwargs – Arguments where the keywords are metadata keys, and values are possible values for that metadata item.
Returns:

result (iterable) - An iterable of (key, metadata) tuples where metadata matches all the specified values for the specified metadata keywords. If a key specified in select is not present in the metadata of a particular key, then it will not be present in the returned value.
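
Since querying is done on the client side against cached metadata, exact-match querying can be sketched as a filter over an in-memory dictionary. The function name query_index and the shape of the index (a dict mapping each key to its metadata dict) are assumptions made for illustration.

```python
def query_index(index, select=None, **kwargs):
    """Yield (key, metadata) pairs whose metadata matches all of kwargs."""
    for key, metadata in index.items():
        # Every keyword must be present in the metadata with an equal value.
        if all(
            name in metadata and metadata[name] == value
            for name, value in kwargs.items()
        ):
            if select is None:
                yield key, dict(metadata)
            else:
                yield key, {n: metadata[n] for n in select if n in metadata}


index = {
    "key1": {"author": "alice", "kind": "image"},
    "key2": {"author": "bob", "kind": "image"},
}
images_by_alice = list(query_index(index, author="alice", kind="image"))
authors = list(query_index(index, select=["author"]))
```

Calling the function with no keyword arguments matches every key, which corresponds to returning the complete set of metadata.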

query_keys(**kwargs)

Query for keys matching metadata provided as keyword arguments

This provides a very simple querying interface that returns precise matches with the metadata. If no arguments are supplied, the query will return the complete set of keys for the key-value store.

This is equivalent to self.query(**kwargs).keys(), but potentially more efficiently implemented.

Parameters:kwargs – Arguments where the keywords are metadata keys, and values are possible values for that metadata item.
Returns:result (iterable) - An iterable of key-value store keys whose metadata matches all the specified values for the specified metadata keywords.
to_bytes(key, buffer_size=1048576)

Efficiently store the data associated with a key into a bytes object.

This method can optionally be overridden by subclasses to provide a more efficient way of copying the data from the underlying data store to a bytes object. The default implementation uses the get() method together with chunked reads from the returned data stream and join.

Parameters:
  • key (string) – The key for the resource in the key-value store. The key is a unique identifier for the resource within the key-value store.
  • buffer_size (int) – An optional indicator of the number of bytes to read at a time. Implementations are free to ignore this hint or use a different default if they need to. The default is 1048576 bytes (1 MiB).
Returns:

bytes - The contents of the file-like object as bytes.

Events:
  • StoreProgressStartEvent - For buffering implementations, this event should be emitted prior to extracting the data.
  • StoreProgressStepEvent - For buffering implementations, this event should be emitted periodically as data is extracted.
  • StoreProgressEndEvent - For buffering implementations, this event should be emitted after extracting the data.
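
The "chunked reads and join" approach mentioned for the default implementation can be sketched as follows. The function name read_all is hypothetical, the stream is an in-memory stand-in for the file-like object returned by get(), and progress events are left out.

```python
import io


def read_all(stream, buffer_size=1048576):
    """Read a file-like object to exhaustion in chunks and join the result."""
    chunks = []
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            # An empty read signals the end of the stream.
            break
        chunks.append(chunk)
    return b"".join(chunks)


payload = read_all(io.BytesIO(b"0123456789"), buffer_size=3)
```

Collecting the chunks in a list and joining once avoids repeatedly reallocating a growing bytes object.
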
to_file(key, path, buffer_size=1048576)

Efficiently store the data associated with a key into a file.

This method can optionally be overridden by subclasses to provide a more efficient way of copying the data from the underlying data store to a path in the filesystem. The default implementation uses the get() method together with chunked reads from the returned data stream to the disk.

Parameters:
  • key (string) – The key for the resource in the key-value store. The key is a unique identifier for the resource within the key-value store.
  • path (string) – A file system path to store the data to.
  • buffer_size (int) – An optional indicator of the number of bytes to read at a time. Implementations are free to ignore this hint or use a different default if they need to. The default is 1048576 bytes (1 MiB).
Events:
  • StoreProgressStartEvent - For buffering implementations, this event should be emitted prior to writing any data to disk.
  • StoreProgressStepEvent - For buffering implementations, this event should be emitted periodically as data is written to disk.
  • StoreProgressEndEvent - For buffering implementations, this event should be emitted after finishing writing to disk.
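
Copying a stream to disk in chunks might look like the sketch below. The function name copy_to_file is hypothetical, an in-memory stream stands in for the store's data, and the progress events are not modelled.

```python
import io
import os
import tempfile


def copy_to_file(stream, path, buffer_size=1048576):
    """Write a file-like object to path in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as fp:
        while True:
            chunk = stream.read(buffer_size)
            if not chunk:
                break
            fp.write(chunk)
            written += len(chunk)
    return written


source = io.BytesIO(b"example payload " * 1000)
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "key1.bin")
    written = copy_to_file(source, target, buffer_size=4096)
    with open(target, "rb") as fp:
        copied = fp.read()
```

Only one chunk is held in memory at a time, so the buffer size bounds memory use regardless of how large the stored value is.
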
update_index()

Request the most recent version of the metadata

This downloads the JSON file at the query_path location, and updates the local metadata cache with this information. It then emits events that represent the difference between the old metadata and the new metadata.

This method is normally called from the polling thread, but can be called by other code when needed. It locks the metadata index whilst performing the update.
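
The update step can be sketched as replacing a locked cache and computing which keys were added, changed or removed. Everything here is illustrative: the class name IndexCache is hypothetical, the index is assumed to be a JSON object mapping keys to metadata dictionaries, and the result is returned as lists rather than emitted as events.

```python
import json
import threading


class IndexCache:
    """Hypothetical metadata cache that reports differences on update."""

    def __init__(self):
        self._lock = threading.Lock()
        self._index = {}

    def update(self, json_text):
        new_index = json.loads(json_text)
        # Hold the lock so readers never see a half-updated index.
        with self._lock:
            old = self._index
            added = sorted(k for k in new_index if k not in old)
            removed = sorted(k for k in old if k not in new_index)
            changed = sorted(
                k for k in new_index if k in old and old[k] != new_index[k]
            )
            self._index = new_index
        return added, changed, removed


cache = IndexCache()
first = cache.update('{"key1": {"size": 1}, "key2": {"size": 2}}')
second = cache.update('{"key1": {"size": 10}, "key3": {"size": 3}}')
```

In this sketch the second update reports key3 as added, key1 as changed and key2 as removed, which is the kind of difference the emitted events would describe.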