When designing a data store, it can be a good idea to separate the storage and query (and possibly index) layers.
Besides creating a clear separation of concerns, this allows queries to run on nodes that do not hold the whole data set (only the index is needed). It also allows offloading data to cheaper storage such as S3.
In Datomic (Hickey2012):
- Transaction service is responsible for receiving and processing incoming changes
- Storage service is responsible for storing segments (blobs) and can be any storage
- A separate indexer service indexes all incoming data and creates indices (as index segments in storage)
- A separate query component can use index segments to look up data and fetch only the required data from storage
- Index segments can be cached using memcached, etc.
- documents and all data/blobs are stored in a storage layer (any blob storage would do, e.g., S3)
- indices are kept separately in any key-value store (in-memory, sqlite, SQL)
- indices can be completely destroyed and re-created from storage
- search layer uses indices to look up data
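
The last four points can be sketched roughly as follows. This is a minimal illustration and not how Datomic or any particular product implements it: `BlobStorage`, `Index`, and `search` are hypothetical names, the blob store is an in-memory dict standing in for something like S3, the index is a simple term-to-key table in an in-memory sqlite database, and documents are assumed to be JSON objects with a `text` field.

```python
import json
import sqlite3
from typing import Dict, Iterator, List, Tuple


class BlobStorage:
    """Hypothetical stand-in for any blob store (e.g. S3): an in-memory dict."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def items(self) -> Iterator[Tuple[str, bytes]]:
        # Copy to a list so callers can iterate while the store changes.
        return iter(list(self._blobs.items()))


class Index:
    """Term -> document-key index kept in sqlite, separate from the blobs."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._create()

    def _create(self) -> None:
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS postings ("
            "term TEXT, doc_key TEXT, PRIMARY KEY (term, doc_key))"
        )

    def add(self, doc_key: str, doc: dict) -> None:
        # Assumption: the searchable content lives in the "text" field.
        terms = {word.lower() for word in str(doc.get("text", "")).split()}
        self._db.executemany(
            "INSERT OR IGNORE INTO postings VALUES (?, ?)",
            [(term, doc_key) for term in terms],
        )

    def lookup(self, term: str) -> List[str]:
        rows = self._db.execute(
            "SELECT doc_key FROM postings WHERE term = ? ORDER BY doc_key",
            (term.lower(),),
        )
        return [row[0] for row in rows]

    def rebuild(self, storage: BlobStorage) -> None:
        # The index holds no primary data, so it can be dropped
        # and re-created from storage at any time.
        self._db.execute("DROP TABLE IF EXISTS postings")
        self._create()
        for key, data in storage.items():
            self.add(key, json.loads(data))


def search(index: Index, storage: BlobStorage, term: str) -> List[dict]:
    """Search layer: consult the index first, then fetch only matching blobs."""
    return [json.loads(storage.get(key)) for key in index.lookup(term)]


if __name__ == "__main__":
    storage = BlobStorage()
    index = Index()
    docs = {
        "doc-1": {"text": "separate storage and query"},
        "doc-2": {"text": "query layer uses indices"},
    }
    for key, doc in docs.items():
        storage.put(key, json.dumps(doc).encode("utf-8"))
        index.add(key, doc)
    print(search(index, storage, "query"))
    index.rebuild(storage)  # destroy and re-create the index from storage
    print(search(index, storage, "indices"))
```

Because `search` touches storage only for keys returned by the index, a query node in this sketch needs the index locally and can fetch blobs on demand; swapping sqlite for another key-value store would only change the `Index` class.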