The idea is to associate tags with files and then work with sets of files
selected by queries over those tags.
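As a minimal sketch, assume an in-memory index that maps each tag to the set of file paths carrying it. `TagIndex`, `tag` and `filesWithAll` are hypothetical names, not part of the tool. Under that assumption, a query that requires several tags reduces to a set intersection:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory tag index: tag name -> paths of files carrying that tag.
class TagIndex {
    private final Map<String, Set<String>> filesByTag = new HashMap<>();

    // Associate a tag with a file.
    void tag(String path, String tag) {
        filesByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(path);
    }

    // Return the files that carry every tag in the query (an AND query).
    Set<String> filesWithAll(List<String> tags) {
        Set<String> result = null;
        for (String tag : tags) {
            Set<String> files = filesByTag.getOrDefault(tag, Set.of());
            if (result == null) {
                result = new HashSet<>(files);   // first tag seeds the result
            } else {
                result.retainAll(files);         // intersect with the running result
            }
        }
        return result == null ? new HashSet<>() : result;
    }
}
```

An OR query would use `addAll` in place of `retainAll`. Tagging tags would fit the same shape if the keys were generalised from file paths to arbitrary artifact ids.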

We also intend to tag tags.

The initial interface will be via command line tools. Since the implementation
is in Java, there will be a generic "C" command line program that delegates
its requests to a Java server.

The tools will allow a set to be checked out. Ordinary programs can then
operate on the files, and finally the files are checked back in. In the long
run, however, it would be good to build a FUSE-based interface to the files,
removing the need to check files out explicitly.

Note that tags are simply a way of building associations. The value is in the
associations; the tag is just a name for an association. Eventually we hope to
support associations between files and full queries, or to maintain a weighted
association between a tag and its relatives based on access patterns.
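One simple way to keep weights driven by access patterns is co-access counting: each time a set of tags is used together, the weight between every pair of them is bumped. The notes above do not say how weights would be computed, so this technique and the hypothetical `CoAccessWeights` class are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical co-access tracker (assumed technique, not the planned design):
// each time tags are accessed together, the weight of every ordered pair grows by 1.
class CoAccessWeights {
    private final Map<String, Map<String, Integer>> weights = new HashMap<>();

    void recordAccess(List<String> tags) {
        for (String a : tags) {
            for (String b : tags) {
                if (!a.equals(b)) {
                    weights.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    int weight(String a, String b) {
        return weights.getOrDefault(a, Map.of()).getOrDefault(b, 0);
    }

    // Relatives of a tag, strongest association first.
    List<String> relatives(String tag) {
        Map<String, Integer> related = weights.getOrDefault(tag, Map.of());
        List<String> result = new ArrayList<>(related.keySet());
        result.sort((x, y) -> Integer.compare(related.get(y), related.get(x)));
        return result;
    }
}
```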

An association can also be anonymous:

	e.g.
		File x -> { "is a" "picture" }
		
		is similar to
		
		File x -> "is a picture" ---> "is a"
		                          +-> "picture"
		                          
Once we have these associations, we can start constructing ways of traversing
them to retrieve information.
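The anonymous example above can be modelled by treating each association as a node of its own: the source links to the association, and the association links to each target. This sketch assumes artifacts are plain strings and uses a hypothetical `AssociationGraph` class in which anonymous associations get generated names such as `anon#1`. `reachable` is a plain breadth-first traversal, one of the simplest ways of walking the associations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical association graph: an association is itself a node that the
// source links to and that links on to each of its targets.
class AssociationGraph {
    private final Map<String, Set<String>> links = new HashMap<>();
    private int anonCount = 0;

    private void link(String from, String to) {
        links.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    // File x -> { "is a" "picture" }: the association node gets a generated name.
    String associateAnonymous(String source, List<String> targets) {
        return associate(source, "anon#" + (++anonCount), targets);
    }

    // File x -> "is a picture" ---> "is a", "picture"
    String associate(String source, String name, List<String> targets) {
        link(source, name);
        for (String target : targets) {
            link(name, target);
        }
        return name;
    }

    // Breadth-first traversal: every artifact reachable from start, excluding start.
    Set<String> reachable(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : links.getOrDefault(node, Set.of())) {
                if (!next.equals(start) && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }
}
```

Because `reachable` records visited nodes, cycles (for example tags that tag each other) do not make it loop.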

Design Notes
============

Concepts
--------

The tagger is built around the following concepts:
	artifact - something tangible that is an end point, e.g. a tag, text, or file
	association - a link between two artifacts
	
The key idea is that any artifact can be associated with any other artifact and that these
associations can then be queried so as to find artifacts.

Building Blocks
---------------

The datastore will be used to store binary chunks (accessible as streams).

One datastore might hold file contents, while another holds file metadata.
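A minimal sketch of such a datastore follows. It assumes a hypothetical `DataStore` interface with numeric chunk ids and an in-memory implementation standing in for a filesystem or remote store. A contents store and a metadata store would simply be two separate instances. `InputStream.readAllBytes` requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical datastore contract: binary chunks go in and come out as streams.
interface DataStore {
    long put(InputStream data) throws IOException;
    InputStream get(long id) throws IOException;
}

// In-memory stand-in for a filesystem-backed or remote store.
class MemoryDataStore implements DataStore {
    private final Map<Long, byte[]> chunks = new HashMap<>();
    private long nextId = 1;

    @Override
    public long put(InputStream data) throws IOException {
        long id = nextId++;
        chunks.put(id, data.readAllBytes());  // buffer the whole chunk
        return id;
    }

    @Override
    public InputStream get(long id) throws IOException {
        byte[] chunk = chunks.get(id);
        if (chunk == null) {
            throw new IOException("no chunk with id " + id);
        }
        return new ByteArrayInputStream(chunk);  // fresh stream per read
    }
}
```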

Artifact ids are long integers (8 bytes) with the artifact type encoded in the least significant 2 bytes.
That leaves 6 bytes, i.e. 2^48 (281474976710656) artifact numbers, and 2^16 (65536) possible artifact types.
For interoperability it might be better to go the whole hog and use a UUID with a proprietary value in its
version field, so that the artifact type can be encoded into the UUID.
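The id layout above can be sketched as follows. `ArtifactId` and its methods are hypothetical. The sketch assumes the upper 48 bits hold a serial number and the lower 16 bits the type, and it decodes with an unsigned shift because serials at or above 2^47 set the sign bit of the `long`.

```java
// Hypothetical helper for the 8-byte artifact id layout:
// upper 48 bits = serial number, lower 16 bits = artifact type.
final class ArtifactId {
    static final int TYPE_BITS = 16;
    static final long TYPE_MASK = (1L << TYPE_BITS) - 1;  // 0xFFFF
    static final long MAX_SERIAL = (1L << 48) - 1;        // 2^48 - 1

    private ArtifactId() {}

    static long encode(long serial, int type) {
        if (serial < 0 || serial > MAX_SERIAL) {
            throw new IllegalArgumentException("serial out of range: " + serial);
        }
        if (type < 0 || type > TYPE_MASK) {
            throw new IllegalArgumentException("type out of range: " + type);
        }
        return (serial << TYPE_BITS) | type;
    }

    static int type(long id) {
        return (int) (id & TYPE_MASK);
    }

    static long serial(long id) {
        return id >>> TYPE_BITS;  // unsigned shift: no sign extension for large serials
    }
}
```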

Object streams allow simple objects (only public fields of primitive types) to be serialised.
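Below is a sketch of what such an object stream could look like. It assumes reflection over public, non-static fields and a `DataOutputStream`/`DataInputStream` encoding. `SimpleObjectStream` and the sample `Point` class are hypothetical, and only `int`, `long`, `double` and `boolean` fields are handled. Fields are written in name order because `Class.getFields()` does not guarantee any particular order. Unlike Java's built-in `ObjectOutputStream`, this needs no `Serializable` marker, but a class needs an accessible no-argument constructor to be read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sample "simple object": public primitive fields only.
class Point {
    public int x;
    public long y;
    public boolean visible;
}

// Hypothetical serialiser for public, non-static primitive fields, in name order.
final class SimpleObjectStream {
    private SimpleObjectStream() {}

    private static Field[] fields(Class<?> type) {
        Field[] fs = Arrays.stream(type.getFields())
                .filter(f -> !Modifier.isStatic(f.getModifiers()))
                .toArray(Field[]::new);
        Arrays.sort(fs, Comparator.comparing(Field::getName));  // stable layout
        return fs;
    }

    static byte[] write(Object obj) throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (Field f : fields(obj.getClass())) {
            Class<?> t = f.getType();
            if (t == int.class) out.writeInt(f.getInt(obj));
            else if (t == long.class) out.writeLong(f.getLong(obj));
            else if (t == double.class) out.writeDouble(f.getDouble(obj));
            else if (t == boolean.class) out.writeBoolean(f.getBoolean(obj));
            else throw new IllegalArgumentException("unsupported field: " + f);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static <T> T read(byte[] data, Class<T> type) throws Exception {
        T obj = type.getDeclaredConstructor().newInstance();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        for (Field f : fields(type)) {
            Class<?> t = f.getType();
            if (t == int.class) f.setInt(obj, in.readInt());
            else if (t == long.class) f.setLong(obj, in.readLong());
            else if (t == double.class) f.setDouble(obj, in.readDouble());
            else if (t == boolean.class) f.setBoolean(obj, in.readBoolean());
            else throw new IllegalArgumentException("unsupported field: " + f);
        }
        return obj;
    }
}
```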

Artifact DB
===========

The basic concept is to store artifacts (chunks of data, documents, messages) and then be able to retrieve them again.

	concepts:
		artifact		- a chunk of data of a particular type that is to be stored
		datastore		- the actual storage of the data (on a filesystem or remotely)
		metadata		- canonical representation of some data extracted from the artifact
		indexes			- search indices generated as required on metadata fields
		associations	- relationships between artifacts (links based on artifact UUIDs)
		contracts		- any input or output must be validated against the contract for that type
		action			- task that is performed as a result of a trigger or event
		
	logic:
		store:
			1. validate document
			2. extract metadata
			3. store doc & metadata
			4. update any indexes that the artifact type subscribes to
			5. trigger rules to see if any actions should be performed
			6. return the UUID for the new document
		retrieve:
			1. verify existence of UUIDs in request
			2. verify output contract for each artifact
			3. return set of artifacts
		produce:
			- use metadata for artifacts to construct a new artifact
				(probably a subset or aggregation of metadata)
		search:
			- search across metadata indexes for matching artifacts
			- search across associations for matching artifacts
			
	Indexes are created automatically based on search patterns.
	Associations are created explicitly (either via external control or via actions).
	For efficient resource usage, associations are automatically clustered based on
	the types of the artifacts they link and the access patterns of searches.
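The store, retrieve and search steps above can be sketched as a small in-memory class. Everything here is an assumption rather than the planned design. Documents are strings, a type's contract is a single `Predicate` used for both input and output, and metadata is a flat string map. Actions are callbacks, and an index on a metadata field is built the first time that field is searched, as a crude form of "created automatically based on search patterns". `ArtifactDb` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical in-memory artifact DB mirroring the store/retrieve/search steps.
class ArtifactDb {
    private final Map<String, Predicate<String>> contracts = new HashMap<>();
    private final Map<String, Function<String, Map<String, String>>> extractors = new HashMap<>();
    private final Map<UUID, String> docs = new HashMap<>();
    private final Map<UUID, String> types = new HashMap<>();
    private final Map<UUID, Map<String, String>> metadata = new HashMap<>();
    // field name -> field value -> ids of artifacts with that value
    private final Map<String, Map<String, List<UUID>>> indexes = new HashMap<>();
    private final List<BiConsumer<UUID, Map<String, String>>> rules = new ArrayList<>();

    void defineType(String type, Predicate<String> contract,
                    Function<String, Map<String, String>> extractor) {
        contracts.put(type, contract);
        extractors.put(type, extractor);
    }

    void addRule(BiConsumer<UUID, Map<String, String>> rule) {
        rules.add(rule);
    }

    UUID store(String type, String doc) {
        Predicate<String> contract = contracts.get(type);
        if (contract == null || !contract.test(doc)) {                        // 1. validate
            throw new IllegalArgumentException("document violates contract for " + type);
        }
        Map<String, String> meta = extractors.get(type).apply(doc);           // 2. extract metadata
        UUID id = UUID.randomUUID();
        docs.put(id, doc);                                                     // 3. store doc & metadata
        types.put(id, type);
        metadata.put(id, meta);
        indexes.forEach((field, index) -> addToIndex(index, field, id, meta)); // 4. update existing indexes
        rules.forEach(rule -> rule.accept(id, meta));                         // 5. trigger rules
        return id;                                                             // 6. return the UUID
    }

    List<String> retrieve(List<UUID> ids) {
        List<String> result = new ArrayList<>();
        for (UUID id : ids) {
            String doc = docs.get(id);
            if (doc == null) {                                                 // 1. verify existence
                throw new NoSuchElementException("unknown artifact " + id);
            }
            if (!contracts.get(types.get(id)).test(doc)) {                    // 2. output contract
                throw new IllegalStateException("artifact " + id + " violates its contract");
            }
            result.add(doc);
        }
        return result;                                                         // 3. return the artifacts
    }

    // Search a metadata field; the first search on a field builds its index.
    List<UUID> search(String field, String value) {
        Map<String, List<UUID>> index = indexes.computeIfAbsent(field, f -> {
            Map<String, List<UUID>> built = new HashMap<>();
            metadata.forEach((id, meta) -> addToIndex(built, f, id, meta));
            return built;
        });
        return new ArrayList<>(index.getOrDefault(value, List.of()));
    }

    private static void addToIndex(Map<String, List<UUID>> index, String field,
                                   UUID id, Map<String, String> meta) {
        String value = meta.get(field);
        if (value != null) {
            index.computeIfAbsent(value, v -> new ArrayList<>()).add(id);
        }
    }
}
```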