Tuesday, March 12, 2013

Some work in progress integrating HIVE with iRODS

iRODS has a powerful facility, through the iCAT master catalog, to manage user-supplied metadata on different parts of the catalog domain, such as files and collections.  These are 'AVU' triples, which are just attribute-value-unit slots that can hold free-format data.

We're using AVUs by adding conventions and metaphors on top of them, such as free tags, starred folders, and shares, such as in this previous video demo.  One weakness of AVUs is that they are totally unstructured.  This does not mean that we cannot apply structure at a higher level, and that's exactly what the interest in HIVE integration is about.

HIVE is an acronym for Helping Interdisciplinary Vocabulary Engineering, and HIVE is a project from the Metadata Research Center at the School of Information and Library Science at UNC Chapel Hill,.  (Did I mention we were just voted the #2 best program in the country by US News and World Report?)

HIVE is a tool that allows browsing and searching across controlled vocabularies defined in SKOS, a simple RDF schema for defining dictionaries, thesauri, and other structured metadata.  A key aspect is the integration of RDF with Lucene to allow searching across selected vocabularies, a helpful approach since much of the focus of iRODS and DICE is in multi-disciplinary research collaboration, as in the Datanet Federation Consortium.  HIVE solves a lot of problems we were facing, so it is a happy circumstance that the MRC is just around the corner from us, and we're busy looking at integration.

In a nutshell, HIVE allows us to:

  • Keep multiple controlled vocabularies
  • Allow users to easily search and navigate across vocabularies to find appropriate terms
  • Make AVU metadata meaningful by providing structure and consistency
  • Power rich metadata queries using tools such as SPARQL to find iRODS files and collections

A short video demo follows that shows the first level of integration between iDrop (the iRODS cloud browser) and HIVE.  We've added a HIVE tab to contain a concept browser, allowing markup of iRODS files and collections with controlled vocabulary terms.

Note that we've yet to add search across vocabularies and automatic keyword extraction with MAUI and KEA.  These are available in HIVE, and we intend on adding them in this project.  

The next step is to build the capability to extract iRODS data and vocabulary terms and populate a triple store (Sesame or Jena), allowing queries on the triple-store, and allowing processing of results such that users can access the referenced data in iRODS.  We're seeking a generalized approach so that we can have a standard practice to store RDF statements about iRODS data, and we can index and manage real-time updates.  This aspect is next for the project, and should have a wide application for iRODS users!

No comments:

Post a Comment