Saturday, March 23, 2013

SPARQL queries for iRODS Data

This is cool:

PREFIX irods:       <> 
PREFIX skos:    <>
SELECT ?x ?y

?x  irods:correspondingConcept ?y .
?y skos:related <>

That's a SPARQL query running on Jena Fuseki...and this is related to the work we're doing with HIVE integration, as discussed in this previous blog entry...SPARQL is a query langage that can be used to search semantic metadata, in our case, metadata that describes the iRODS catalog, SKOS controlled vocabularies, and 'serialized' RDF statements saved as iRODS AVUs that apply controlled vocabulary terms to iRODS files and collections.  This improves the normal iRODS AVUs by giving them structure and meaning, via SKOS.

In the case above, we have a term defined in the Agrovoc vocabulary which looks something like this snippet, as 'turtle'.

      a       skos:Concept ;
      skos:narrower <> , <> , <> , <> , <> , <> , <> , <> , <> ;
      skos:prefLabel "Climatic zones"@en ;
      skos:related <> , <> , <> ;
      skos:scopeNote "Use for areas having identical climates; for the physical phenomenon use Climate (1665)"@en .

Note that SKOS will define broader, narrower, and related terms, along with other data.  This means that a user may tag an iRODS file or collection with a term like c_1669, and search for it on the related term c:6963.  

That's what the SPARQL query above shows, you are looking for any iRODS files or collections that have an AVU with a SKOS vocabulary term from Agrovoc that is related to a given concept.  The result of this query, in JSON, looks like so:

        "x": { "type": "uri" , "value": "irods://localhost:1247/test1/trash/home/test1/jargon-scratch.1256888938/JenaHiveIndexerServiceImplWithDerbyTest/testExecuteOnt/subdirectory2/hivefile7" } ,
        "y": { "type": "uri" , "value": "" }
      } ,
        "x": { "type": "uri" , "value": "irods://localhost:1247/test1/trash/home/test1/jargon-scratch.1256888938/JenaHiveIndexerServiceImplWithDerbyTest/testExecuteOnt/subdirectory1/hivefile7" } ,
        "y": { "type": "uri" , "value": "" }
      } ,
        "x": { "type": "uri" , "value": "irods://localhost:1247/test1/trash/home/test1/jargon-scratch.705362199/JenaHiveIndexerServiceImplWithOntTest/testExecuteOnt/subdirectory1/hivefile6" } ,
        "y": { "type": "uri" , "value": "" }
      } ,

As you can see (or at least trust me on this), you are finding iRODS data based on a related concept.  With Fuseki, we could add such SPARQL queries in short order to the iDrop apps, or even to iCommands.  Note that we've done this by marking up iRODS data with SKOS terms, storing these as special AVUs, indexing them with a spider, and then putting them into a Jena triple store for SPARQL queries.  The same sorts of things can also be pretty easily done using Lucene for text search, and adding these new methods of finding data is going to be an interesting area for Jargon and iRODS development.  You can see some of the HIVE work in the GForge project at DICE and RENCI here!

No comments:

Post a Comment