Thursday, September 12, 2013

Basic intro - getting and building Jargon

The latest in our little series of video chats is an overview of logging into GForge at RENCI, getting Jargon from git, and building it with Maven.

Monday, June 17, 2013

Jargon and RestEasy - some notes on what I've run into

I'm starting to work on a formal REST API for iRODS.  This is coming from multiple projects, but this first one gives me a chance to build the skeleton and set down some practices for later.  The project itself is here on the RENCI GForge.

For several reasons, I decided to roll with JBoss RestEasy.  Not least of which is their compliance with JAX-RS, which goes some way towards future-proofing any work I do.  There is also a need to do some S/MIME encryption of messages, and it looks like RestEasy handles this well enough.

RestEasy is not without its headaches and frustrations.  A good deal of this frustration has to do with integrating Spring beans, which I use a lot in Jargon, into the mix.  The docs don't seem to reflect actual usage in this area, either for service development or for testing.  In the Spring Integration section of the RestEasy docs, you get this example:

   <display-name>Archetype Created Web Application</display-name>




For your web.xml, and:

<beans xmlns=""

    <!-- Import basic SpringMVC Resteasy integration -->
    <import resource="classpath:springmvc-resteasy.xml"/>

For the Spring configuration file.

It doesn't work; it doesn't load your beans...

I dug around a lot (and sorry, I cannot retrace my steps and refer you to some of the info I found!), and by combining several proposed solutions I found that this worked...

First, for the web.xml document:








The things to highlight here are that I had to comment out the RestEasy component scan context parameter, add the contextConfigLocation parameter, and wire in the Spring RestEasy integration components by hand.  In this configuration, it does load my custom beans, and it then loads my RestEasy services because I added a component scan for them directly to the Spring configuration:


OK, so that part works now. I have a service running, it's doing content negotiation, and it's wired into Jargon. So how do I test it? Jargon already has a lot of tests, and I don't need to test Jargon or iRODS, so I decided to test at the HTTP request level. Given this, I was not too excited about testing with mocks. Mocks seem like a lot of trouble and might mask some of the subtleties involved, given that it's pretty easy (I thought) to test all of this with an embedded servlet container. This seems ideal...test end-to-end as the user sees things, and create sample code at the same time. What could be better!
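To make the 'test at the HTTP level' idea concrete without any RestEasy or Spring wiring, here's a minimal sketch of an end-to-end check. It swaps in the JDK's built-in com.sun.net.httpserver.HttpServer for TJWS, and the /user/test1 path and the JSON/XML bodies are hypothetical stand-ins for a real Jargon-backed service. The only point is that the test talks to a real socket and exercises content negotiation through the Accept header, just as a client would.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class EmbeddedHttpSketch {

    /** Starts an in-process server on a free loopback port with one hypothetical resource. */
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/user/test1", exchange -> {
            // Pick the representation from the Accept header (simple content negotiation).
            String accept = exchange.getRequestHeaders().getFirst("Accept");
            boolean json = accept != null && accept.contains("application/json");
            String body = json
                    ? "{\"userName\":\"test1\"}"
                    : "<user><userName>test1</userName></user>";
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type",
                    json ? "application/json" : "application/xml");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    /** Issues a real HTTP GET with the given Accept header and returns the body. */
    public static String get(HttpServer server, String path, String accept) throws IOException {
        int port = server.getAddress().getPort();
        HttpURLConnection conn = (HttpURLConnection) URI
                .create("http://127.0.0.1:" + port + path).toURL().openConnection();
        conn.setRequestProperty("Accept", accept);
        try {
            int status = conn.getResponseCode();
            if (status != 200) {
                throw new IOException("unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Binding to port 0 lets the OS pick a free port, which keeps parallel test runs from colliding.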

The JBoss docs don't go into great detail about best practices for testing RestEasy apps, but the TJWS embedded container seemed the obvious choice.  They provide a bit of pseudo-code in the docs, and, unfortunately, it does not work:

   public static void main(String[] args) throws Exception 
   {
      final TJWSEmbeddedJaxrsServer tjws = new TJWSEmbeddedJaxrsServer();

      org.jboss.resteasy.plugins.server.servlet.SpringBeanProcessor processor = new SpringBeanProcessor(tjws.getDeployment().getRegistry(), tjws.getDeployment().getFactory());
      ConfigurableBeanFactory factory = new XmlBeanFactory(...);
   }

At least we see something we could adapt to the setUp() method of a JUnit test case.  Given that clue, I found some very helpful posts, such as this one from 'eugene' (thanks eugene!).  But even this did not work, as the ApplicationContext was not @Autowired.  I kept getting NPEs.

This got me very close, and I've used SpringJUnit4ClassRunner extensively for Hibernate/JPA-based applications in the iDrop suite, so I felt like I just needed to hack on that a bit and I could get there.  The missing piece came from 'Daff' (thanks Daff!), who pointed out in his post that your JUnit test case can implement ApplicationContextAware.

I tried to wire this into the @BeforeClass-annotated startup method with a static ApplicationContext variable.  Needless to say, that did not work, and the context was always null.  In the end, I had to place the server startup code in the @Before-annotated method, which runs against the test instance, and the context was then available.  That's a little bit hinky, but given that I have a very short window for this project, I rolled with a solution that saves the ApplicationContext in a static variable and checks (singleton-style) whether the server has already been created for the JUnit class.  This is working fine so far, and only smells a tiny bit.  I may revisit it, but I'm happy enough to have a base testing strategy defined.

So, here's a JUnit test that works with Spring configured beans:


import org.jboss.resteasy.client.ClientRequest;
import org.jboss.resteasy.client.ClientResponse;
import org.jboss.resteasy.core.Dispatcher;
import org.jboss.resteasy.plugins.server.tjws.TJWSEmbeddedJaxrsServer;
import org.jboss.resteasy.plugins.spring.SpringBeanProcessor;
import org.jboss.resteasy.plugins.spring.SpringResourceFactory;
import org.jboss.resteasy.spi.ResteasyDeployment;
import org.junit.After;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.TestExecutionListeners;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.test.context.support.DependencyInjectionTestExecutionListener;
import org.springframework.test.context.support.DirtiesContextTestExecutionListener;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:jargon-beans.xml",
  "classpath:rest-servlet.xml" })
@TestExecutionListeners({ DependencyInjectionTestExecutionListener.class,
  DirtiesContextTestExecutionListener.class })
public class UserServiceTest implements ApplicationContextAware {

 // Hypothetical port for the embedded server; any free port will do.
 private static final int TEST_PORT = 8888;

 private static TJWSEmbeddedJaxrsServer server;

 // Static so it survives across test instances; set via setApplicationContext().
 private static ApplicationContext applicationContext;

 @BeforeClass
 public static void setUpBeforeClass() throws Exception {
  // The Spring context has not been injected yet here, so the server is
  // started lazily in setUp() instead.
 }

 @AfterClass
 public static void tearDownAfterClass() throws Exception {
  if (server != null) {
   server.stop();
  }
 }

 @Before
 public void setUp() throws Exception {
  // Singleton-style guard: start the embedded server only once per test class.
  if (server != null) {
   return;
  }

  server = new TJWSEmbeddedJaxrsServer();
  server.setPort(TEST_PORT);
  // start() sets up the deployment, so the dispatcher and registry exist afterwards.
  server.start();

  ResteasyDeployment deployment = server.getDeployment();
  Dispatcher dispatcher = deployment.getDispatcher();
  SpringBeanProcessor processor = new SpringBeanProcessor(dispatcher,
    deployment.getRegistry(), deployment.getProviderFactory());
  ((ConfigurableApplicationContext) applicationContext)
    .addBeanFactoryPostProcessor(processor);

  // Register the Spring-managed "userService" bean as a JAX-RS resource.
  SpringResourceFactory noDefaults = new SpringResourceFactory(
    "userService", applicationContext, UserService.class);
  dispatcher.getRegistry().addResourceFactory(noDefaults);
 }

 @After
 public void tearDown() throws Exception {
 }

 @Test
 public void testGetUserJSON() throws Exception {
  // Hypothetical placeholder URL: substitute the path of your UserService resource.
  final ClientRequest clientCreateRequest = new ClientRequest(
    "http://localhost:" + TEST_PORT + "/...");
  clientCreateRequest.accept("application/json");

  final ClientResponse<String> clientCreateResponse = clientCreateRequest
    .get(String.class);
  Assert.assertEquals(200, clientCreateResponse.getStatus());
  String entity = clientCreateResponse.getEntity();
  Assert.assertNotNull(entity);
 }

 @Override
 public void setApplicationContext(final ApplicationContext context)
   throws BeansException {
  applicationContext = context;
 }
}

So it might not be 'ideal', but I can move on and get this thing done.  I'd appreciate any pointers or refinements, and hopefully this will at least get you running and save you similar headaches.

Thursday, April 25, 2013

Demo of SPARQL search in HIVE/iRODS Integration

This demo video was prepared for a presentation this week, and it shows a bit more of the integration between iRODS and HIVE.

As mentioned previously, we're integrating controlled vocabularies via SKOS using the HIVE system. Dr. Jane Greenberg at SILS has prepared a short paper describing some of the concepts and motivations for this effort here.

Technically, we have four primary elements:

  1. Integration of the HIVE system into our iDrop web interface.  This includes a new set of Jargon libraries that support this integration, allowing easy wiring of HIVE functionality via Spring.
  2. A 'visitor' and 'iterator' library in Jargon for sweeping through data objects and collections marked up with HIVE RDF terms.
  3. An OWL vocabulary (though it's a rough sketch right now) describing iRODS ICAT metadata and relationships, which also goes into the index with our vocabularies.
  4. A HIVE query REST interface that can issue SPARQL queries to our indexed triple store, and a start at some preset queries such as searching on a term, or searching for items related to a term.
These elements are demonstrated in the demo video below...
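To give a feel for the 'visitor' idea in the second element, here's a minimal sketch, assuming a hypothetical in-memory tree in place of real iRODS collections and a hypothetical AVU attribute name for vocabulary terms. It is not the Jargon visitor API, just the pattern: a depth-first sweep hands every collection and data object to a callback, which here collects the paths that carry a term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HiveSweepSketch {

    /** Hypothetical in-memory stand-in for an iRODS collection or data object. */
    public static final class Node {
        final String path;
        final Map<String, String> avus; // AVU attribute -> value (units omitted)
        final List<Node> children;      // empty for data objects

        public Node(String path, Map<String, String> avus, List<Node> children) {
            this.path = path;
            this.avus = avus;
            this.children = children;
        }
    }

    /** Callback invoked once per node during a sweep. */
    public interface Visitor {
        void visit(Node node);
    }

    /** Depth-first, pre-order sweep handing every node to the visitor. */
    public static void sweep(Node node, Visitor visitor) {
        visitor.visit(node);
        for (Node child : node.children) {
            sweep(child, visitor);
        }
    }

    /** Collects paths of nodes that carry the given (hypothetical) term attribute. */
    public static List<String> pathsWithTerm(Node root, String attribute) {
        List<String> hits = new ArrayList<>();
        sweep(root, node -> {
            if (node.avus.containsKey(attribute)) {
                hits.add(node.path);
            }
        });
        return hits;
    }
}
```

Keeping the traversal separate from what happens at each node is what lets the same sweep feed an indexer, a report, or a triple store loader.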

Saturday, March 23, 2013

SPARQL queries for iRODS Data

This is cool:

PREFIX irods:       <> 
PREFIX skos:    <>
SELECT ?x ?y
WHERE {
?x  irods:correspondingConcept ?y .
?y skos:related <>
}

That's a SPARQL query running on Jena Fuseki, and it's related to the work we're doing with HIVE integration, as discussed in this previous blog entry.  SPARQL is a query language that can be used to search semantic metadata; in our case, metadata that describes the iRODS catalog, SKOS controlled vocabularies, and 'serialized' RDF statements saved as iRODS AVUs that apply controlled vocabulary terms to iRODS files and collections.  This improves on plain iRODS AVUs by giving them structure and meaning via SKOS.

In the case above, we have a term defined in the Agrovoc vocabulary, which looks something like this snippet in Turtle format:

      a       skos:Concept ;
      skos:narrower <> , <> , <> , <> , <> , <> , <> , <> , <> ;
      skos:prefLabel "Climatic zones"@en ;
      skos:related <> , <> , <> ;
      skos:scopeNote "Use for areas having identical climates; for the physical phenomenon use Climate (1665)"@en .

Note that SKOS defines broader, narrower, and related terms, along with other data.  This means that a user may tag an iRODS file or collection with a term like c_1669, and then find it by searching on the related term c_6963.

That's what the SPARQL query above does: it looks for any iRODS files or collections that have an AVU with a SKOS vocabulary term from Agrovoc that is related to a given concept.  The result of this query, in JSON, looks like this:

        "x": { "type": "uri" , "value": "irods://localhost:1247/test1/trash/home/test1/jargon-scratch.1256888938/JenaHiveIndexerServiceImplWithDerbyTest/testExecuteOnt/subdirectory2/hivefile7" } ,
        "y": { "type": "uri" , "value": "" }
      } ,
        "x": { "type": "uri" , "value": "irods://localhost:1247/test1/trash/home/test1/jargon-scratch.1256888938/JenaHiveIndexerServiceImplWithDerbyTest/testExecuteOnt/subdirectory1/hivefile7" } ,
        "y": { "type": "uri" , "value": "" }
      } ,
        "x": { "type": "uri" , "value": "irods://localhost:1247/test1/trash/home/test1/jargon-scratch.705362199/JenaHiveIndexerServiceImplWithOntTest/testExecuteOnt/subdirectory1/hivefile6" } ,
        "y": { "type": "uri" , "value": "" }
      } ,

As you can see (or at least trust me on this), you are finding iRODS data based on a related concept.  With Fuseki, we could add such SPARQL queries in short order to the iDrop apps, or even to iCommands.  Note that we've done this by marking up iRODS data with SKOS terms, storing these as special AVUs, indexing them with a spider, and then putting them into a Jena triple store for SPARQL queries.  The same sorts of things can also be pretty easily done using Lucene for text search, and adding these new methods of finding data is going to be an interesting area for Jargon and iRODS development.  You can see some of the HIVE work in the GForge project at DICE and RENCI here!
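For anyone new to SPARQL, the query above is just a join over two triple patterns. Here's a minimal, hypothetical sketch of that join in plain Java over an in-memory list of triples, treating the prefixed names as plain strings; the toy data in the usage is made up for illustration. Fuseki does this with indexes rather than nested loops, so this only shows the semantics.

```java
import java.util.ArrayList;
import java.util.List;

public class RelatedConceptQuerySketch {

    /** One RDF statement; URIs and prefixed names are plain strings here. */
    public static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Placeholder predicate names standing in for the full URIs behind the prefixes.
    public static final String CORRESPONDING_CONCEPT = "irods:correspondingConcept";
    public static final String SKOS_RELATED = "skos:related";

    /**
     * Evaluates  ?x irods:correspondingConcept ?y . ?y skos:related target
     * with nested loops, returning one {x, y} pair per match.
     */
    public static List<String[]> findByRelatedConcept(List<Triple> store, String target) {
        List<String[]> bindings = new ArrayList<>();
        for (Triple tagged : store) {
            if (!tagged.predicate.equals(CORRESPONDING_CONCEPT)) {
                continue;
            }
            // ?y is bound to the concept the iRODS item was tagged with.
            for (Triple rel : store) {
                if (rel.subject.equals(tagged.object)
                        && rel.predicate.equals(SKOS_RELATED)
                        && rel.object.equals(target)) {
                    bindings.add(new String[] { tagged.subject, tagged.object });
                }
            }
        }
        return bindings;
    }
}
```

Each returned pair corresponds to one "x"/"y" binding in the JSON results shown earlier.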

Tuesday, March 12, 2013

Some work in progress integrating HIVE with iRODS

iRODS has a powerful facility, through the iCAT master catalog, to manage user-supplied metadata on different parts of the catalog domain, such as files and collections.  These are 'AVU' triples, which are just attribute-value-unit slots that can hold free-format data.

We're using AVUs by adding conventions and metaphors on top of them, such as free tags, starred folders, and shares, as shown in this previous video demo.  One weakness of AVUs is that they are totally unstructured.  That doesn't mean we can't apply structure at a higher level, and that's exactly what the interest in HIVE integration is about.

HIVE is an acronym for Helping Interdisciplinary Vocabulary Engineering, and HIVE is a project from the Metadata Research Center at the School of Information and Library Science at UNC Chapel Hill.  (Did I mention we were just voted the #2 best program in the country by US News and World Report?)

HIVE is a tool that allows browsing and searching across controlled vocabularies defined in SKOS, a simple RDF schema for defining dictionaries, thesauri, and other structured metadata.  A key aspect is the integration of RDF with Lucene to allow searching across selected vocabularies, a helpful approach since much of the focus of iRODS and DICE is in multi-disciplinary research collaboration, as in the Datanet Federation Consortium.  HIVE solves a lot of problems we were facing, so it is a happy circumstance that the MRC is just around the corner from us, and we're busy looking at integration.

In a nutshell, HIVE allows us to:

  • Keep multiple controlled vocabularies
  • Allow users to easily search and navigate across vocabularies to find appropriate terms
  • Make AVU metadata meaningful by providing structure and consistency
  • Power rich metadata queries using tools such as SPARQL to find iRODS files and collections

A short video demo follows that shows the first level of integration between iDrop (the iRODS cloud browser) and HIVE.  We've added a HIVE tab to contain a concept browser, allowing markup of iRODS files and collections with controlled vocabulary terms.

Note that we've yet to add search across vocabularies and automatic keyword extraction with MAUI and KEA.  These are available in HIVE, and we intend to add them in this project.

The next step is to build the capability to extract iRODS data and vocabulary terms and populate a triple store (Sesame or Jena), allowing queries on the triple-store, and allowing processing of results such that users can access the referenced data in iRODS.  We're seeking a generalized approach so that we can have a standard practice to store RDF statements about iRODS data, and we can index and manage real-time updates.  This aspect is next for the project, and should have a wide application for iRODS users!

Thursday, February 7, 2013

Packing Instructions

Folks often ask what Jargon actually does, and I usually say it's like a JDBC driver underneath a high-level object library. The JDBC driver part refers to the fact that iRODS has a wire-level protocol that communicates commands and data between client and server (and this same protocol works server-to-server, it's a grid!). Anyhow, inside Jargon, there is an org.irods.jargon.core.packinstr package that models iRODS packing instructions.

This low-level protocol handling is meant to be 'under the covers', so you never need to worry about it.  This is especially important because the protocols can change, and we might see a future upgrade that adds something like protobuf.
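Here's a rough, hypothetical sketch of what 'modeling a packing instruction' means: a message is a tree of named tags that gets rendered into an XML-like form for the wire. The Tag class and the ExampleOpen_PI tag and field names are illustrative assumptions, not Jargon's packinstr classes or the exact iRODS message layout.

```java
import java.util.ArrayList;
import java.util.List;

public class PackingInstructionSketch {

    /** A named element holding either a text value or child tags (hypothetical model). */
    public static final class Tag {
        private final String name;
        private final String value; // null for container tags
        private final List<Tag> children = new ArrayList<>();

        public Tag(String name) {
            this(name, null);
        }

        public Tag(String name, String value) {
            this.name = name;
            this.value = value;
        }

        /** Appends a child tag and returns this tag, so messages can be built fluently. */
        public Tag add(Tag child) {
            children.add(child);
            return this;
        }

        /** Renders the tree as nested open/close elements, escaping text values. */
        public String render() {
            StringBuilder sb = new StringBuilder();
            sb.append('<').append(name).append('>');
            if (value != null) {
                sb.append(escape(value));
            }
            for (Tag child : children) {
                sb.append(child.render());
            }
            sb.append("</").append(name).append('>');
            return sb.toString();
        }

        // Escape '&' first so the entities added afterwards are not double-escaped.
        private static String escape(String text) {
            return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

    public static void main(String[] args) {
        // Illustrative message: tag and field names are placeholders.
        Tag message = new Tag("ExampleOpen_PI")
                .add(new Tag("objPath", "/tempZone/home/rods/file.txt"))
                .add(new Tag("openFlags", "0"));
        System.out.println(message.render());
    }
}
```

Keeping each message as a small object like this is what lets the wire format change underneath without the higher-level API noticing.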

At any rate, when developing Jargon implementations of the iRODS protocol, the actual procedure is to mine the C code, and hack at it until it works, via the creation of lots of unit tests.  Fancy...

In this endeavor, it's often helpful to see the actual protocol interactions of various icommands, and here's how you can do this too...

Simply open a shell and export these variables:

export irodsProt=1; export irodsLogLevel=9;

Now as you execute your icommands, you'll be able to peek at the protocol operations going back and forth.  Be glad you don't really have to look at that, maybe you'll like Jargon now!

Thursday, January 31, 2013

A video demo of shares in iDrop web2

In a previous post, we talked about the addition of sharing to the Jargon libraries, and how that worked in code.

Here's a demo of sharing at work in the iDrop Web2 interface, showing how AVU metadata and iRODS access control list (ACL) data work together to make shares.  It also shows how a 'share' becomes a first-class object on the home page, acting sort of like a 'drive' that can be mounted in a browse view.

Friday, January 25, 2013


As we try to present this underlying complexity in interfaces and high-level APIs, certain themes keep repeating.  iRODS provides a fairly comprehensive set of tools to manage complex collections, including metadata management, automatic system metadata maintenance, a global logical namespace, policy enforcement, ACL management, audit, and the like.

iRODS has a full ACL management system.  This includes the ability to federate iRODS zones and allow cross-zone access rights management.  It's very powerful when combined with the audit log capability of the iRODS server.  iRODS has the mechanics, but can we take these elements and combine them in ways that create new services?  That's part of the fun of working on Jargon and iRODS user interfaces.  One thing that intrigues me is to look at iRODS as an application development platform on which new kinds of policy-aware, distributed data applications are built.  Sometimes the point is to merge iRODS middleware with other services (like Sesame and Lucene, as we're working on with the HIVE integration project, something we'll look at later).

A basic goal of Jargon and the iDrop projects is to create APIs and interfaces that can present iRODS as a familiar cloud-based data store, enhancing findability via social, metadata, and search capabilities.  We've leveraged iRODS AVU metadata to implement familiar 'tagging' as well as 'starred' folders and collections.  Those are reflected in the iDrop2 web interface, and we've done a few demos of this in prior posts.

Discovering and depicting shares has been a common use case.  In a way, you could do this by just querying collections or files in relation to their ACL settings, developing a list of shared files and collections.  The issue some had encountered is that iRODS collections can be BIG...lots of files, lots of deeply nested collections.  The queries get big, and the results that a user would get become incomprehensible in an interface.  The answer that we've cooked up with a few community folks is to develop a 'share' as a first-class object.  That sounds fancy, but all it means is that, instead of treating a deeply nested collection of 10,000 files as 10,000 shared objects, we just treat the parent collection as the share, and portray the contents of this share as a 'mount' you can make in an interface.  This way, you see 10 shares with meaningful names, instead of a pageable list of 5,000 files.  We just mixed the existing ACL system with a new AVU entry very much like the starred folder.
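The 'one share instead of thousands of shared files' idea can be illustrated with a small sketch that collapses a list of shared paths down to their top-most shared ancestors. This is only an illustration under that assumption: as described here, Jargon doesn't infer share roots from paths, it marks the root with a special AVU, but the effect on what a user sees is similar.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ShareRootSketch {

    /**
     * Returns only the shared paths that have no shared ancestor, in sorted
     * order, so a shared collection shows up once rather than once per file.
     * Paths are assumed to be absolute, '/'-separated, with no trailing slash.
     */
    public static List<String> collapseToShareRoots(Collection<String> sharedPaths) {
        Set<String> shared = new HashSet<>(sharedPaths);
        List<String> roots = new ArrayList<>();
        for (String path : new TreeSet<>(sharedPaths)) {
            if (!hasSharedAncestor(path, shared)) {
                roots.add(path);
            }
        }
        return roots;
    }

    /** Walks up the parent chain ("/a/b/c" -> "/a/b" -> "/a") looking for a shared ancestor. */
    static boolean hasSharedAncestor(String path, Set<String> shared) {
        int slash = path.lastIndexOf('/');
        while (slash > 0) {
            String parent = path.substring(0, slash);
            if (shared.contains(parent)) {
                return true;
            }
            slash = parent.lastIndexOf('/');
        }
        return false;
    }
}
```

Checking real ancestors, rather than plain string prefixes, keeps a sibling such as /proj-2 from being swallowed by /proj.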

So now, shares work like this:

  • Find a file or collection
  • Mark as 'shared'
    • set inheritance on collections
    • select users and the access rights you want
    • use the sharing service to mark as a share
  • Display the shares
    • We're using Specific (SQL) Query to run more complex SQL queries that discover things like:
      • collections I'm sharing with others
      • collections shared with me
      • we'll add things like specifying users involved in the shares as we go
  • In iDrop web, you can now get a quick view (below) that can show things like 'files shared with me'.

As you can see, the 'home' landing page presents some of these 'overlays' on the AVU system.  In this case, the user has selected 'Folders shared with me', and the Jargon sharing service (covered below) runs the queries that inspect AVU metadata and ACL information to derive the list of shares.  Note that these are at the top level, and are portrayed using an alias, which is a share name that you give when the share is created.

Note that clicking on the folder icon then sets that share as the top of the tree in the browse view, so a share becomes a 'mount' or a 'drive'.  The nice thing is that shares then get out of the way, and you are back to normal iRODS ACLs and collections, without a lot of cruft.  Here's what you see when you click on a folder in the share view...

What's cool is how unremarkable all of that is!  Note that the browse tree is looking into the share.  I've revealed the metadata tab so you can see the special iRODSUserTagging:Share tag that denotes the share.  Once a share is established, you can alter the rights to the share by just editing ACLs, so if the ACL is removed for a user, their share disappears.  I need to add a share button to this interface, and then we can show how to establish the share.  The code is done though...we'll look at that.

Code for sharing

Sharing is part of jargon-core, in the jargon-user-tagging subproject.  As of this writing, it is in the development branch in git, as 3.3.0-SNAPSHOT.

Inside the org.irods.jargon.usertagging.sharing package, the key interface is IRODSSharingService, which looks like this:

package org.irods.jargon.usertagging.sharing;

import java.util.List;

import org.irods.jargon.core.exception.DataNotFoundException;
import org.irods.jargon.core.exception.FileNotFoundException;
import org.irods.jargon.core.exception.JargonException;
import org.irods.jargon.usertagging.domain.IRODSSharedFileOrCollection;
import org.irods.jargon.usertagging.domain.ShareUser;

/**
 * Service interface to create and manage shares. Like the star and tagging
 * facility, a share is a special metadata tag on a Collection or Data Object,
 * naming that item as shared. In the process of declaring the share, the proper
 * ACL settings are done.
 * <p>
 * Sharing using a special tag at the 'root' of the share avoids representing
 * every file or collection in a deeply nested shared collection as 'shared', as
 * it would be based purely on the ACL settings. As a first class object, a
 * share can have an alias name, and is considered one unit.
 *
 * @author Mike Conway - DICE
 */
public interface IRODSSharingService {

    /**
     * Create a new share. This will tag the top level as a shared collection
     * and data object, and set requisite AVUs for the provided set of users.
     * Note that collections will have a recursive set of permissions, as well
     * as inheritance.
     * <p>
     * Note that the share is only settable as originating from the file or
     * collection owner.
     *
     * @param irodsSharedFileOrCollection
     *            {@link IRODSSharedFileOrCollection} representing the share
     * @throws ShareAlreadyExistsException
     *             if a share has already been defined
     * @throws FileNotFoundException
     *             if the absolute path does not exist in iRODS
     * @throws JargonException
     */
    void createShare(IRODSSharedFileOrCollection irodsSharedFileOrCollection)
            throws ShareAlreadyExistsException, FileNotFoundException,
            JargonException;

    /**
     * Given an absolute path to an iRODS file or collection, return the
     * IRODSSharedFileOrCollection that may exist. Note that null is returned
     * if no such share exists, and a FileNotFoundException is thrown if the
     * absolute path does not exist.
     *
     * @param irodsAbsolutePath
     *            String with a valid iRODS absolute path to a file or
     *            collection
     * @return {@link IRODSSharedFileOrCollection} or null
     * @throws FileNotFoundException
     *             if the target absolute path does not exist in iRODS
     * @throws JargonException
     */
    IRODSSharedFileOrCollection findShareByAbsolutePath(String irodsAbsolutePath)
            throws FileNotFoundException, JargonException;

    /**
     * Remove the share indicated at the given absolute path. Note that this
     * method silently ignores the case where a share does not exist for the
     * given path.
     * <p>
     * NOTE: an outstanding issue remains, which is how to handle the ACLs
     * associated with the given file or collection. Right now the share goes
     * away, but the ACLs remain. It is under consideration to remove all ACLs,
     * or add a flag or method variant that will either preserve or delete the
     * associated ACLs.
     *
     * @param irodsAbsolutePath
     *            String with a valid iRODS absolute path to a file or
     *            collection
     * @throws FileNotFoundException
     *             if the iRODS absolute path does not point to a file or
     *             collection
     * @throws JargonException
     */
    void removeShare(String irodsAbsolutePath) throws FileNotFoundException,
            JargonException;

    /**
     * Retrieve a list of collections shared by the given user and zone. If
     * there are no shares, an empty list is returned.
     * <p>
     * Note here that, for efficiency, the list of users (via the ACLs) is not
     * returned in this variant. It is intended that obtaining the listing
     * would be done as a separate request. A variant may be added later that
     * does do this extra processing.
     * <p>
     * Note that this method uses Specific Query, and the
     * listSharedCollectionsOwnedByUser query alias must be provided. This can
     * be initialized by running a script in the jargon-user-tagging project to
     * set up all required specific queries. See project documentation. This
     * method requires an iRODS server that supports Specific Query (iRODS
     * 3.1+).
     *
     * @param userName
     *            String with the name of the user who is doing the sharing,
     *            based on the owner of the collection.
     * @param userZone
     *            String with the zone for the user. This may be set to blank,
     *            in which case the zone of the logged in user will be used
     * @return List of {@link IRODSSharedFileOrCollection} that is shared by
     *         the user
     * @throws JargonException
     */
    List<IRODSSharedFileOrCollection> listSharedCollectionsOwnedByAUser(
            String userName, String userZone) throws JargonException;

    /**
     * Retrieve a list of collections shared with a given user by another
     * user, as determined by the owner of that collection.
     * <p>
     * Note here that, for efficiency, the list of users (via the ACLs) is not
     * returned in this variant. It is intended that obtaining the listing
     * would be done as a separate request. A variant may be added later that
     * does do this extra processing.
     * <p>
     * Note that this method uses Specific Query, and the
     * listSharedCollectionsSharedWithUser query alias must be provided. This
     * can be initialized by running a script in the jargon-user-tagging
     * project to set up all required specific queries. See project
     * documentation. This method requires an iRODS server that supports
     * Specific Query (iRODS 3.1+).
     *
     * @param userName
     *            String with the name of the user with whom the collections
     *            are shared.
     * @param userZone
     *            String with the zone for the user. This may be set to blank,
     *            in which case the zone of the logged in user will be used
     * @return List of {@link IRODSSharedFileOrCollection} that is shared by a
     *         party with the user
     * @throws JargonException
     */
    List<IRODSSharedFileOrCollection> listSharedCollectionsSharedWithUser(
            String userName, String userZone) throws JargonException;

    /**
     * Handy method to retrieve ACL share details for a share at the given
     * absolute path. Note that if there is no share, an empty list is
     * returned. This seems to convey the message with the least amount of
     * surprise.
     *
     * @param irodsAbsolutePath
     *            String with a valid iRODS absolute path to a file or
     *            collection
     * @return List of {@link ShareUser}
     * @throws FileNotFoundException
     *             if the path cannot be found
     * @throws JargonException
     */
    List<ShareUser> listUsersForShare(String irodsAbsolutePath)
            throws FileNotFoundException, JargonException;

    /**
     * Update the name of the share at the given path.
     *
     * @param irodsAbsolutePath
     *            String with a valid iRODS absolute path to a file or
     *            collection
     * @param newShareName
     *            String with the desired name of the share
     * @throws FileNotFoundException
     *             if the iRODS file or collection is missing
     * @throws DataNotFoundException
     *             if a current share is not found
     * @throws JargonException
     */
    void updateShareName(String irodsAbsolutePath, String newShareName)
            throws FileNotFoundException, DataNotFoundException,
            JargonException;
}

Note that there is a domain object that represents the share, IRODSSharedFileOrCollection.  This object holds a path, an alias, and a list of users and the access rights desired for the share.  It's really that easy: you just call createShare() with the right parameters, then call the list* methods to get the shares back.  It handles ACLs, metadata, and inheritance of permissions.

The twist is that the queries are too complex for GenQuery, the built-in SQL-like syntax for querying the iRODS metadata catalog.  It uses Specific or SQL query, which is essentially like iRODS stored procedures, where you define a query with parameters via Jargon or the iadmin icommand.  We need to run a couple of shell scripts with iRODS rodsadmin privileges to enable specific query support.
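To picture the 'stored procedure' analogy, here's a minimal sketch, assuming a simple in-memory registry: parameterized SQL is registered once under an alias, and a later request supplies only the alias and its arguments, which are checked against the number of ? placeholders. The class, the alias, and any SQL you feed it are hypothetical; in iRODS the registration is done by a rodsadmin via Jargon or iadmin, and the server runs the stored SQL.

```java
import java.util.HashMap;
import java.util.Map;

public class SpecificQueryRegistrySketch {

    private final Map<String, String> sqlByAlias = new HashMap<>();

    /** Registers parameterized SQL under an alias; an alias may only be defined once. */
    public void register(String alias, String sql) {
        if (sqlByAlias.putIfAbsent(alias, sql) != null) {
            throw new IllegalStateException("alias already defined: " + alias);
        }
    }

    /** Counts '?' placeholders (naive: does not skip '?' inside quoted literals). */
    public static int parameterCount(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }

    /**
     * Looks up the SQL for an alias and checks that one argument was supplied
     * per placeholder; a real server would then bind the arguments and run it.
     */
    public String resolve(String alias, String... args) {
        String sql = sqlByAlias.get(alias);
        if (sql == null) {
            throw new IllegalArgumentException("no specific query with alias: " + alias);
        }
        int expected = parameterCount(sql);
        if (args.length != expected) {
            throw new IllegalArgumentException(
                    "alias " + alias + " expects " + expected + " argument(s), got " + args.length);
        }
        return sql;
    }
}
```

The client never ships SQL at query time, only an alias and arguments, which is why the admin setup scripts have to run first.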

First, Jargon itself has a few simple Specific query statements it relies on to support specific query, namely, it has methods to list and find specific queries stored in iRODS based on alias.  These can be set up by running the commands in the jargon-core library.  I think that these will soon be baked into iRODS, but as of 3.2 they are not yet there.

Next, we're in the process of developing the complex SQL required to list different sorts of shares, and these are being developed in the jargon-user-tagging subproject of jargon, and are provisioned by running the

Run those two scripts as rodsadmin and you should be good to go!  We're testing and integrating with iDrop web2, so it'll be a few weeks before the final API and queries are settled, but it's working now, and worth a try...see if it satisfies your use case!

Friday, January 18, 2013

Preview of Synch in iDrop Desktop 2

Here is a quick video that shows setting up and running synchronizations in the new version of iDrop Desktop.  We'll be out with a release in the very near future.

Sharing data through iRODS Tickets

iRODS is a system for managing distributed data according to management policies defined at each remote storage location.  Data is 'distributed' in two senses, it may be geographically distributed across heterogeneous servers, and it may be distributed between different organizations.  I like to say it's a 'managed web' of data.  iRODS manages access through access control lists (ACLs), through rules that are triggered at policy enforcement points, and through the recording of audit trails for these operations.

With all of these mechanisms for controlling access to data, how can you do ad-hoc, or 'easy', sharing?  If I open up files and collections to be shared, how can I manage this shared data?  There are several answers to this question, and one of the most useful is the new ticket facility that appeared in iRODS 3.1.

We'll look at ticket support in Jargon and in iDrop web to illustrate how you can use tickets in your own projects.

Thursday, January 17, 2013

Getting ready to do an alpha of iDrop 2

In an effort to push out some early access previews of iDrop2, I'm starting to put together a few videos. These are not formal, they are meant to give an idea of what's in the new code.

The first little video is here, it just goes over some of the new look and feel, as we've cleaned things up and moved it to Twitter Bootstrap.

Tomorrow I need to put something together on iDrop Desktop 2, which we're working on with RENCI, as well as a code tutorial on tickets; I thought a screencast of tickets in use might help there too.

In the meantime, you can check out the new iDrop by visiting GForge. There you will find git access info for our code repository.

The new iDrop2 interface work can be found in the origin/984-idrop-web-redesign branch.

Friday, January 11, 2013

Clojure and Jargon

A great thing about Java these days is that the JVM can run all manner of languages, including Clojure.  I'm actually working on a community project now that uses the Jargon libraries from that language.

At any rate, to help dig into the issue, I'm having to learn a bit of Clojure myself, and I'm using this intro, which seems pretty nice:

Thursday, January 10, 2013

On Queries and Pagination

This was originally an iRODS chat on Gmail, but I am putting it up on this blog to better capture these discussions.  At some point they might make a nice user guide!

Queries and Paging

The Jargon API has a domain model (in the package) that represents the iRODS system catalog, or iCAT. iRODS provides several methods to query data in this catalog (see the Jargon Wiki for some info on queries), including GenQuery. GenQuery, or general query, is a technique to access the iCAT with SQL-like select statements; it is what the iquest icommand uses.

The various domains (collection, data object, etc.) in the iCAT have a corresponding domain object, and services for each part of the domain are provided as 'access objects'. These services often do queries under the covers and return data as collections of domain objects. The different sorts of queries (GenQuery, SimpleQuery, SpecificQuery) are also available as services, which return result set objects.

When result sets or domain objects are returned from Jargon, they carry information that can be used to page through large result sets. iRODS queries are designed to be pageable, which means you may either query, close the query, and then query again with an offset, or leave the query open and get the next result set by continuing the query. Both look roughly the same from the client's perspective. The difference is on the server: paging via offset does not leave the query open on the iRODS side, while continuing a query without closing it after each page does, and if too many queries are left open this way, server-side performance can suffer.

If you are in a 'web' or 'session per request' context, then paging via offset is really the option you want; otherwise, you may inadvertently consume all open iCAT database connections.
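To make the trade-off concrete, here is a toy model in Java. It is not Jargon code: `ToyCatalog`, `queryAndClose`, and `queryAndKeepOpen` are hypothetical names, and a fixed number of 'query slots' stands in for iCAT database connections. In a session-per-request setting, each request that starts a query and leaves it open holds a slot, while offset-style requests release theirs before returning.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Jargon code: a server with a fixed number of query slots
// (standing in for iCAT database connections) and rows 1..N, where each
// row's value is its 1-based count.
class ToyCatalog {
    private final List<Integer> rows = new ArrayList<>();
    private final int maxOpenQueries;
    private int openQueries = 0;

    ToyCatalog(int totalRows, int maxOpenQueries) {
        for (int i = 1; i <= totalRows; i++) {
            rows.add(i);
        }
        this.maxOpenQueries = maxOpenQueries;
    }

    int openQueries() {
        return openQueries;
    }

    // Offset style: open a query, skip 'offset' rows, read one page,
    // and close the query before returning.
    List<Integer> queryAndClose(int offset, int pageSize) {
        acquireSlot();
        try {
            return page(offset, pageSize);
        } finally {
            openQueries--; // the slot is released on every request
        }
    }

    // Continuation style, as seen by a stateless web request: the query is
    // left open for a later continuation, so the slot stays held until
    // close() is called.
    List<Integer> queryAndKeepOpen(int offset, int pageSize) {
        acquireSlot();
        return page(offset, pageSize);
    }

    void close() {
        if (openQueries > 0) {
            openQueries--;
        }
    }

    private void acquireSlot() {
        if (openQueries >= maxOpenQueries) {
            throw new IllegalStateException("no free query slots on the server");
        }
        openQueries++;
    }

    private List<Integer> page(int offset, int pageSize) {
        int end = Math.min(offset + pageSize, rows.size());
        if (offset >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.subList(offset, end));
    }
}
```

With two slots, any number of offset-style requests leave nothing open on the server, but a third request that keeps its query open fails until something calls `close()`.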

Referencing the IRODSGenQueryExecutor in the package, here is an example of a query method that allows specification of an offset and closes the query for each page read:

/**
 * Execute an iquest-like query and return results in a convenient POJO
 * object. This method allows partial starts to do paging of large query
 * results. This method will send a close to iRODS if more results are
 * available.
 * <p/>
 * Note that the <code>getMoreResults()</code> method will not work, since
 * the result set was closed. This version of the query execute is suitable
 * for 'session per request' situations, such as mid-tier web applications,
 * where connections are not held for stateful interaction. In these
 * situations, query can be accomplished with an offset.
 *
 * @param irodsQuery
 *            {@link org.irods.jargon.core.query.AbstractIRODSGenQuery} that
 *            will wrap the given query
 * @param partialStartIndex
 *            <code>int</code> that indicates an offset within the results
 *            from which to build the returned result set.
 * @return {@link org.irods.jargon.core.query.IRODSQueryResultSet} that
 *         contains the results of the query
 * @throws JargonException
 * @throws JargonQueryException
 */
IRODSQueryResultSet executeIRODSQueryAndCloseResult(
        AbstractIRODSGenQuery irodsQuery, int partialStartIndex)
        throws JargonException, JargonQueryException;

If you are in a command-line tool, or in a client that maintains a connection and will process all the results at one time, you can use continuation. In that case, you are responsible for closing the query when you are done.

Here is an example of a method that uses continuation:

/**
 * Execute an iquest-like query and return results in a convenient POJO
 * object.
 * <p/>
 * Note: this command will not close the underlying result set, so that it
 * may be paged by getting next result. It is up to the caller to call
 * <code>closeResults()</code> when done with the result set. Alternately,
 * the <code>executeIRODSQueryAndCloseResult()</code> method may be
 * employed.
 *
 * @param irodsQuery
 *            {@link org.irods.jargon.core.query.AbstractIRODSGenQuery} that
 *            will wrap the given iquest-like query
 * @param continueIndex
 *            <code>int</code> that indicates whether this is a requery when
 *            more results than the limit have been generated
 * @return {@link org.irods.jargon.core.query.IRODSQueryResultSet} that
 *         contains the results of the query
 * @throws JargonException
 * @throws JargonQueryException
 */
IRODSQueryResultSet executeIRODSQuery(
        final AbstractIRODSGenQuery irodsQuery, final int continueIndex)
        throws JargonException, JargonQueryException;
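When you do use continuation, the caller owns the cleanup. Here is a minimal sketch of that discipline, assuming a hypothetical `ContinuedQuery` stand-in rather than the real Jargon result set (with Jargon, the cleanup step would be the `closeResults()` call the Javadoc mentions). Implementing `AutoCloseable` lets try-with-resources guarantee the close, even if processing a page throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query that stays open on the server between pages.
class ContinuedQuery implements AutoCloseable {
    private final List<String> rows;
    private final int pageSize;
    private int position = 0;
    private boolean closed = false;

    ContinuedQuery(List<String> rows, int pageSize) {
        this.rows = rows;
        this.pageSize = pageSize;
    }

    boolean hasMore() {
        return !closed && position < rows.size();
    }

    // Continues the open query and returns the next page.
    List<String> nextPage() {
        if (closed) {
            throw new IllegalStateException("query already closed");
        }
        int end = Math.min(position + pageSize, rows.size());
        List<String> page = new ArrayList<>(rows.subList(position, end));
        position = end;
        return page;
    }

    boolean isClosed() {
        return closed;
    }

    // Releases the server-side query; in Jargon this role belongs to closeResults().
    @Override
    public void close() {
        closed = true;
    }
}

class ContinuationExample {
    // Reads every page; try-with-resources closes the query even on an exception.
    static List<String> readAll(ContinuedQuery query) {
        List<String> all = new ArrayList<>();
        try (ContinuedQuery q = query) {
            while (q.hasMore()) {
                all.addAll(q.nextPage());
            }
        }
        return all;
    }
}
```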

I highly recommend looking at the JUnit tests in the corresponding test directory; they exercise the various options, so you can see the methods in use.

I'd further add that the result sets and iRODS 'domain' objects are built to help you with paging. Query results include counts and flags that indicate whether more results are available. The domain objects all extend the IRODSDomainObject superclass, carrying this additional information:

/**
 * Sequence number for this record based on query result
 */
private int count = 0;

/**
 * Is this the last result from the query or set
 */
private boolean lastResult = false;

/**
 * Total number of records for the given query. Note that this is not
 * always available, depending on the iCAT database
 */
private int totalRecords = 0;

So any Jargon methods that return collections, such as those on CollectionAO and DataObjectAO, return objects that carry information on the paging status. You can use the count to compute the next offset.
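Here is a sketch of how those fields can drive offset paging. It assumes a hypothetical `PagedItem` class that mirrors only the `count` and `lastResult` fields, plus a `fetch` method that simulates a query closed after each page; none of these names are Jargon API. The count of the last item becomes the next offset, and `lastResult` tells the loop when to stop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an IRODSDomainObject subclass; only the paging
// fields matter here.
class PagedItem {
    final String name;
    final int count;          // 1-based sequence number in the full result
    final boolean lastResult; // true only for the final item of the query

    PagedItem(String name, int count, boolean lastResult) {
        this.name = name;
        this.count = count;
        this.lastResult = lastResult;
    }
}

class OffsetPager {
    // The count of the last item doubles as the offset for the next page.
    static int nextOffset(List<PagedItem> page) {
        return page.isEmpty() ? 0 : page.get(page.size() - 1).count;
    }

    // More pages exist unless this page is empty or ends on the last result.
    static boolean hasMore(List<PagedItem> page) {
        return !page.isEmpty() && !page.get(page.size() - 1).lastResult;
    }

    // Simulates one offset-style query: skip 'offset' rows and return a page.
    static List<PagedItem> fetch(List<String> all, int offset, int pageSize) {
        List<PagedItem> page = new ArrayList<>();
        int end = Math.min(offset + pageSize, all.size());
        for (int i = offset; i < end; i++) {
            page.add(new PagedItem(all.get(i), i + 1, i == all.size() - 1));
        }
        return page;
    }

    // Reads everything by feeding each page's last count back in as the offset.
    static List<String> readAll(List<String> all, int pageSize) {
        List<String> names = new ArrayList<>();
        int offset = 0;
        List<PagedItem> page;
        do {
            page = fetch(all, offset, pageSize);
            for (PagedItem item : page) {
                names.add(item.name);
            }
            offset = nextOffset(page);
        } while (hasMore(page));
        return names;
    }
}
```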

Paging through results

So you have run a method and gotten a set of domain objects, and you want to build an interface, what do you do?

Well, often these collections are too big to retrieve or display all at once, so we need to set up paging controls. This is something I'm working on now for iDrop web and the iDrop Swing GUI. If I can get something useful for both of those UIs, it's probably generally useful, so I've been adding an org.irods.jargon.datautils.pagination package to the jargon-data-utils subproject.

NB: Jargon is a Maven multi-module project available here, and jargon-data-utils is one of its sub-projects.

The idea is to create a 'model' in the MVC sense that can then be used to generate a set of paging controls on a web or Swing UI.  In the pagination package, this model is called PagingActions.  The class has some methods to get record counts and the current index, based on the collection that was returned from Jargon.  Currently, there is a utility class in that package called PagingAnalyser, which takes a List of IRODSDomainObject and returns the PagingActions model.

The model has a set of 'index' entries that are generated by looking at what is represented by the List of IRODSDomainObject that was returned by Jargon.  It will typically have "<<", "<", ">", and ">>" (first, previous, next, last), based on the position of the given List in terms of all available data.  It will also have a set of direct indexes, representing the 10 or so pages around the current index, for jumping to a page.
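Deciding which controls appear, and which page numbers surround the current one, is mostly arithmetic. Here is a hypothetical sketch, not the actual PagingActions or PagingAnalyser code, assuming the total record count is known (which, as noted above, depends on the iCAT database) and that pages are 1-based. It returns the labels as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of building paging labels:
// "<<" first, "<" previous, a window of page numbers, ">" next, ">>" last.
class PageLinks {
    static List<String> build(int totalRecords, int pageSize, int currentPage, int window) {
        // Ceiling division; an empty result still shows a single page 1.
        int lastPage = Math.max(1, (totalRecords + pageSize - 1) / pageSize);
        List<String> links = new ArrayList<>();
        if (currentPage > 1) {
            links.add("<<");
            links.add("<");
        }
        // Center the window on the current page, then shift it back
        // inside [1, lastPage] if it runs off either end.
        int start = Math.max(1, currentPage - window / 2);
        int end = Math.min(lastPage, start + window - 1);
        start = Math.max(1, end - window + 1);
        for (int p = start; p <= end; p++) {
            links.add(String.valueOf(p));
        }
        if (currentPage < lastPage) {
            links.add(">");
            links.add(">>");
        }
        return links;
    }
}
```

For example, with 95 records, 10 per page, and a window of 5, page 5 yields `<< < 3 4 5 6 7 > >>`, while page 10 drops the next and last controls.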

In this example, from iDrop web, I am listing collections under a subdirectory and building the PagingActions for the current collection (from BrowseController, a Grails controller):

def entries = collectionAndDataObjectListAndSearchAO.listDataObjectsAndCollectionsUnderPath(absPath)

int pageSize = irodsAccessObjectFactory.jargonProperties.maxFilesAndDirsQueryMax

PagingActions pagingActions = PagingAnalyser.buildPagingActionsFromListOfIRODSDomainObjects(entries, pageSize)

As you can see, the List of entries can be used to create the pagingActions model.  That object is then placed into the Grails model for rendering by a page view.  Right now I'm only displaying the controls, and I still need to add the actions for when each link is clicked.  The Grails view template looks like this:

<div id="browseDetailsToolbar">
  <div class="pagination">
    <ul>
      <g:each in="${pagingActions.pagingIndexEntries}">
        <li><a href="#">${it.representation}</a></li>
      </g:each>
    </ul>
  </div>
</div>

Here, representation is the text we want to display on the link.  The PagingActions model is smart enough to know whether the various 'first', 'previous', 'next', and other buttons should be included.  In rendering, it looks like this:

Hopefully you can see the paging controls above the center panel. What remains is to add some links and JavaScript that take the index information in each paging index item and use it as the offset for the next Jargon method call.

Jargon will take care of closing result sets for you!

A note: I've gone back and forth on this, but decided that the 'count' of each item, which is part of IRODSDomainObject and therefore available in each item in the collection, is a count, not an index. That is, it's 1-based, not 0-based.  This means you can take the count of the last item as the offset for the next query.

By the same token, each 'page' of the index items is 1-based: the first page is 1, not 0.  This can be confusing; just remember that the last item in each Jargon collection carries the count, and that count, used as the partialStartIndex of a Jargon method, returns the items after it.
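The 1-based conventions reduce to two small formulas. Here is a sketch with hypothetical helper names, assuming the offset is the number of items to skip (the partialStartIndex, following the convention described above): page p starts at offset (p - 1) * pageSize. With a page size of 50, page 3 holds counts 101 to 150, so its last count, 150, is exactly the offset of page 4.

```java
// Hypothetical helpers for the 1-based page and count conventions; 'offset'
// means the number of items to skip, i.e. the value passed as partialStartIndex.
class PageMath {
    // Page 1 starts at offset 0, page 2 at pageSize, and so on.
    static int offsetForPage(int page, int pageSize) {
        if (page < 1) {
            throw new IllegalArgumentException("pages are 1-based");
        }
        return (page - 1) * pageSize;
    }

    // The 1-based page that contains the item with this 1-based count.
    static int pageForCount(int count, int pageSize) {
        if (count < 1) {
            throw new IllegalArgumentException("counts are 1-based");
        }
        return (count - 1) / pageSize + 1;
    }
}
```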

I'm still working on this and need to try it in Swing as well.  I'm trying to support either 'button'-style interaction or a model for a slider control, so this is all subject to change.

Welcome to the Jargon Blog

Welcome to the Jargon Blog!  I'm the Java developer for the iRODS data grid.  You can find the iRODS home page here.

What is iRODS?

From the DICE home page, here is how it is described:

iRODS™, the Integrated Rule-Oriented Data System, is a data grid software system developed by the Data Intensive Cyber Environments research group (developers of the SRB, the Storage Resource Broker), and collaborators. The iRODS system is based on expertise gained through a decade of applying the SRB technology in support of Data Grids, Digital Libraries, Persistent Archives, and Real-time Data Systems. iRODS management policies (sets of assertions these communities make about their digital collections) are characterized in iRODS Rules and state information. At the iRODS core, a Rule Engine interprets the Rules to decide how the system is to respond to various requests and conditions. iRODS is open source under a BSD license.

A quick fact sheet is available here too!

What is it really?

iRODS is very hard to describe in one paragraph. I think it's a new sort of middleware platform, meant to represent all sorts of data in a hierarchical, logical namespace.  Think of it as a resource server, in the RESTful sense of a resource. It organizes distributed collections under regular, logical path names.

  • iRODS manages a common logical namespace over distributed collections of data. 
  • The data is based on an abstraction of a global file system.
  • iRODS allows the specification of policies at each storage location to manage the data.
  • iRODS manages access controls, as well as easy sharing and federation of data between organizations.
  • iRODS automatically maintains metadata about distributed collections in a master catalog, as well as user-defined metadata.
  • iRODS can automatically maintain audit logs of all activities on the grid.
There's a lot to the platform. It occupies a unique space and sits in the middle of a lot of cool trends in computing, such as:

  • 'big data'
  • large scale cyber-infrastructure
  • metadata and find-ability
  • long term preservation and data management
  • emerging standards for defining a trusted digital repository
  • data life cycle management
  • policy-based preservation

This blog is about the Java (and PHP) API for the iRODS data grid, and I hope to develop it into a useful tool for developers working on the problems above.  So let's have a go!