Thursday, January 31, 2013

A video demo of shares in iDrop web2

In a previous post, we talked about the addition of sharing to the Jargon libraries, and how that worked in code.

Here's a demo of sharing at work in the iDrop Web2 interface.  It shows how the AVU metadata and iRODS access control list (ACL) data work together to make shares, and how a 'share' becomes a first-class object on the home page, something like a 'drive' that can be mounted in a browse view.



Friday, January 25, 2013

Sharing

As we try to present this underlying complexity in interfaces and high-level APIs, certain themes keep repeating.  iRODS provides a fairly comprehensive set of tools to manage complex collections, including metadata management, automatic system metadata maintenance, a global logical namespace, policy enforcement, ACL management, audit, and the like.

iRODS has a full ACL management system.  This includes the ability to federate iRODS zones and allow cross-zone access right management.  It's very powerful when combined with the audit log capability of the iRODS server.  iRODS has the mechanics covered...how can we take these elements and combine them in ways that create new services?  That's part of the fun of working on Jargon and iRODS user interfaces.  One thing that intrigues me is to look at iRODS as an application development platform from which new kinds of policy aware, distributed data applications are formed.  Sometimes the point is to look at merging iRODS middleware with other services (like Sesame and Lucene, as we're working on with the HIVE integration project, something we'll look at later).

A basic goal of Jargon and the iDrop projects is to create APIs and interfaces that can present iRODS as a familiar cloud-based data store, enhancing find-ability via social, metadata, and search capabilities.  We've leveraged iRODS AVU metadata to implement familiar 'tagging' as well as 'starred' folders and collections.  Those are reflected in the iDrop2 web interface, and we've done a few demos of this in prior posts.


Discovering and depicting shares has been a common use case.  In a way, you could do this by just querying collections or files in relation to their ACL settings, developing a list of shared files and collections.  The issue some have encountered is that iRODS collections can be BIG...lots of files, lots of deeply nested collections.  The queries get big, and the results that a user would get become incomprehensible in an interface.  The answer that we've cooked up with a few community folks is to develop a 'share' as a first-class object.  That sounds fancy, but all it means is that, instead of treating a deeply nested collection of 10,000 files as 10,000 shared objects, we treat the parent collection as the share, and portray the contents of this share as a 'mount' you can make in an interface.  This way, you see 10 shares with meaningful names, instead of a pageable list of 5,000 files.  We just mixed the existing ACL system with a new AVU entry very much like the starred folder.

So now, shares work like this:


  • Find a file or collection
  • Mark as 'shared'
    • set inheritance on collections
    • select users and the access rights you want
    • use the sharing service to mark as a share
  • Display the shares
    • We're using specific (SQL) query to develop more complex SQL queries to discover things like:
      • collections I'm sharing with others
      • collections shared with me
      • we'll add things like specifying users involved in the shares as we go
  • In iDrop web, you can now get a quick view (below) that can show things like 'files shared with me'.
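The discovery step in the list above can be pictured with a small in-memory model.  This is only a sketch under stated assumptions: the ShareTag and Acl classes, the sample paths, and the sharedWith method are hypothetical stand-ins, not Jargon classes, and the real service runs specific queries against the iCAT rather than scanning lists.  It shows how matching the share tag at the root of a collection against the ACLs for a user yields a few named shares instead of one row per file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory model; none of these types are Jargon classes. */
final class ShareDiscoverySketch {

    /** An AVU-style tag marking the root of a share, with its alias. */
    static final class ShareTag {
        final String path;
        final String alias;

        ShareTag(String path, String alias) {
            this.path = path;
            this.alias = alias;
        }
    }

    /** One ACL entry: a user holding some access right on a path. */
    static final class Acl {
        final String path;
        final String user;

        Acl(String path, String user) {
            this.path = path;
            this.user = user;
        }
    }

    /** Aliases of the shares whose root path carries an ACL for the user. */
    static List<String> sharedWith(String user, List<ShareTag> tags, List<Acl> acls) {
        List<String> aliases = new ArrayList<String>();
        for (ShareTag tag : tags) {
            for (Acl acl : acls) {
                if (acl.user.equals(user) && acl.path.equals(tag.path)) {
                    aliases.add(tag.alias);
                    break; // one matching ACL is enough for this share
                }
            }
        }
        return aliases;
    }

    /** Two share roots, each tagged once. */
    static List<ShareTag> sampleTags() {
        return Arrays.asList(
                new ShareTag("/zone/home/alice/survey", "Survey data"),
                new ShareTag("/zone/home/alice/drafts", "Drafts"));
    }

    /** bob has an ACL on the survey root and, via inheritance, on each file in it. */
    static List<Acl> sampleAcls() {
        return Arrays.asList(
                new Acl("/zone/home/alice/survey", "bob"),
                new Acl("/zone/home/alice/survey/a.csv", "bob"),
                new Acl("/zone/home/alice/survey/b.csv", "bob"),
                new Acl("/zone/home/alice/survey/c.csv", "bob"),
                new Acl("/zone/home/alice/drafts", "carol"));
    }

    public static void main(String[] args) {
        // An ACL-only listing would show bob four entries; the tag collapses them to one share.
        System.out.println(sharedWith("bob", sampleTags(), sampleAcls()));
    }
}
```

In this toy data, bob holds four ACL entries but sees a single share, which is the effect the share tag is meant to have on the home page listing.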



As you can see, the 'home' landing page is presenting some of these 'overlays' on the AVU system.  In this case, the user has selected 'Folders shared with me', and the sharing service in Jargon (which we'll talk about below) runs the queries to inspect AVU metadata and ACL information to derive the list of shares.  Note that these are at the top level, and are portrayed using an alias, which is a share name that you give when the share is created.

Note that clicking on the folder icon then sets that share as the top of the tree in the browse view, so a share becomes a 'mount' or a 'drive'.  The nice thing is that shares then get out of the way, and you are back to normal iRODS ACLs and collections, without a lot of cruft.  Here's what you see when you click on a folder in the share view...





What's cool is how unremarkable all of that is!  Note that the browse tree is looking into the share.  I've revealed the metadata tab so you can see the special iRODSUserTagging:Share tag that denotes the share.  Once a share is established, you can alter the rights to the share by just editing ACLs, so if the ACL is removed for a user, their share disappears.  I need to add a share button to this interface, and then we can show how to establish the share.  The code is done though...we'll look at that.

Code for sharing


Sharing is part of jargon-core, in the jargon-user-tagging subproject.  As of this writing, it is in the development branch in git, as 3.3.0-SNAPSHOT.

Inside the org.irods.jargon.usertagging.sharing package, the key interface is IRODSSharingService, which is defined as follows:


package org.irods.jargon.usertagging.sharing;

import java.util.List;

import org.irods.jargon.core.exception.DataNotFoundException;
import org.irods.jargon.core.exception.FileNotFoundException;
import org.irods.jargon.core.exception.JargonException;
import org.irods.jargon.usertagging.domain.IRODSSharedFileOrCollection;
import org.irods.jargon.usertagging.domain.ShareUser;

/**
 * Service interface to create and manage shares. Like the star and tagging
 * facility, a share is a special metadata tag on a Collection or Data Object,
 * naming that item as shared. In the process of declaring the share, the proper
 * ACL settings are done.
 * <p/>
 * Sharing using a special tag at the 'root' of the share avoids representing
 * every file or collection in a deeply nested shared collection as 'shared', as
 * it would be based purely on the ACL settings. As a first class object, a
 * share can have an alias name, and is considered one unit.
 * 
 * @author Mike Conway - DICE (www.irods.org)
 * 
 */
public interface IRODSSharingService {

 /**
  * Create a new share. This will tag the top level as a shared collection
  * and data object, and set requisite AVUs for the provided set of users.
  * Note that collections will have recursive set of permissions, as well as
  * inheritance.
  * 
  * Note that the share is only settable as originating from the file or
  * collection owner
  * 
  * @param irodsSharedFileOrCollection
  *            {@link IRODSSharedFileOrCollection} representing the share
  * @throws ShareAlreadyExistsException
  *             if a share has already been defined
  * @throws FileNotFoundException
  *             if the absolute path does not exist in iRODS
  * @throws JargonException
  */
 void createShare(IRODSSharedFileOrCollection irodsSharedFileOrCollection)
   throws ShareAlreadyExistsException, FileNotFoundException,
   JargonException;

 /**
  * Given an absolute path to an iRODS file or collection, return the
  * IRODSSharedFileOrCollection that may exist. Note that null is returned
  * if no such share exists, and a FileNotFoundException is thrown if the
  * absolute path does not exist.
  * 
  * @param irodsAbsolutePath
  *            String with a valid iRODS absolute path to a file or
  *            collection
  * @return {@link IRODSSharedFileOrCollection} or null
  * @throws FileNotFoundException
  *             if the target absolute path does not exist in iRODS
  * @throws JargonException
  */
 IRODSSharedFileOrCollection findShareByAbsolutePath(String irodsAbsolutePath)
   throws FileNotFoundException, JargonException;

 /**
  * Remove the share indicated at the given absolute path. Note that this
  * method will silently ignore an occasion where a share does not exist for
  * the given path.
  * <p/>
  * NOTE: an outstanding issue remains, which is how to handle the ACLs
  * associated with the given file or collection. Right now the share goes
  * away, but the ACLs remain. It is under consideration to remove all ACLs,
  * or add a flag or method variant that will either preserve or delete the
  * associated ACLs.
  * 
  * @param irodsAbsolutePath
  *            String with a valid iRODS absolute path to a file or
  *            collection
  * @throws FileNotFoundException
  *             if the iRODS absolute path does not point to a file or
  *             collection
  * @throws JargonException
  */
 void removeShare(String irodsAbsolutePath) throws FileNotFoundException,
   JargonException;

 /**
  * Retrieve a list of collections shared by the given user and zone. No
  * shares will return an empty set.
  * <p/>
  * Note here that, for efficiency, the list of users (via the ACLs) is not
  * returned in this variant. It is intended that obtaining the listing
  * would be done as a separate request. A variant may be added later that
  * does do this extra processing.
  * <p/>
  * Note that this method uses Specific Query, and the
  * listSharedCollectionsOwnedByUser query alias must be provided. This can
  * be initialized by running a script in the jargon-user-tagging project to
  * set up all required specific queries. See project documentation. This
  * method requires an iRODS server that supports Specific Query (iRODS
  * 3.1+)
  * 
  * @param userName
  *            String with the name of the user who is doing the sharing,
  *            based on the owner of the collection.
  * @param userZone
  *            String with the zone for the user. This may be set to blank,
  *            in which case the zone of the logged in user will be used
  * @return List of {@link IRODSSharedFileOrCollection} that is shared by
  *         the user
  * @throws JargonException
  */
 List<IRODSSharedFileOrCollection> listSharedCollectionsOwnedByAUser(
   String userName, String userZone) throws JargonException;

 /**
  * Retrieve a list of collections shared with a given user by another user,
  * as determined by the owner of that collection.
  * <p/>
  * Note here that, for efficiency, the list of users (via the ACLs) is not
  * returned in this variant. It is intended that obtaining the listing
  * would be done as a separate request. A variant may be added later that
  * does do this extra processing.
  * <p/>
  * Note that this method uses Specific Query, and the
  * listSharedCollectionsSharedWithUser query alias must be provided. This
  * can be initialized by running a script in the jargon-user-tagging
  * project to set up all required specific queries. See project
  * documentation. This method requires an iRODS server that supports
  * Specific Query (iRODS 3.1+)
  * 
  * @param userName
  *            String with the name of the user who is doing the sharing,
  *            based on the owner of the collection.
  * @param userZone
  *            String with the zone for the user. This may be set to blank,
  *            in which case the zone of the logged in user will be used
  * @return List of {@link IRODSSharedFileOrCollection} that is shared by a
  *         party with the user
  * @throws JargonException
  */
 List<IRODSSharedFileOrCollection> listSharedCollectionsSharedWithUser(
   String userName, String userZone) throws JargonException;

 /**
  * Handy method to retrieve ACL share details for a share at the given
  * absolute path. Note that if there is no share, an empty list is
  * returned. This seems to convey the message with the least amount of
  * surprise.
  * 
  * @param irodsAbsolutePath
  *            String with a valid iRODS absolute path to a file or
  *            collection
  * @return List of {@link ShareUser}
  * @throws FileNotFoundException
  *             if the path cannot be found
  * @throws JargonException
  */
 List<ShareUser> listUsersForShare(String irodsAbsolutePath)
   throws FileNotFoundException, JargonException;

 /**
  * Update the name of the share at the given path
  * 
  * @param irodsAbsolutePath
  *            String with a valid iRODS absolute path to a file or
  *            collection
  * @param newShareName
  *            String with the desired name of the share
  * @throws FileNotFoundException
  *             if the iRODS file or collection is missing
  * @throws DataNotFoundException
  *             if a current share is not found
  * @throws JargonException
  */
 void updateShareName(String irodsAbsolutePath, String newShareName)
   throws FileNotFoundException, DataNotFoundException,
   JargonException;

}








Note that there is a domain object that represents the share, the IRODSSharedFileOrCollection.  This object holds a path, an alias, and a list of users and their access rights desired for the share.  It's really that easy: you call createShare() with the right parameters, then call the list* methods to get the shares back.  The service handles ACLs, metadata, and inheritance of permissions.
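To make the documented contract concrete, here is a minimal in-memory sketch of how a caller might expect these methods to behave.  It is not the Jargon implementation, and it cannot talk to iRODS: the InMemoryShares class and its simplified Share type are hypothetical, access rights are left out, IllegalStateException stands in for ShareAlreadyExistsException, and the FileNotFoundException cases are not modeled because there is no file system here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in that mimics the documented contract in memory. */
final class InMemoryShares {

    /** Simplified share: an alias plus the users it is shared with. */
    static final class Share {
        final String alias;
        final List<String> users;

        Share(String alias, List<String> users) {
            this.alias = alias;
            this.users = new ArrayList<String>(users);
        }
    }

    private final Map<String, Share> sharesByPath = new HashMap<String, Share>();

    /** Like createShare(): a second share on the same path is an error. */
    void createShare(String path, String alias, List<String> users) {
        if (sharesByPath.containsKey(path)) {
            // stands in for ShareAlreadyExistsException
            throw new IllegalStateException("share already exists: " + path);
        }
        sharesByPath.put(path, new Share(alias, users));
    }

    /** Like findShareByAbsolutePath(): null when no share is defined. */
    Share findShareByAbsolutePath(String path) {
        return sharesByPath.get(path);
    }

    /** Like removeShare(): silently ignores a path that has no share. */
    void removeShare(String path) {
        sharesByPath.remove(path);
    }

    /** Like listUsersForShare(): an empty list, not null, when no share exists. */
    List<String> listUsersForShare(String path) {
        Share share = sharesByPath.get(path);
        if (share == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(share.users);
    }

    public static void main(String[] args) {
        InMemoryShares shares = new InMemoryShares();
        shares.createShare("/zone/home/alice/survey", "Survey data",
                Arrays.asList("bob", "carol"));
        System.out.println(shares.listUsersForShare("/zone/home/alice/survey"));
        shares.removeShare("/zone/home/alice/survey");
        System.out.println(shares.findShareByAbsolutePath("/zone/home/alice/survey"));
    }
}
```

The point of the sketch is the calling convention: check for null from the find method, expect an empty list rather than null from the user listing, and treat removal as safe to repeat.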

The twist is that the queries are too complex for GenQuery, the built-in SQL-like syntax for querying the iRODS metadata catalog.  Instead, the sharing service uses Specific (or SQL) query, which works much like stored procedures for iRODS: you define a query with parameters via Jargon or the iadmin icommand.  We need to run a couple of shell scripts with iRODS rodsadmin privileges to enable specific query support.

First, Jargon itself has a few simple Specific query statements it relies on to support specific query, namely, it has methods to list and find specific queries stored in iRODS based on alias.  These can be set up by running the jargon-specquery.sh commands in the jargon-core library.  I think that these will soon be baked into iRODS, but as of 3.2 they are not yet there.

Next, we're in the process of developing the complex SQL required to list different sorts of shares, and these are being developed in the jargon-user-tagging subproject of jargon, and are provisioned by running the usertagging_specquery.sh.

Run those two scripts as rodsadmin and you should be good to go!  We're testing and integrating with iDrop web2, so it'll be a few weeks before the final API and queries are settled, but it's working now, and worth a try...see if it satisfies your use case!





Friday, January 18, 2013

Preview of Synch in iDROP Desktop 2

Here is a quick video that shows setting up and running synchronizations in the new version of iDrop Desktop.  We'll be out with a release in the very near future.


Sharing data through iRODS Tickets

iRODS is a system for managing distributed data according to management policies defined at each remote storage location.  Data is 'distributed' in two senses: it may be geographically distributed across heterogeneous servers, and it may be distributed between different organizations.  I like to say it's a 'managed web' of data.  iRODS manages access through access control lists (ACLs), through rules that are triggered at policy enforcement points, and through the recording of audit trails for these operations.

With all of these  mechanisms for controlling access to data, how can you do ad-hoc, or 'easy' sharing?  If I open up files and collections to be shared, how can I manage this shared data?  There are several answers to this question, and one of the most useful is to use the new ticket facility that appeared in iRODS 3.1.

We'll look at ticket support in Jargon and in iDrop web to illustrate how you can use tickets in your own projects.


Thursday, January 17, 2013

Getting ready to do an alpha of iDrop 2

In an effort to push out some early access previews of iDrop2, I'm starting to put together a few videos. These are not formal, they are meant to give an idea of what's in the new code.

The first little video is here, it just goes over some of the new look and feel, as we've cleaned things up and moved it to Twitter Bootstrap.




Tomorrow I need to put something together on iDrop Desktop 2, which we're working on with RENCI, as well as a code tutorial on tickets.  I thought a screencast of tickets in use might help there too.

In the meantime, you can check out the new iDrop by visiting GForge. There you will find git access info for our code repository.

The new iDrop2 interface work can be found in the origin/984-idrop-web-redesign branch.

Friday, January 11, 2013

Clojure and Jargon

A great thing about Java these days is that the JVM can run all manner of languages, including Clojure.  I'm actually working on a community project now that uses the Jargon libraries from that language.

At any rate, to help dig into the issue, I'm having to learn a bit of Clojure myself.  I'm using this intro, which seems pretty nice:

http://java.ociweb.com/mark/clojure/article.html

Thursday, January 10, 2013

On Queries and Pagination

This was originally in iRODS chat on Gmail, but I am putting up this blog to better capture these.  At some point they might make a nice user guide!

Queries and Paging

The Jargon API has a domain model (in the org.irods.jargon.core.pub.domain package) that represents the iRODS system catalog, or iCAT. iRODS provides several methods to query data in this catalog (see the Jargon Wiki for some info on queries), including GenQuery. GenQuery, or general query, is a technique to access the iCAT with SQL-like select statements. This is used in the iquest icommand.

The various domains (collection, data object, etc) in the iCAT have a corresponding domain object, and services for each part of the domain are provided as 'access objects'. These services often do queries under the covers and return data as collections of domain objects. Different sorts of queries (GenQuery, SimpleQuery, SpecificQuery) are also available as a service, and will return result set objects.

When result sets or domain objects are returned from Jargon, they carry with them information that can be used to page through large result sets. iRODS queries are designed to be pageable. That means you may either query, close the query, and then query again with an offset, or you may leave the query open and get the next result set by continuing the query. Both are roughly the same from the client perspective, but paging via offset does not leave the query open on the iRODS side. Using a continuation, without closing the query for each page, does leave it open, and if too many queries are held open this way, it can impact server-side performance.

If you are in a 'web' or 'session per request' context, then paging via offset is really the option you want, otherwise, you may inadvertently consume all open iCAT database connections.

Referencing the IRODSGenQueryExecutor in the org.irods.jargon.core.pub package, here is an example of a query method that allows specification of an offset, and closes the query for each page read:

/**
 * Execute an iquest-like query and return results in a convenient POJO
 * object. This method allows partial starts to do paging of large query
 * results. This method will send a close to iRODS if more results are
 * available.
 * <p/>
 * Note that the <code>getMoreResults()</code> method will not work, since
 * the result set was closed. This version of the query execute is suitable
 * for 'session per request' situations, such as mid-tier web applications,
 * where connections are not held for stateful interaction. In these
 * situations, query can be accomplished with an offset.
 * 
 * @param irodsQuery
 *            {@link org.irods.jargon.core.query.AbstractIRODSGenQuery} that
 *            will wrap the given query
 * @param partialStartIndex
 *            <code>int</code> that indicates an offset within the results
 *            from which to build the returned result set.
 * @return {@link org.irods.jargon.core.query.IRODSQueryResultSet} that
 *         contains the results of the query
 * @throws JargonException
 * @throws JargonQueryException
 */
IRODSQueryResultSet executeIRODSQueryAndCloseResult(
        AbstractIRODSGenQuery irodsQuery, int partialStartIndex)
        throws JargonException, JargonQueryException;


If you are in a command line, or in a client that maintains a connection and will process all the results at one time, you can use the continuation. In that case, you are responsible for calling close on the connection when done.

Here is an example of a method that uses continuation:

/**
 * Execute an iquest-like query and return results in a convenient POJO
 * object.
 * <p/>
 * Note: this command will not close the underlying result set, so that it
 * may be paged by getting next result. It is up to the caller to call
 * <code>closeResults()</code> when done with the result set. Alternately,
 * the <code>executeIRODSQueryAndCloseResult()</code> method may be
 * employed.
 * 
 * @param irodsQuery
 *            {@link org.irods.jargon.core.query.AbstractIRODSGenQuery} that
 *            will wrap the given iquest-like query
 * @param continueIndex
 *            <code>int</code> that indicates whether this is a requery when
 *            more results than the limit have been generated
 * @return {@link org.irods.jargon.core.query.IRODSQueryResultSet} that
 *         contains the results of the query
 * @throws JargonException
 * @throws JargonQueryException
 */
IRODSQueryResultSet executeIRODSQuery(
        final AbstractIRODSGenQuery irodsQuery, final int continueIndex)
        throws JargonException, JargonQueryException;


I highly recommend looking in the corresponding test directory for the JUnit tests that exercise the various options, and you can see the methods in use.


I'd further add that the result sets and iRODS 'domain' objects are built to help you with paging. Note that query results include counts and flags that indicate whether more results are available. Objects in org.irods.jargon.core.pub.domain all extend the IRODSDomainObject superclass, which carries this additional information:

/**
 * Sequence number for this record based on query result
 */
private int count = 0;

/**
 * Is this the last result from the query or set
 */
private boolean lastResult = false;

/**
 * Total number of records for the given query.  Note that this is not always available, depending
 * on the iCAT database
 */
private int totalRecords = 0;


So any Jargon methods that return collections, such as those in CollectionAO and DataObjectAO, return objects that carry within them information on the paging status. You can use the count to compute the next offset.
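As a rough illustration of paging by offset, the sketch below walks a simulated catalog one closed page at a time, using the count of the last row as the offset for the next request.  Everything in it is a hypothetical stand-in: Row only mirrors the count and lastResult fields shown above, and fetchPage imitates a query that is closed after each page.  A real client would call a Jargon method that accepts a partial start index instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pageable catalog; not a Jargon class. */
final class OffsetPagingSketch {

    /** Mirrors the paging fields described above: a 1-based count and a last-result flag. */
    static final class Row {
        final String name;
        final int count;
        final boolean lastResult;

        Row(String name, int count, boolean lastResult) {
            this.name = name;
            this.count = count;
            this.lastResult = lastResult;
        }
    }

    /** Simulates one closed query per page: rows offset..offset+pageSize-1 out of totalRows. */
    static List<Row> fetchPage(int totalRows, int offset, int pageSize) {
        List<Row> page = new ArrayList<Row>();
        int end = Math.min(offset + pageSize, totalRows);
        for (int i = offset; i < end; i++) {
            // count is 1-based, so the row at 0-based position i has count i + 1
            page.add(new Row("item" + (i + 1), i + 1, i + 1 == totalRows));
        }
        return page;
    }

    /** Reads every page, using the count of the last row as the next offset. */
    static List<String> readAll(int totalRows, int pageSize) {
        List<String> names = new ArrayList<String>();
        int offset = 0;
        while (true) {
            List<Row> page = fetchPage(totalRows, offset, pageSize);
            if (page.isEmpty()) {
                break;
            }
            for (Row row : page) {
                names.add(row.name);
            }
            Row last = page.get(page.size() - 1);
            if (last.lastResult) {
                break;
            }
            // the 1-based count of the last row equals the 0-based offset of the next row
            offset = last.count;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(readAll(7, 3));
    }
}
```

No state is held between pages other than the integer offset, which is why this style suits a 'session per request' web tier.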

Paging through results


So you have run a method and gotten a set of domain objects, and you want to build an interface, what do you do?

Well, often, these collections are too big to get at one time, or to page through, so we need to set up paging controls. This is something I'm working on now for iDrop web and the iDrop Swing GUI.  If I can get something useful for both of those UIs, it's probably generally useful, so I've been adding an org.irods.jargon.datautils.pagination package to the jargon-data-utils subproject.

NB Jargon is a maven multi-module project available here, and jargon-data-utils is one of the sub-projects.

The idea is to create a 'model' in the MVC sense that can then be used to generate a set of paging controls on a web or Swing UI.  In the pagination package, this model is called PagingActions.  The class has some methods to get record counts and current index, based on the collection that was returned from Jargon.  Currently, there is a utility class in that package called PagingAnalyser, which takes a List of IRODSDomainObject and returns the PagingActions model.

The model has a set of 'index' entries that are generated by looking at what is represented by the List of IRODSDomainObject that was returned by Jargon.  It will typically have "<<", "<", ">", and ">>" (first, previous, next, last), based on the position of the given List in terms of all available data.  It will also have a set of direct indexes, representing the 10 or so pages around the current index, for jumping to a page.
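One way to picture the index entries is the sketch below.  It is not the PagingActions code, just an assumption about the general shape: the buildIndex method, its parameters, and the 'window' of pages on either side of the current page are all hypothetical.  Pages are 1 based, and the first/previous and next/last entries are included only when there is somewhere to go.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a paging index; not the Jargon PagingActions class. */
final class PagingIndexSketch {

    /**
     * Builds the labels for a row of paging links.
     *
     * @param totalRecords total rows available
     * @param pageSize     rows per page, greater than zero
     * @param currentPage  1-based page currently shown
     * @param window       how many direct page links to show on each side
     */
    static List<String> buildIndex(int totalRecords, int pageSize,
            int currentPage, int window) {
        List<String> entries = new ArrayList<String>();
        // round up so a partial final page still counts; always at least one page
        int lastPage = Math.max(1, (totalRecords + pageSize - 1) / pageSize);

        if (currentPage > 1) {
            entries.add("<<"); // first
            entries.add("<");  // previous
        }
        int from = Math.max(1, currentPage - window);
        int to = Math.min(lastPage, currentPage + window);
        for (int page = from; page <= to; page++) {
            entries.add(String.valueOf(page));
        }
        if (currentPage < lastPage) {
            entries.add(">");  // next
            entries.add(">>"); // last
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(buildIndex(95, 10, 5, 2));
    }
}
```

A view only needs to iterate over the returned labels, which keeps the decision about which controls to show out of the template.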

In this example, from iDrop web, I am listing collections under a subdirectory and building the PagingActions for the current collection (from BrowseController, a Grails controller):

def entries = collectionAndDataObjectListAndSearchAO.listDataObjectsAndCollectionsUnderPath(absPath)

int pageSize = irodsAccessObjectFactory.jargonProperties.maxFilesAndDirsQueryMax

PagingActions pagingActions = PagingAnalyser.buildPagingActionsFromListOfIRODSDomainObjects(entries, pageSize)


As you can see, the List of entries can be used to create the pagingActions model.  This is placed into the model for rendering by a page view.  Right now I'm just displaying, and need to add the actions when each link is clicked.  The grails view template looks like this:

<div id="browseDetailsToolbar">
  <div class="pagination">
    <ul>
      <g:each in="${pagingActions.pagingIndexEntries}">
        <li><a href="#">${it.representation}</a></li>
      </g:each>
    </ul>
  </div>
</div>


Where the representation is what we want to display on the link.  The PagingActions model is smart enough to know whether the various 'first', 'previous', 'next', and other buttons should be included.  In rendering, it looks like this:





Hopefully you can see the paging controls above the center panel. What will be left is adding some links and JavaScript to take the index information in each paging index item, and use that as an offset for the next Jargon method call.

Jargon will take care of closing result sets for you!

A note: I've gone back and forth on this, but decided that the 'count' of each item, which is part of IRODSDomainObject and therefore available in each item in the collection, is a count, not an index.  It's 1 based, not 0 based.  This means you can take the count of the last item as the offset for the next query.

By the same token, each 'page' of the index items is 1 based.  The first page is 1, not 0.  This can be confusing, so just remember that the count of the last item in each Jargon collection, when used as the partialStartIndex of a Jargon method, will return the item after it.

I'm still working on this, and need to try it in Swing as well.  I'm trying to include support for either 'button' style interaction, or to create a model for a slider control, so this is all subject to change.


Welcome to the Jargon Blog

Welcome to the Jargon Blog!  I'm the Java developer for the iRODS data grid.  You can find iRODS home page here.

What is iRODS?

From the DICE home page, here is how it is described:

iRODS™, the Integrated Rule-Oriented Data System, is a data grid software system developed by the Data Intensive Cyber Environments research group (developers of the SRB, the Storage Resource Broker), and collaborators. The iRODS system is based on expertise gained through a decade of applying the SRB technology in support of Data Grids, Digital Libraries, Persistent Archives, and Real-time Data Systems. iRODS management policies (sets of assertions these communities make about their digital collections) are characterized in iRODS Rules and state information. At the iRODS core, a Rule Engine interprets the Rules to decide how the system is to respond to various requests and conditions. iRODS is open source under a BSD license.


A quick fact sheet is available here too!


What is it really?

iRODS is very hard to describe in one paragraph.  I think it's a new sort of middleware platform, meant to depict all sorts of data in a hierarchical, logical namespace.  Think of it as a resource server, in the 'REST-ful' sense of a resource. It organizes distributed collections under regular, logical path names.

  • iRODS manages a common logical namespace over distributed collections of data. 
  • The data is based on an abstraction of a global file system.
  • iRODS allows the specification of policies at each storage location to manage the data.
  • iRODS manages access controls, as well as easy sharing and federation of data between organizations.
  • iRODS automatically maintains metadata about distributed collections in a master catalog, as well as user-defined metadata.
  • iRODS can automatically maintain audit logs of all activities on the grid.

There's a lot to the platform.  It occupies a unique space, and sits in the middle of a lot of cool trends in computing, such as:

  • 'big data'
  • large scale cyber-infrastructure
  • metadata and find-ability
  • long term preservation and data management
  • emerging standards for defining a trusted digital repository
  • data life cycle management
  • policy-based preservation

This blog is about the Java (and PHP) API for the iRODS data grid, and I hope to develop it into a useful tool for developers working on the above problem set.  So let's have a go!