Thursday, January 10, 2013

On Queries and Pagination

This was originally in iRODS chat on Gmail, but I am putting up this blog to better capture these.  At some point they might make a nice user guide!

Queries and Paging

The Jargon API has a domain model (in the org.irods.jargon.pub.domain package) that represents the iRODS system catalog, or iCAT. iRODS provides several methods to query data in this catalog (see the Jargon Wiki for some info on queries), including GenQuery. GenQuery, or general query, is a technique to access the iCAT with sql-like select statements. This is used in the iquest icommand.

The various domains (collection, data object, etc) in the iCAT have a corresponding domain object, and services for each part of the domain are provided as 'access objects'. These services often do queries under the covers and return data as collections of domain objects. Different sorts of queries (GenQuery, SimpleQuery, SpecificQuery) are also available as a service, and will return result set objects.

When result sets or domain objects are returned from Jargon, they carry with them information that can be used to page through large result sets. iRODS queries are designed to be pageable, that mean you may either query, close the query, and then query again with an offset, or you may leave the query open and get the next result set by continuing the query. Both are roughly the same from the client perspective, but paging via offset does not leave the query open on the iRODS side, using a continuation and not closing the query for each page does so, and if too many of these are done, it can impact server side performance.

If you are in a 'web' or 'session per request' context, then paging via offset is really the option you want, otherwise, you may inadvertently consume all open iCAT database connections.

Referencing the IRODSGenQueryExecutor in the org.irods.jargon.core.pub package, here is and example of a query method that will allow specification of an offset, and closes the query for each page read:

/**
* Execute an iquest-like query and return results in a convenient POJO
* object. This method allows partial starts to do paging of large query
* results. This method will send a close to iRODS if more results are
* available.
* <p/>
* Note that the <code>getMoreResults()</code> method will not work, since
* the result set was closed. This version of the query execute is suitable
* for 'session per request' situations, such as mid-tier web applications,
* where connections are not held for stateful interaction. In these
* situations, query can be accomplished with an offset.
@param irodsQuery
*            {@link org.irods.jargon.core.query.AbstractIRODSGenQuery} that
*            will wrap the given query
@param partialStartIndex
*            <code>int</code> that indicates an offset within the results
*            from which to build the returned result set.
@return {@link org.irods.jargon.core.query.IRODSQueryResultSet} that
*         contains the results of the query
@throws JargonException
@throws JargonQueryException
*/
IRODSQueryResultSet executeIRODSQueryAndCloseResult(
AbstractIRODSGenQuery irodsQuery, int partialStartIndex)
throws JargonException, JargonQueryException;


If you are in a command line, or in a client that maintains a connection and will process all the results at one time, you can use the continuation. In that case, you are responsible for calling close on the connection when done.

Here is an example of a method that uses continuation:

/**
* Execute an iquest-like query and return results in a convenient POJO
* object.
* <p/>
* Note: this command will not close the underlying result set, so that it
* may be paged by getting next result. It is up to the caller to call
<code>closeResults()</code> when done with the result set. Alternately,
* the <code>executeIRODSQueryAndCloseResults()</code> method may be
* employed.
@param irodsQuery
*            {@link org.irods.jargon.core.query.AbstractIRODSGenQuery} that
*            will wrap the given iquest-like query
@param continueIndex
*            <code>int</code> that indicates whether this is a requery when
*            more resuts than the limit have been generated
@return {@link org.irods.jargon.core.query.IRODSQueryResultSet} that
*         contains the results of the query
@throws JargonException
@throws JargonQueryException
*/
IRODSQueryResultSet executeIRODSQuery(
final AbstractIRODSGenQuery irodsQuery, final int continueIndex)
throws JargonException, JargonQueryException;


I highly recommend looking in the corresponding test directory for the JUnit tests that exercise the various options, and you can see the methods in use.


I'd further add that the result sets and irods 'domain' objects are built to help you with paging. Note that query results include counts and flags that indicate whether more results are available. Objects in org.irods.jargon.core.pub.domain, all extend the IRODSDomainObject superclass, carrying this additional information:

/***
* Sequence number for this records based on query result
*/
private int count = 0;

/**
* Is this the last result from the query or set
*/
private boolean lastResult = false;

/**
* Total number of records for the given query.  Note that this is not always available, depending
* on the iCAT database
*/
private int totalRecords = 0;


So any Jargon methods that return collections, such as the CollectionAO and DataObjectAO, return objects that carry within them information on the paging status. You can use the count to compute the next offset.

Paging through results


So you have run a method and gotten a set of domain objects, and you want to build an interface, what do you do?

Well, often, these collections are too big to get at one time, or to page through, so we need to set up paging controls. This is something I'm working on now for iDrop web and the iDrop Swing GUI, so if I can get something useful for both those UI's it's probably generally useful, so I've been adding an org.irods.jargon.datautils.pagination package to the jargon-data-utils subproject.

NB Jargon is a maven multi-module project available here, and jargon-data-utils is one of the sub-projects.

The idea is to create a 'model' in the MVC sense that can be then used to generate a set of paging controls on a web or Swing UI.  In the pagination package, this model is called PagingActions.  The class has some methods to get record counts and current index, based on the collection that was returned from Jargon.  Currently, there is a utility class in that package called PagingAnalyzer, which takes a List of IRODSDomainObject and returns the PagingActions model.

The model has a set of 'index' entries that are generated by looking at what is represented by the List of IRODSDomainObject that was returned by Jargon.  It will typically have "<<", "<", ">", and ">>" (first, previous, next, last), based on the position of the given List in terms of all available data.  It will also have a set of direct indexes, representing the 10 or so pages around the current index, for jumping to a page.

In this example, from iDrop web, I am listing collections under a subdirectory and building the PagingActions for the current collection (from BrowseController, a Grails controller):

def entries = collectionAndDataObjectListAndSearchAO.listDataObjectsAndCollectionsUnderPath(absPath)

int pageSize = irodsAccessObjectFactory.jargonProperties.maxFilesAndDirsQueryMax

PagingActions pagingActions = PagingAnalyser.buildPagingActionsFromListOfIRODSDomainObjects(entries, pageSize)


As you can see, the List of entries can be used to create the pagingActions model.  This is placed into the model for rendering by a page view.  Right now I'm just displaying, and need to add the actions when each link is clicked.  The grails view template looks like this:

<div id="browseDetailsToolbar" ><div class="pagination">  <ul>  <g:each in="${pagingActions.pagingIndexEntries}">   <li><a href="#">${it.representation}</a></li>  </g:each>
  </ul>
</div></div>


Where the representation is what we want to display on the link.  The PagingActions model is smart enough to know whether the various 'first', 'previous', 'next', and other buttons should be included.  In rendering, it looks like this:





Hopefully you can see the paging controls above the center panel. What will be left is adding some links and JavaScript to take the index information in each paging index item, and use that as an offset for the next Jargon method call.

Jargon will take care of closing result sets for you!

A note that I've gone back and forth on this, but decided that the 'count' of each item, which is part of IRODSDomainObject and therefore available in each item in the collection, is a count, not an index, it's 1 based, not 0 based.  This means you can take the count of the last item as the offset for the next query

By the same token, each 'page' of the index items is 1 based.  The first page is 1, not 0.  This can be confusing, just remember that the last item in each Jargon collection is the count, and the last item, as an offset, will return the item after it when using it as the partialStartIndex of a Jargon method.

I'm still working on this, and need to try it in Swing as well.  I'm trying to include support for either 'button' style interaction, or to create a model for a slider control, so this is all subject to change.


No comments:

Post a Comment