The Chandler Query System
Overview
The Chandler Repository provides a mechanism for performing queries over the contents of the repository. These queries are declarative: you specify a set of conditions that you want Items to satisfy, and the query processor takes care of retrieving the relevant items. Queries are specified using strings, and the syntax of these strings is inspired by XQuery.
An unusual feature of the query system is the ability to have a query subscribe to changes in the repository. When items are changed, the repository is able to notify the query of that change. The query processor determines whether that change would cause the item to become part of the result of an existing query, or whether the change would cause the item to no longer be part of a query result set. This allows queries to keep their result set up to date automatically.
If you haven't already read "The Busy Developer's Guide to the Chandler repository", you should do so before continuing. Throughout this document we'll use the following terms:
- query
- A general term denoting a set of boolean conditions which in turn specify a set of Items. Also a specific Python class that provides a programmer with an API for manipulating a query.
- query string
- A Python string that contains a textual representation of the query
- query result set
- A set of Chandler Items that satisfy the boolean conditions for a query
Python API
The first thing that you'll need to know when working with queries is how to create them. If you are working within the Chandler Parcel system, then you will be working with queries via parcel.xml when you deal with ItemCollections. When working with queries via parcel.xml, you will only need to know about the string syntax for queries, which is described in the next section. Here's an example of parcel.xml usage, the ItemCollection for Chandler's task list:
<contentModel:ItemCollection itsName="taskItemCollection">
<displayName>TaskList</displayName>
<_rule value="for i in '//parcels/osaf/contentmodel/tasks/TaskMixin' where True"/>
</contentModel:ItemCollection>
You just supply a query string as the value of the _rule attribute.
To use a query from Python, your code will look something like this (rep is assumed to be an open repository instance):
import repository.query.Query as Query
p = rep.findPath('//Queries')
k = rep.findPath('//Schema/Core/Query')
q = Query.Query("for i in '//Schema/Core/Kind' where True", p, k)
for i in q:
    print i
Queries are Chandler Items, which means that they are stored in the repository themselves. The main query class is in repository.query.Query. To instantiate this class, you supply a query string, a path to the query's parent in the repository, and the Kind for a Query (these last two arguments are for the Item constructor). The query in the example specifies all items of type //Schema/Core/Kind.
Once you have an instance of Query, you can ask for the result set of the query in one of two ways. You can ask for the 'resultSet' attribute of the Query item, which returns a reference collection containing the items in the result set. You can also iterate over the Query Item directly (as shown in the example), which calls its __iter__ method. This gives you a Python generator for the query results, which can be more efficient than asking for the resultSet attribute.
You can update the value of the query string by changing the value of the attribute queryString:
q.queryString = "for i in '//parcels/osaf/contentmodel/contacts/Contact' where True"
The query string may be the empty string, "". In this case, the result set of the query is empty.
Parameters allow you to use values from your Python code in a query. Parameter names are strings that begin with "$". To use parameters, you should set the query's args attribute to a dict containing the arguments.
This code shows how to pass a reference collection as a parameter to a query. The parameter name is $name. The value of $name is a tuple containing the UUID of the Chandler item that has a reference collection attribute and the name of the reference collection attribute (a string).
q.args = {}
q.args["$name"] = (item.itsUUID, attributeName)
This code shows how to pass an ordinary value, which allows you to make comparisons to data in your Python code (held in the variable name in this case). The name of the parameter is $myname1 and its value is a list containing the value of the variable name from your Python code.
q.args["$myname1"] = [ name ]
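The two argument shapes above can be summarized in a short sketch. The helper names ref_collection_arg and value_arg are hypothetical conveniences, not part of the Query API, and the standard-library uuid module stands in for a Chandler item's itsUUID. Only the tuple and list shapes come from the description above.

```python
import uuid

def ref_collection_arg(item_uuid, attribute_name):
    # A ref-collection parameter is a (UUID, attribute name) tuple.
    return (item_uuid, attribute_name)

def value_arg(value):
    # An ordinary value is wrapped in a one-element list.
    return [value]

# Stand-in for item.itsUUID; a real Chandler item supplies its own UUID.
fake_uuid = uuid.uuid4()

args = {}
# "contacts" is a made-up attribute name used only for illustration.
args["$name"] = ref_collection_arg(fake_uuid, "contacts")
args["$myname1"] = value_arg("Alice")

# With a real Query item, this dict would then be assigned: q.args = args
```

Keeping the two shapes behind small helpers makes it harder to pass a bare value where a list is expected.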
You can view the entire API for the Query class online.
Query String Syntax
There are several kinds of queries. Simple queries are queries over sets of items. You can write a simple query using the for statement. Compound queries are queries composed from other queries.
Keywords and required tokens in a query are shown in bold. The portions of the query that you supply are written in italics
for queries
The most basic query is a for query. The syntax of a for query is:
for var in|inevery set where boolean-condition
The result set of a for query is the set of items in set which satisfy the boolean-condition.
Here's what you need to supply for the various portions of a for query:
var is the iteration variable for the query. For now, you must use 'i'. This constraint will be removed in the future.
If your for query is iterating over Kinds, and you want to include items of a Kind's subkind, you should use the keyword inevery instead of in.
set specifies a set of Items to apply the boolean-condition to.
This allows you to issue a query over a subset of the repository. At the moment you can supply one of three possibilities for set:
- repository paths for Kind names
These paths must be enclosed in either single quotes (') or double quotes (").
The result of the query below is the set of all Contact items whose contactName child item's firstName attribute contains the letter 'a'. It shows the use of a Kind name as the source. In the where clause, we see the use of the iteration variable i, as well as the names of attributes (contactName, firstName). This example also shows the use of the contains function.
q.queryString = """for i in '//parcels/osaf/contentmodel/contacts/Contact'
where contains(i.contactName.firstName,'a')"""
- parameters
A parameter is a string which begins with $. You can pass a ref-collection as a parameter to do a query over a ref-collection. See "The Busy Developer's Guide to the Chandler repository" for more on ref-collections.
This query illustrates the use of a parameter, $1, as the source set.
q.queryString = """for i in $1 where contains(i.itsName,"arc")"""
- ftcontains(lucene-query , attr-name1 , ... , attr-namen)
If you specify ftcontains, the source set is the result of a full text query using the Lucene query specified by the string lucene-query. This result contains a set of Items where the search text appears in any attribute whose name is listed in attr-name1, ..., attr-namen. The list of attribute names is optional (all attributes will be searched if it is omitted).
This query demonstrates the use of a full text query for the source set. The lucene query is "femme AND homme" which returns all items containing the text "femme" and the text "homme". Since we've provided an attribute argument, synopsis, to ftcontains, the source set will be only those items whose synopsis attribute contains the text "femme" and the text "homme". The query's where clause shows the use of the len function to further limit the items that will be in the query result.
q.queryString = """for i in ftcontains('femme AND homme','synopsis')
where len(i.title) < 10"""
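The three kinds of source set can be assembled into query strings with a few small helpers. This is a minimal sketch: quote, kind_source, ftcontains_source and for_query are hypothetical names, not part of Chandler, and the sketch does not attempt to escape quote characters inside strings.

```python
def quote(text):
    # Use single quotes unless the text itself contains one.
    if "'" not in text:
        return "'%s'" % text
    if '"' not in text:
        return '"%s"' % text
    raise ValueError("text contains both quote characters: %r" % text)

def kind_source(kind_path):
    # A repository path to a Kind, enclosed in quotes.
    return quote(kind_path)

def ftcontains_source(lucene_query, *attr_names):
    # Full text source; the attribute names are optional.
    parts = [quote(lucene_query)] + [quote(name) for name in attr_names]
    return "ftcontains(%s)" % ",".join(parts)

def for_query(source, condition="True", every=False):
    # every=True selects 'inevery' so that items of subkinds are included.
    keyword = "inevery" if every else "in"
    return "for i %s %s where %s" % (keyword, source, condition)

by_kind = for_query(kind_source("//Schema/Core/Kind"))
by_param = for_query("$1", 'contains(i.itsName,"arc")')
by_text = for_query(ftcontains_source("femme AND homme", "synopsis"),
                    "len(i.title) < 10")
```

Each result is an ordinary Python string that could be assigned to a query's queryString attribute.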
The boolean-condition is an expression which can refer to the iteration variable and parameter values.
Here are the elements that you can use in the boolean condition (A BNF grammar for query strings appears at the end of this document).
The iteration variable for the query
At the moment, just the variable i. In the future more variable names will be allowed.
The names of Chandler item attributes
You may use the attributes of an Item. For example, most Items have an itsName attribute, so
i.itsName
will give the value of itsName for the current value of i. You can also call methods on Items (since method names are attributes).
Literal values
- numeric literals
- You can use any integer
- string literals
- String literals must be enclosed in either single (') or double (") quotes.
- boolean literals
- The Python boolean literals
True and False
Parameters
This query shows that you can use parameters (like $0) in the where clause as well, allowing you to use run time values from your Python program inside a query.
q.queryString="""for i in "//Schema/Core/Kind"
where contains(i.itsName,$0)"""
Unary (prefix) operators
- + expr
- Make numeric expression expr positive
- - expr
- Make numeric expression expr negative
- not expr
- Negate boolean expression expr
Boolean operators
- expr1 and expr2
- Perform the logical "AND" of expr1 and expr2
- expr1 or expr2
- Perform the logical "OR" of expr1 and expr2
- not expr
- Negate expr
Relational operators
- expr1 >= expr2
- Return true if the numeric/date expression expr1 is greater than or equal to the numeric/date expression expr2
- expr1 <= expr2
- Return true if the numeric/date expression expr1 is less than or equal to the numeric/date expression expr2
- expr1 > expr2
- Return true if the numeric/date expression expr1 is greater than the numeric/date expression expr2
- expr1 < expr2
- Return true if the numeric/date expression expr1 is less than the numeric/date expression expr2
- expr1 == expr2
- Return true if expr1 and expr2 are equal according to the equality rules for their Kinds
- expr1 != expr2
- Return true if expr1 and expr2 are not equal according to the equality rules for their Kinds
Arithmetic operators
- expr1 + expr2
- Add the numeric expressions expr1 and expr2
- expr1 - expr2
- Subtract the numeric expression expr2 from the numeric expression expr1
- expr1 * expr2
- Multiply the numeric expression expr1 by the numeric expression expr2
- expr1 / expr2
- Divide the numeric expression expr1 by the numeric expression expr2
- expr1 div expr2
- Produce the result of integer dividing the numeric expression expr1 by the numeric expression expr2
- expr1 mod expr2
- Produce the remainder of dividing the numeric expression expr1 by the numeric expression expr2
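For readers who think in Python, most of the relational and arithmetic operators listed above have direct counterparts in the standard operator module. The mapping below is an assumption made for illustration, not taken from Chandler's implementation. The "/" operator is left out because this document does not say whether it truncates for integer operands, and the behavior of div and mod for negative operands is likewise not specified here.

```python
import operator

# Assumed correspondence between query operators and Python functions.
QUERY_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "div": operator.floordiv,  # integer division
    "mod": operator.mod,       # remainder
}

def apply_operator(symbol, left, right):
    # Look up the Python function for a query operator and apply it.
    return QUERY_OPERATORS[symbol](left, right)

quotient = apply_operator("div", 7, 2)
remainder = apply_operator("mod", 7, 2)
```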
Dates
You can supply dates and times in eGenix mxDateTime ISO format like this:
date(ISO-date-string)
Construct a date instance that represents ISO-date-string
This example shows how to use dates in a query. Note the use of the date constructor to create a date literal which is then compared to the startTime attribute of i (i will be a CalendarEvent).
q.queryString="""for i in '//parcels/osaf/contentmodel/calendar/CalendarEvent'
where i.startTime > date('2004-09-30 12:34:56')"""
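Because the date literal is written by hand as a string, it is easy to type a day that does not exist. One way to guard against that, sketched here with the standard-library datetime module rather than mxDateTime, is to build the literal from a validated value. The helper names parse_iso and date_literal are hypothetical.

```python
from datetime import datetime

ISO_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_iso(text):
    # Raises ValueError for impossible dates such as February 30th.
    return datetime.strptime(text, ISO_FORMAT)

def date_literal(moment):
    # Format a datetime as a date(...) literal for use in a query string.
    return "date('%s')" % moment.strftime(ISO_FORMAT)

literal = date_literal(datetime(2004, 9, 30, 12, 34, 56))
condition = "i.startTime > %s" % literal
```

The resulting condition string can be used as the where clause of a for query.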
Functions
At the moment there are only three functions that you can call from queries. We will be expanding this set of functions as the system develops.
- len(expr)
- Return the length of expr. expr can be a string or a list Kind
- contains(string, substring)
- Return true if string contains substring
- hasAttribute(string)
- A method on Chandler Items that returns True if the Item has an attribute named string
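The query functions map closely onto ordinary Python operations. The sketch below models them on plain Python objects to show what a where clause computes. FakeItem, contains and has_attribute are stand-ins written for this illustration and are not Chandler code.

```python
class FakeItem(object):
    # Minimal stand-in for a Chandler item: attributes set from keywords.
    def __init__(self, **attributes):
        for name, value in attributes.items():
            setattr(self, name, value)

def contains(string, substring):
    # Counterpart of the query function contains(string, substring).
    return substring in string

def has_attribute(item, name):
    # Counterpart of hasAttribute: does the item have this attribute?
    return hasattr(item, name)

items = [
    FakeItem(title="Les femmes"),
    FakeItem(title="A much longer title"),
    FakeItem(itsName="untitled"),
]

# Rough Python counterpart of the condition: len(i.title) < 15
# (items without a title are skipped first to avoid an AttributeError).
short_titles = [i for i in items
                if has_attribute(i, "title") and len(i.title) < 15]
```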
Union Queries
- union(query1,query2, ... , queryn)
- Compute the union of query1..queryn and return that as the result. Any item that appears in the result set of any of the queries will appear in the result set of the union.
This query is composed of three for queries that show the same pattern. They all use a Kind path as the source set, and use True as the where clause, to indicate all items of a particular kind. The union operator simply creates the union of the three for queries.
q.queryString="""union(for i in "//parcels/osaf/contentmodel/calendar/CalendarEvent" where True,
for i in "//parcels/osaf/contentmodel/Note" where True,
for i in "//parcels/osaf/contentmodel/contacts/Contact" where True)"""
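When the list of Kinds is only known at run time, the union string can be generated instead of written out by hand. This is a minimal sketch with hypothetical helper names (all_of_kind, union_query). It assumes that a single space after each comma is acceptable, since the example above already separates the sub-queries with whitespace.

```python
def all_of_kind(kind_path):
    # A for query that selects every item of one Kind.
    return 'for i in "%s" where True' % kind_path

def union_query(*queries):
    # Combine already-formed query strings into one union query.
    return "union(%s)" % ", ".join(queries)

kind_paths = [
    "//parcels/osaf/contentmodel/calendar/CalendarEvent",
    "//parcels/osaf/contentmodel/Note",
    "//parcels/osaf/contentmodel/contacts/Contact",
]
combined = union_query(*[all_of_kind(path) for path in kind_paths])
```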
Intersection Queries
- intersect(query1,query2)
- Compute the intersection of query1 and query2 and return that as the result. Items in the result set of the intersection must appear in the result set of both query1 and query2.
This query computes the intersection of those Kind items whose name contains 'o' and those Kind items whose name contains 't'.
q.queryString="""intersect(for i in '//Schema/Core/Kind' where contains(i.itsName,'o'),
for i in '//Schema/Core/Kind' where contains(i.itsName,'t'))"""
Difference Queries
- difference(query1,query2)
- Compute the difference of query1 and query2 and return that as the result. Items in the result set of the difference consist of any Item that is in the result set of query1 which is not in the result set of query2. You can think of this as starting with the result set of query1 and removing any Item which appears in the result set of query2.
The result of this query is those Kind items whose names contain 'o' and do not contain 't'.
q.queryString="""difference(for i in '//Schema/Core/Kind' where contains(i.itsName,'o'),
for i in '//Schema/Core/Kind' where contains(i.itsName,'t'))"""
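The three compound operators correspond to the familiar operations on Python sets. The sketch below applies the same 'o' and 't' conditions to a handful of made-up names (they are illustrative only and do not reflect the actual contents of //Schema/Core/Kind).

```python
# Hypothetical item names standing in for the itsName of Kind items.
names = ["Kind", "Attribute", "Cloud", "Monitor", "Parcel"]

with_o = set(n for n in names if "o" in n)
with_t = set(n for n in names if "t" in n)

union_result = with_o | with_t       # in either result set
intersect_result = with_o & with_t   # in both result sets
difference_result = with_o - with_t  # in the first but not the second
```

Note that difference is not symmetric: swapping the two queries yields the names that contain 't' but not 'o'.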
Query Notification
A key feature of Chandler queries is change notification. A Chandler query defines a set of items. The notification mechanism makes sure that the result set of the query always contains the correct Items. If you change an item so that it satisfies the conditions of some query, the notification mechanism will add that item to the result set of the query. If you change an item so that it no longer satisfies the conditions of a query, then the notification mechanism will remove that item from the result set of that query. The query notification mechanism does not indicate that any attribute of any item in a query's result set has changed. It just keeps the right items in the result set.
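The bookkeeping described above can be modeled in a few lines of plain Python. This is a conceptual sketch only, not Chandler's implementation: FakeItem, update_result_set and the predicate are hypothetical, and the predicate plays the role of the query's where clause.

```python
class FakeItem(object):
    # Minimal stand-in for a Chandler item with a single attribute.
    def __init__(self, name):
        self.name = name

def update_result_set(result_set, predicate, changed_item):
    # Re-check one changed item and report what entered or left the set.
    added, removed = [], []
    matches = predicate(changed_item)
    if matches and changed_item not in result_set:
        result_set.add(changed_item)
        added.append(changed_item)
    elif not matches and changed_item in result_set:
        result_set.discard(changed_item)
        removed.append(changed_item)
    return added, removed

def name_contains_arc(item):
    # Plays the role of: where contains(i.name, "arc")
    return "arc" in item.name

results = set()
item = FakeItem("parcel")
first = update_result_set(results, name_contains_arc, item)   # item enters
item.name = "other"
second = update_result_set(results, name_contains_arc, item)  # item leaves
third = update_result_set(results, name_contains_arc, item)   # no change
```

The third call returns two empty lists, mirroring the point that only changes which move an item into or out of the result set are reported.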
Clients of a query can request that they be notified when the query notification mechanism notices items that enter or exit the query result set. Your client code supplies a Chandler Item which has a callback method. In the Chandler application, this item will usually be an ItemCollection.
The callback method will be passed a tuple containing two lists: a list of the UUIDs of any items added to the query result and a list of the UUIDs of any items removed from the query result.
Your code requests notification by calling the subscribe method on the Query Item. This method has two mandatory parameters and two optional parameters. The first parameter is an Item that has the required callback method, and the second parameter is the name of that callback method. The two optional parameters control which changes you are notified about. The repository's concurrency model gives each thread a separate view of the items in the repository, so you can select when you would like to be notified of changes. The options are:
- be notified of changes that happen in your view (the same view that the query is being run in) - instantaneously
- be notified of changes that happen on views outside your own - when your view commits
- be notified of both kinds of changes
The default is to be notified of both kinds of changes. The optional parameters are Booleans that you can set to False if you don't want to be notified of changes in your view (the first optional parameter) or of changes in other views (the second optional parameter).
No matter how you set the view notification parameters, you will only be notified of changes that would add or remove items from the query result set.
- q.subscribe(item, methodName, inSameView, inOtherViews)
- Call methodName on item when changed items enter or leave the query result set. If inSameView is true, the callback will be called as soon as the item is changed, provided the item is in the same repository view as the Query. If inOtherViews is true, the callback will be called when commit is called.
At the appropriate moment, the query system will call all subscribed methods. These methods might look like this:
def handle(self, changes):
    added, removed = changes
    print added, removed # simple action
The changes argument to the callback method is a tuple containing two lists. The first is a list of all the items which were added to the query result set. The second is a list of all the items which were removed from the query result set.
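A callback will often do more than print. One common pattern is to keep a local mirror of the result set in step with the notifications. The class below is a minimal sketch: ResultMirror is a hypothetical name, a real subscriber would be a Chandler Item rather than a plain object, and the list entries are treated as opaque hashable values (short strings stand in for them here).

```python
class ResultMirror(object):
    # Keeps a local copy of a query's result set up to date.
    def __init__(self):
        self.current = set()

    def handle(self, changes):
        # changes is a tuple of two lists: (added, removed).
        added, removed = changes
        self.current.update(added)
        self.current.difference_update(removed)

mirror = ResultMirror()
# With a real Query item, registration would look like:
#   q.subscribe(mirror, "handle")
# Here the callback is invoked by hand to simulate two notifications.
mirror.handle((["uuid-1", "uuid-2"], []))
mirror.handle((["uuid-3"], ["uuid-1"]))
```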
Future plans
We are planning a number of improvements to the query system:
- Performance enhancements
- There are a number of ways to improve the performance of queries in Chandler. This work will be ongoing over the next several releases.
- Debugging tools
- In a future version of Chandler, we will provide an interactive means for testing queries. This is not to be confused with a general end user query facility, which is also planned for a future version of Chandler.
We want to update and improve this document.
Please send any comments to dev@osafoundation.org.
Appendix 1: Grammar for Queries
Non Terminals in plain
Terminals in bold
NUM: '[0-9]+'
STRING: '"([^\\"]+|\\\\.)*"|\'([^\']+|\\\\.)*\''
PARAM: '\$[0-9]+'
UNOP: '(not|\+|-)'
MULOP: '(\*|/|div|mod)'
ADDOP: '(\+|-)'
RELOP: '(==|!=|>=|<=|>|<)'
BOOLOP: '(and|or)'
ID: '[a-zA-Z]+'
END: '$'
stmt: union_stmt END
| intersection_stmt END
| difference_stmt END
| for_stmt END
stmt_list: stmt (, stmt)*
union_stmt: union (stmt_list)
intersection_stmt: intersect (stmt , stmt)
difference_stmt: difference (stmt , stmt)
for_stmt: for ID in | inevery (name_expr where and_or_expr END
| STRING where and_or_expr END
| stmt where and_or_expr ) END
and_or_expr: rel_expr [ BOOLOP rel_expr ]
rel_expr: add_expr [ RELOP add_expr ]
add_expr: mul_expr [ ADDOP mul_expr ]
mul_expr: unary_expr [ MULOP unary_expr ]
unary_expr: [ UNOP ] value_expr
value_expr: constant
| PARAM
| ID [ ( [ arg_list ] )
| (. ID )+ [ ( [ arg_list ] ) ]
]
constant: STRING | NUM
arg_list: and_or_expr ( , and_or_expr )*
str_list: STRING ( , STRING )*
name_expr: ID | PARAM
| ftcontains ( str_list )
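To see how the terminals divide a query string, here is a minimal tokenizer sketch built on Python's re module. It is an illustration, not Chandler's parser: the patterns are adapted from the terminals above, the OP, PUNCT and WS classes are additions needed by the sketch, and keywords such as for, where, not, and, div and mod simply come out as ID tokens.

```python
import re

# (token name, regular expression) pairs, tried in this order.
TOKEN_SPEC = [
    ("STRING", r'"(?:[^\\"]|\\.)*"|\'(?:[^\\\']|\\.)*\''),
    ("PARAM", r"\$[0-9]+"),
    ("NUM", r"[0-9]+"),
    ("RELOP", r"==|!=|>=|<=|>|<"),
    ("ID", r"[a-zA-Z]+"),
    ("OP", r"[+\-*/]"),
    ("PUNCT", r"[(),.]"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(query_string):
    # Return a list of (token name, text) pairs, skipping whitespace.
    tokens = []
    pos = 0
    while pos < len(query_string):
        match = TOKEN_RE.match(query_string, pos)
        if match is None:
            raise ValueError("unexpected character at position %d: %r"
                             % (pos, query_string[pos]))
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize('for i in $1 where contains(i.itsName,"arc")')
```

Note that the PARAM terminal only admits numbered parameters such as $1.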
$Revision: 1.4 $
$Date: 2005/03/15 23:00:05 $
$Author: twl $
$Log: chandler-query-system.html,v $
Revision 1.4 2005/03/15 23:00:05 twl
Update query docs for 0.5
Revision 1.3 2004/10/21 22:30:39 twl
Commit branched docs to trunk
Revision 1.1.2.2 2004/10/19 20:03:02 twl
Incorporate Ducky's feedback
Revision 1.1.2.1 2004/10/18 21:27:03 twl
Committing doc changes to branch
Revision 1.2 2004/10/15 18:31:04 twl
Bugs 2112, 2113 (Doc bugs)
Incorporate review feedback
Revision 1.1 2004/10/12 20:08:20 twl
Fix bug 2112 - 0.4 Query documentation update