 |
JRS at the IBM Information on Demand Conference
I'll be co-presenting a session at the IOD conference with Edison Ting (one of the DB2 pureXML architects) about how we used DB2 9.x XQuery support in JRS. The original JRS incubator relied on XQuery to allow clients to write complex queries over the RDF/XML that describes our indexed properties (see the Open Services for Lifecycle Collaboration specs on Properties, Indexing and Query).
While this has definitely worked, the fact that the Open Services for Lifecycle Collaboration sample server uses SPARQL as its query language means we get a lot of questions about our choice of XQuery. One answer is that implementing a SPARQL-based solution for the Jazz Foundation Services going forward would require us to develop a lot of code, whereas the major advantage of using XQuery in DB2 was that all the smart folks in Almaden had already done the hard work.
So for all the fun details, see you in Vegas.
Categories
: [ jrs | xquery ]
Sep 06 2008, 08:56:42 PM EDT
Permalink
|
JRS has graduated!
While JRS may still be listed on jazz.net as an incubator project, internally we've reorganized and refocused. The team has been reorganized into the Jazz platform team and has become the core of the "Jazz Foundation Services" development. JRS as an implementation has also moved on: the new Jazz Foundation Services will be based on JRS, but not before we complete some serious refactoring. So one thing we're trying hard to do is differentiate between JRS as the incubator project (and the server which will be shipped with the Rational Requirements Composer) and the Jazz Foundation Services as the common platform moving forward.
In terms of focus, the move has also required us to look at how the incubator was structured and at those areas where we replicated function that had been developed for the Jazz Team Server. In a few cases we had either branched and enhanced a Jazz Team Server function or developed our own in parallel (as in the case of our Lucene full-text indexing and search). We're now in the process of merging the changes we made into the Jazz Foundation Services platform, and in the case of the Lucene services we will replace the Jazz Team Server code with that from the incubator.
Beyond the tactical measures and refactoring, we are planning on opening up the server infrastructure itself, for example to allow storage, indexing and query components to be separate, co-operating servers rather than all in a single server process. This, we believe, will allow more interesting topologies and better scalability and distribution characteristics, but it impacts a lot of the current design, both of the JRS incubator and of the current Jazz Team Server. So, just looking at the to-do and to-think-about list for the Jazz Foundation Services we have:
- A common local/remote caching infrastructure (see work item).
- The ability to store and execute scripts in the Foundation Storage Services (see work item).
- Distribution of Lucene indexes (a scalability feature of the Rational Asset Manager).
- Separation of indexing and query services into a stand-alone and shared server.
So, it seems like we should have plenty to keep us busy for some time to come.
Categories
: [ jazz | jrs ]
Sep 06 2008, 08:41:51 PM EDT
Permalink
|
Open Services for Lifecycle Collaboration
Today Danny Sabbah announced the Open Services for Lifecycle Collaboration initiative in his RSDC keynote. There is, as noted in the FAQ, a direct relationship between the Open Services work and JRS. JRS has in some regards been the proving ground, and many of the documents released today were JRS specifications that have been released to the public - though the JRS specs needed quite a bit of editing to make them look quite so nice. There are also some code samples: the client-side examples are taken from the JRS project, whereas the server has quite a history on this blog. The code that is being distributed is the experimental REST server we developed 18 months or more ago in Python, the precursor to JRS, and now it comes back to life as the Open Source sample implementation of the Open Services specifications.
Note also that this server is the experiment I mentioned on Sunday where we have implemented the query service using SPARQL.
Categories
: [ jazz | jrs | python ]
Jun 01 2008, 02:12:10 PM EDT
Permalink
|
SPARQL and RDFLib
Well, I've posted here about JRS and XQuery, which has proven to be very interesting as a query language over our RDF indexes. Today I'd like to discuss our prototype using SPARQL. Basically we built a Python server and used the very cool RDFLib implementation to allow us to store the same indexes and experiment with SPARQL queries over them. Here are a few of our initial (and non-scientific) results:
- The queries seemed easier to write, though the jury is out on which language is easier to read.
- Mapping the JRS URL-encoded query to SPARQL was far more straightforward than the equivalent translation to XQuery.
- Obviously the fact that the query language and data share a common data model makes the query more logical.
Obviously we didn't compare performance, but one thing is clear: while the major database vendors are scrambling over themselves to support XML, there's no one talking openly about SPARQL. This is definitely a shame, because it's clear from just the experiments we have done that there is a class of queries that are actually easier to express in SPARQL than in XQuery.
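To give a feel for why the mapping to SPARQL is so direct, here is a minimal sketch in Python. The JRS URL-encoded query syntax isn't shown in this post, so the `prefix:name=value` parameter form, the `PREFIXES` table and the `to_sparql` function are all hypothetical; the point is only that each name/value pair can become one triple pattern on the same subject.

```python
from urllib.parse import parse_qsl

# Hypothetical prefix table; the music namespace is only an example.
PREFIXES = {
    "ns": "http://example.com/xmlns/music#",
}


def to_sparql(query_string):
    """Turn 'ns:genre=Rock&ns:name=X' into a SPARQL SELECT (sketch only)."""
    pairs = parse_qsl(query_string)
    # Declare only the prefixes that the query actually uses.
    used = sorted({name.split(":", 1)[0] for name, _ in pairs})
    lines = ["PREFIX %s: <%s>" % (p, PREFIXES[p]) for p in used]
    lines.append("SELECT ?resource WHERE {")
    for name, value in pairs:
        # Each name=value pair becomes one triple pattern on the same subject.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('  ?resource %s "%s" .' % (name, escaped))
    lines.append("}")
    return "\n".join(lines)


print(to_sparql("ns:genre=Rock&ns:name=A+Matter+of+Life+and+Death"))
```

The sketch only handles equality on plain literals and raises a KeyError for an unknown prefix; a real translation would also need typed literals, links and comparisons.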
Look for a posting on Monday concerning the experiment :-).
Categories
: [ jazz | python | rdf ]
May 28 2008, 01:45:14 PM EDT
Permalink
|
JRS at RSDC 2008
After some discussion we've decided that JRS will be represented at RSDC this year; specifically, we'll have our own pedestal in the solution center. So if you want to find out more about the fun we've been having, about the REST services we've been putting together and our RDF and XQuery plans, stop by.
Updated: also look for us at the Jazz live event on Tuesday evening.
Categories
: [ jazz | jrs | rsdc ]
May 28 2008, 01:44:20 PM EDT
Permalink
|
RDF - the good, the bad or the ugly?
I've mentioned before here that one great feature of the JRS server is its indexing ability. In some ways it's akin to what Lucene does for text search, in that there is a set of format-specific components that are able to extract properties from resources, and a store into which these are put and made available for query. The difference between JRS and Lucene[*] is that we are trying to extract structured properties that can be made available via a more traditional query language - think XQuery and you'll have anticipated a future post. Some of these components have a fixed set of things to extract; for example, we have an EXIF indexer that pulls a pre-defined set of properties from image files. Some, however, are configurable, so our XML indexer has a declarative specification that tells the server which parts of your resources the server should index. So what has this got to do with RDF?
Having indexed properties of resources we need not only to make them available for query, but we also want a query to be able to return some of these properties as well. We also support the ability to ask for all the properties that have been indexed for a resource by appending "?properties" to the URL of a resource. We wanted a format that would be able to encode these indexed properties in a regular way, and if possible pick up a standard one. We chose RDF as a standard, but also because our internal indexing components and storage have been inspired by RDF in a number of ways already. So, when you do a GET on {resource-uri}?properties you get a nice RDF document back, something like this:
<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://dublincore.org/documents/dcmi-terms/"
    xmlns:ns="http://example.com/xmlns/music#"
    rdf:about="/jazz/resources/musicdb/albums/album-1">
  <dc:contributor rdf:resource="/jazz/users/zoe"/>
  <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">
    2008-03-02T00:00:00
  </dc:modified>
  <dc:format>application/xml</dc:format>
  <rdf:type rdf:resource="http://example.com/xmlns/music#album"/>
  <ns:name>A Matter of Life and Death</ns:name>
  <ns:releasedYear rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">
    2006
  </ns:releasedYear>
  <ns:artist rdf:resource="/jazz/resources/musicdb/artists/artist-1"/>
  <ns:genre>Rock</ns:genre>
  <ns:genre>Heavy Metal</ns:genre>
</rdf:Description>
Here the properties in italics are system properties (JRS-maintained properties for all resource types) and the rest have been extracted from an XML resource using the declarative indexer, so you can imagine that we extracted only a subset of element values from the source resource.
The point of the post is that RDF is a great format for this: the body of work and literature on RDF allows us to think about what these property documents mean, especially in the presence of many linked resources. So for us the use of RDF in JRS has been really good for the server team, and we had thought that the client teams would think so too. Well, the first thing that came up was the <rdf:RDF> tag that we had originally put around all the description documents. This seemed like unnecessary overhead (the ugly), so having re-read the RDF spec and found that it was optional anyway, we took it out. The next reaction was not one we had expected: one development manager here asked whether this meant all his developers now have to become RDF experts. Well, we had not anticipated that; we had used RDF first and foremost as an XML dialect, and as such we internally and in our samples manipulated it only with XQuery and XPath. So no, we expect most developers do not need to know RDF; they just see a particular XML form with some particular idioms (we could even use another prefix than "rdf" and I wonder how many people would even recognize that it is RDF?).
The interesting opportunity is when you DO treat this body of index data as RDF. XQuery is great, for example, but like SQL it really falls down trying to navigate a graph of resources unless the query has fixed the relationships ahead of time (think nested queries). SPARQL, as an RDF query language, really excels at this kind of query but falls down in other ways. We expect that many of the first JRS applications will use only XQuery and XPath, but over time we expect that some application requirements will be much better expressed in SPARQL, which we hope to provide in JRS at some future point.
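As a rough illustration of treating such a properties document as "just XML", here is a small Python sketch that reads the description above with nothing but the standard library and turns it into (subject, predicate, object) triples. The `triples` function is my own naming, not part of JRS, and the sketch ignores rdf:datatype and handles only this flat rdf:Description form, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The sample properties document from the post.
SAMPLE = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://dublincore.org/documents/dcmi-terms/"
    xmlns:ns="http://example.com/xmlns/music#"
    rdf:about="/jazz/resources/musicdb/albums/album-1">
  <dc:contributor rdf:resource="/jazz/users/zoe"/>
  <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">
    2008-03-02T00:00:00
  </dc:modified>
  <dc:format>application/xml</dc:format>
  <rdf:type rdf:resource="http://example.com/xmlns/music#album"/>
  <ns:name>A Matter of Life and Death</ns:name>
  <ns:releasedYear rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">
    2006
  </ns:releasedYear>
  <ns:artist rdf:resource="/jazz/resources/musicdb/artists/artist-1"/>
  <ns:genre>Rock</ns:genre>
  <ns:genre>Heavy Metal</ns:genre>
</rdf:Description>"""


def triples(description_xml):
    """Yield (subject, predicate, object) for one rdf:Description element."""
    root = ET.fromstring(description_xml)
    subject = root.get("{%s}about" % RDF)
    for prop in root:
        # ElementTree tags look like '{namespace}local'; join them into a URI.
        namespace, local = prop.tag[1:].split("}")
        predicate = namespace + local
        # A property is either a link (rdf:resource) or a literal (element text).
        link = prop.get("{%s}resource" % RDF)
        value = link if link is not None else (prop.text or "").strip()
        yield subject, predicate, value


for s, p, o in triples(SAMPLE):
    print(s, p, o)
```

Repeated properties such as ns:genre simply come out as two triples with the same subject and predicate, which is one reason the triple form suits multi-valued index properties.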
I am interested that there seems to be a bad impression of RDF in the general developer community. Personally, before working on JRS I had had no practical experience of RDF, but it does fit the need we had. I wonder if there is an assumption that to consume RDF you must also consume RDFS, OWL, think in Ontologies and so forth? Perhaps the hype around the semantic web has put some people off? Whatever the reason, the comment we heard on RDF in our meeting seems to be common, and we've learned to smile sweetly and spend time explaining to many people why this is good for them (there are some people we just tell to suck it up - but that's a different matter).
Good? Yes, for us, in this case it is. Bad? No, but it does take some selling. Ugly? Not really, actually when carefully used someone commented that "you really couldn't have made it any simpler or clearer if you tried".
* - JRS does use Lucene as well, so we support both structured query as well as full-text search.
Categories
: [ RDF | XML | jazz ]
Apr 09 2008, 09:23:57 PM EDT
Permalink
|
New entry on the Jazz blog
James Branigan, one of my fellow JRS developers, has a nice article on the Jazz blog about the journey taken by the Jazz team, which brings us up to date with JRS.
A brief history of the Jazz Team Server interface: Our journey from a J2EE server towards a RESTful server
Oh, and for those in the know, the Python server mentioned in James's posting is the project I have mentioned here a few times. It used Python and Django to build a fairly complete RESTful content repository, and in fact the documentation from that project became the initial specs for JRS. We never intended for that server to see the light of day outside the lab; it became affectionately (I like to think) known as the "Rinky-Dink" server.
Categories
: [ jazz | jrs | rsdc ]
Feb 20 2008, 08:29:11 AM EST
Permalink
|
JRS and JCR (JSR-170) again
David Nuescheler said (here):
Hi Simon,
Thanks a lot for this post. I am very interested in the future development of JRS and as the Spec-Lead of JCR (aka. JSR-170 & JSR-283) I would like to ask for some more detailed clarifications around some of the above statements.
(1) Flat, Hierarchical & non-contiguous
JCR in my mind is not limited to exposing hierarchical only structures.
As a matter of fact I think of flat as a very simple hierarchy to begin with.
In a content repository it is certainly acceptable that there is no accessible (readable) direct parent node of an item (be that for access control
reasons). In my mind the URL space itself is hierarchical
though its path segments as defined in http://gbiv.com/protocols/uri/rfc/rfc3986.html#path
(2) Personally, I believe that the tie into "Java" is more of tie into the
JCP as a standardizing body and not so much about the tie into the "Java language". Maybe the discussions around APP & JCR
http://dev.day.com/microsling/content/blogs/main/jcr-loves-atom.html
and a couple of comments around some of the non java languages
that use JCR may be interesting:
http://dev.day.com/microsling/content/blogs/main/fudbusting2.html
(3) I really think that JCR allows both for typed and untyped nodes (we call them structured vs. unstructured) as a matter of fact I am big proponent of a "Data First" architecture and find this feature in JCR one of the most fascinating features.
Anyhow, thanks a lot for your excellent post and I would be happy to engage in a more detailed discussion, feel free to contact me at any point.
David, I would be interested in the discussion, I think the points you make are valid and interesting. In terms of hierarchy it's important to JRS to allow clients to be able to PUT a resource to /a/b/c/mydoc.txt without having to have created a node at /a, /a/b, /a/b/c first which is how I meant to use the term non-contiguous in this context. Of course we also allow /a to be an Atom feed, so you can POST a child feed to /a/b and then another to /a/b/c and finally POST mydoc.txt, providing a completely hierarchical space - or you can mix both styles.
I understand the point about JCP, however we wanted to ensure that we didn't "pollute" our design with any assumptions and so we worked only from the HTTP and APP specs and developed test clients in Java, Python, BASH scripts, etc (see "J" is for Jazz, not Java). As you note in the fudbusting link the JCP API still needs to be implemented in different languages and that there are difficulties with type conversions and so forth. Your third integration approach - that of accessing the content repository through a RESTful interface is exactly the approach taken by JRS and allows maximum freedom in client choice.
I must be showing my ignorance of JCR (my apologies). Again, I mean that we do not have the notion of resource type in the JRS repository apart from one specific case - Atom feed. On a brand new server I can PUT MS Word documents, code, audio, video, any resource that exists on my machine in fact, into the server without having to define any types ahead of time. All resources are stored simply as they come in, with their content-type and other associated meta-data, without intervention or assumption by JRS. I mentioned Atom feeds: the way you create a collection in JRS is simply to PUT (or POST to an existing collection) a valid Atom feed document with the content-type "application/atom+xml", and the server does assume that you want to create a collection at that point - it therefore creates additional structures server-side to allow clients to now POST to the feed. There is the ability for clients to teach the server about interesting things in XML resources for our indexer, but I'll hold that thought for a posting of its own.
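To make the two ideas concrete (non-contiguous paths, and a feed being the only "type" the server cares about), here is a toy in-memory model in Python. It is only a sketch of the behaviour described above, not the JRS implementation: the `ToyStore` class and its method names are invented for illustration, and "valid feed" is reduced to checking that the root element is an Atom feed.

```python
import xml.etree.ElementTree as ET

ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"


class ToyStore:
    """In-memory stand-in for the repository; not the JRS implementation."""

    def __init__(self):
        self.resources = {}       # path -> (content_type, body)
        self.collections = set()  # paths that accept POST

    def put(self, path, content_type, body):
        # No check that parent paths exist: the URL space is non-contiguous.
        self.resources[path] = (content_type, body)
        if self._is_feed(content_type, body):
            self.collections.add(path)
        else:
            self.collections.discard(path)

    def post(self, collection, name, content_type, body):
        if collection not in self.collections:
            raise ValueError("POST is only allowed on a collection")
        self.put(collection.rstrip("/") + "/" + name, content_type, body)

    @staticmethod
    def _is_feed(content_type, body):
        # The one case where the store looks inside a resource.
        if not content_type.startswith("application/atom+xml"):
            return False
        try:
            return ET.fromstring(body).tag == ATOM_FEED_TAG
        except ET.ParseError:
            return False


store = ToyStore()
# /a, /a/b and /a/b/c are never created; the PUT still succeeds.
store.put("/a/b/c/mydoc.txt", "text/plain", b"hello")
store.put("/feeds/docs", "application/atom+xml",
          b'<feed xmlns="http://www.w3.org/2005/Atom"><title>Docs</title></feed>')
store.post("/feeds/docs", "notes.txt", "text/plain", b"a note")
print(sorted(store.resources))
```

Everything other than the feed check treats the body as opaque bytes plus a content type, which is the "stored simply as they come in" behaviour.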
I hope that makes things a little clearer, and I hope I didn't completely dis the JCR spec :-)
Categories
: [ Java | jazz | rest ]
Feb 20 2008, 08:23:19 AM EST
Permalink
|
JRS server on Mac OS
As I mentioned in a previous post, I am now doing all of my JRS development on a MacBook Pro, using the Rational Team Concert beta 2 client. I hadn't given any thought to the stand-alone server until the work item [43138] appeared. I always run the server from within Team Concert as I am working, not from the command line. Anyway, a little shell script hacking later and at least the JRS server.startup and server.shutdown scripts now work on Mac OS for all the other Mac developers out there.
Categories
: [ jazz | mac ]
Feb 04 2008, 01:21:27 PM EST
Permalink
|
Jazz is for everyone
Looks like Jazz is now for everyone (well, not that progressive modern stuff, but who doesn't like Louis Armstrong?). Seriously, the Jazz web site is now accepting registration from anyone, lifting the restriction that you be an IBM Rational partner or customer.
Everyone is now welcome to join Jazz.net. A special thank-you to all our Rational customers and partners, the university researchers and students, and everyone else who was part of the Jazz.net early pilot program.
Looking for JRS? Currently our builds aren't in the build list, but you can see our work items on the site.
Categories
: [ jazz ]
Jan 19 2008, 10:03:43 PM EST
Permalink
|
"J" is for Jazz, not Java
In a previous post on JCR I mentioned that JRS had consciously avoided the development of a client-side Java API. In fact there is no requirement for application clients to be developed in Java at all. One of the concerns we saw for previous Rational products was the complexity of the API and its proprietary nature, which made interoperability, integration and extension an expensive and complex proposition.
To ensure we remained honest through development we wrote the majority of our test suite to only use the JDK to open connections to the server and parse XML. We didn't allow the test suite to use any of the server code, any client library we developed ourselves or in fact any stack other than the JDK plus some helper classes. The first round of test cases were actually re-written from a Python test suite developed for an early experimental version of the server. Python has continued to play a part: we are delivering a set of samples and so far are using bash scripts with cURL, Python and JavaScript.
For Java client applications we really don't expect the bare JDK to be used on its own, and we have tested with both Apache HttpClient and Abdera (for feed/entry creation and parsing). These seem to be the preferred libraries the application teams want to, and probably should, use.
So at least for us in JRS, if not for the rest of IBM, "J" stands for the Jazz Project and not Java.
Categories
: [ jazz | python | rest ]
Jan 17 2008, 11:38:16 AM EST
Permalink
|
JRS and JCR (JSR-170)
In an email response to
Jazz REST Services,
Bill De hOra asked about the relationship between JRS and JSR-170 (Java Content Repository, JCR). He noted the following language in the IBM description of JCR:
"Every node has one and only one primary node type. A primary node type
defines the characteristics of the node, such as the properties and
child nodes that the node is allowed to have. In addition to the primary
node type, a node may also have one or more mixin types. A mixin type
acts a lot like a decorator, providing extra characteristics to a node.
A JCR implementation, in particular, can provide three predefined mixin
types..."
http://www.ibm.com/developerworks/java/library/j-jcr/
Well, it seems to me there are a few specific areas of difference between JRS and JSR-170.
Firstly the strictly hierarchical model for JSR-170 is interesting but we decided not to be so restrictive, and simply to allow the URL-space to be open for users to choose naming schemes, either hierarchical, flat or non-contiguous. We did have some pressure from initial consumers to make everything completely Atom based so that you had to build a hierarchy of folders. When this led to having to create intermediate nodes we went back to a non-contiguous scheme where the client application chooses the storage scheme most appropriate to them.
Secondly a driving principle for the work was to ensure the repository itself was as open as possible, to not have any client language or platform assumptions and so have no Java client-side API - everything is documented in terms of the HTTP/APP operations. This again is a departure for us, not only do we tend to assume we're building Java clients and Java APIs but we advantage Java to the point of making it impossible in some cases to use anything else.
The last major difference with the JSR is the fact that the nodes are "typed", which we decided to avoid in terms of having the server know about the resources (except for the distinction between "simple" resource and feed, analogous to the JSR "unstructured" and "folder"). We also decided that properties should not be attached to a resource as in WebDAV; instead we extract properties from resources, which is where the indexers come in. An indexer is a client-written description of how to extract properties from an XML resource (using XPath) so that specific property forms can be indexed in an efficient way.
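As a sketch of the indexer idea, here is what "a description of how to extract properties using XPath" might look like in Python. The rule format (a dictionary from property name to XPath) and the album document are invented for illustration; the real JRS indexer specification is not shown in this post, and ElementTree only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: property name -> XPath relative to the document root.
ALBUM_RULES = {
    "name": "./title",
    "releasedYear": "./released/year",
    "genre": "./genres/genre",
}

# Hypothetical XML resource as a client might PUT it to the server.
ALBUM_XML = """<album>
  <title>A Matter of Life and Death</title>
  <released><year>2006</year></released>
  <genres><genre>Rock</genre><genre>Heavy Metal</genre></genres>
  <notes>Not indexed, because no rule selects it.</notes>
</album>"""


def index(resource_uri, xml_text, rules):
    """Return (subject, property, value) triples selected by the rules."""
    root = ET.fromstring(xml_text)
    extracted = []
    for prop, xpath in rules.items():
        # findall supports the XPath subset built into ElementTree.
        for node in root.findall(xpath):
            extracted.append((resource_uri, prop, (node.text or "").strip()))
    return extracted


for triple in index("/jazz/resources/musicdb/albums/album-1", ALBUM_XML, ALBUM_RULES):
    print(triple)
```

The resource itself stays untouched; the properties live only in the index, which is the difference from attaching properties to the resource.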
Categories
: [ jazz ]
Jan 13 2008, 08:03:57 AM EST
Permalink
|
Develop with Jazz, for Jazz, and on a Mac
Well, the last few months have been very busy and really fun - am writing code for real! I have been seconded to work on the new Jazz REST Services (JRS) project**. JRS is a technology incubator project, part of The Jazz Project, and provides a RESTful, resource-neutral store which I'll talk about in subsequent posts.
This post then is about using Jazz, rather than developing for, which has been a really positive experience. I've used a whole bunch of source control and configuration management systems over the years, RCS, PVCS, PCMS, CVS, SVN, ClearCase and ClearCase/ClearQuest UCM. They seem to fall into one of two broad categories, file based or work-item based, that is they either deal in checking in/out files and folders or they track work against work items and you commit the item to check-in all the associated change sets. PCMS (way-back when) was work item based, UCM is and now Jazz is as well; however, the level of integration and ease of use in Jazz is really a huge leap forward from any of those.
The workflow (creating a defect/task, making changes and associating them with the item) is as easy as you think it should be, and the collaboration features to share changes in-flight with team members, request validation of work and so on have been simple enough that even a small team like ours has used them daily. If anyone has seen any of the demos of Jazz so far you'll have seen Eclipse and Java, lots of Java :-) Well, I can say that this is pretty much the out-of-the-box configuration; however, it works just as well with PyDev and our Python test client projects.
So, to the last part of the title, yep all my Jazz dev is done on my nice shiny new MacBook Pro. The Jazz client is always provided in a Mac OS X package and has worked perfectly all the way through the project. And, of course, the screen envy from my ThinkPad using colleagues is always nice.
** the link will, at least for now require sign on but that should be removed in the next week or so.
Categories
: [ jazz ]
Jan 11 2008, 02:11:59 PM EST
Permalink
|
Jazz REST Services
So, I promised a post on the Jazz REST Services work, and here it is. So first of all what is JRS?
JRS implements a RESTful repository following the architecture and style of the web. The repository is resource neutral: you don't have to pre-define resource types and you certainly don't have to have the repository understand them ahead of use. This is a departure from the way we tend to build tools today, where both client and server know the set and types of "things" during development and this set is not user extensible. This leads to all sorts of difficult issues in tool integration, both for us and for customers and partners. We also have problems in extending these tool "models" or resource types, as we tend to develop the models with a closed-world assumption, thinking that we can analyze the problem entirely and produce one single model for a given tool domain. Well, we can all imagine how well that works out, and some of us have to live with the consequences.
So our proposition is that rather than producing large, monolithic models and closed tools we develop with a much more fine-grained approach and move from a file-system approach to a repository approach. It is also important that the repository should need to know as little as possible about the resource types, this also means that the usual approach of moving resource-specific operations to the server should be discouraged as well. The upshot is a server that we're already building some interesting sample projects and product prototypes upon, with a very cool set of features:
- We leverage the Jazz notion of a project/repository to allow a server to provide more than one store, each with its own set of users and security.
- We expose the list of projects as well as users, roles and so on using Atom Publishing, so for example adding a new user to the server is a POST to the /jazz/users collection.
- When running in secure mode the server responds on HTTP/S and supports Basic and Digest authentication.
- Access to resources uses a role-based authentication mechanism.
- Resources can be PUT anywhere in the URL space /jazz/resources/{project}/ and they behave in a completely RESTful way.
- We support GET, HEAD, PUT, DELETE for all simple resources and POST for collections.
- We implement conditional operations with both ETag and Last-Modified provided wherever possible.
- All resources are versioned, every PUT creates a new revision.
- When you update a resource the response includes a Content-Location which provides the version-specific URL.
- You can retrieve the list of versions for a resource by doing a GET with the query parameter ?revisions.
- If you PUT a resource and its content type is application/atom+xml;type=feed and the resource contains a valid feed document then a collection will be created instead of a "simple" resource.
- So now you can POST to the collection, do a GET to retrieve collection contents, all as expected.
- Collections allow posting entries and media resources.
- To support query we provide a set of indexers that either extract text from resources to insert into a Lucene back-end or extract sets of triples (subject, predicate, object) that represent index properties of the resource:
  - Plain text, HTML text and XML text indexers for Lucene,
  - A JSON indexer to allow queries over JSON resources,
  - An image indexer that extracts EXIF tags,
  - An XML indexer that uses custom rules to extract values from XML resources.
- In terms of query we differentiate three cases:
  - Search - full-text search using the Lucene back-end.
  - Query - a structured query supporting multiple property queries on both system and custom index properties.
  - Properties - a simple API to ask for the indexed properties stored on a particular resource - use the "?properties" query parameter.
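Several of the features above map directly onto plain HTTP requests, so here is a minimal client-side sketch in Python using only the standard library. The host and port in `BASE`, the project name and the helper function names are assumptions made for the example; only the /jazz/resources/{project}/ layout and the ?revisions and ?properties parameters come from the list. The requests are built but not sent, and authentication is left out. Using If-Match with a previously returned ETag is standard HTTP; whether the server honours it in exactly this way is my assumption.

```python
import urllib.request

BASE = "http://localhost:9080"  # assumed host and port, not from the post


def resource_url(project, path):
    """URL of a resource inside a project's part of the URL space."""
    return "%s/jazz/resources/%s/%s" % (BASE, project, path.lstrip("/"))


def put_resource(project, path, body, content_type, etag=None):
    """Build a PUT; with an ETag it becomes a conditional update."""
    headers = {"Content-Type": content_type}
    if etag is not None:
        headers["If-Match"] = etag  # only update the revision we last saw
    return urllib.request.Request(resource_url(project, path), data=body,
                                  headers=headers, method="PUT")


def get_view(project, path, view=None):
    """Build a GET for the resource, or for its ?revisions / ?properties view."""
    url = resource_url(project, path)
    if view is not None:
        url += "?" + view
    return urllib.request.Request(url, method="GET")


update = put_resource("musicdb", "albums/album-1", b"<album/>",
                      "application/xml", etag='"abc123"')
history = get_view("musicdb", "albums/album-1", "revisions")
properties = get_view("musicdb", "albums/album-1", "properties")
for request in (update, history, properties):
    print(request.get_method(), request.full_url)
# To actually send one of these: urllib.request.urlopen(update)
```

Because every PUT creates a new revision, a client that cares about lost updates would keep the ETag from its last GET and pass it back on the next PUT.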
Wondering about the duck? You'll see this fellow a lot on our pages, he's an Indian Running Duck (image courtesy of the Indian Running Duck Association) and the mascot for JRS. Why a duck? Well the three of us in Raleigh who formed the core of the development team got to know each other pretty well locked up in a series of secret hideouts around the IBM campus. Two of us have flocks of runners, and chickens, but you simply can't use a chicken as a mascot can you!
Categories
: [ jazz | rest ]
Jan 11 2008, 01:59:58 PM EST
Permalink
|
64-core ThinkPad anyone?
Via /. I read this great article on ars technica titled MIT startup raises multicore bar with new 64-core CPU. More interesting is this quote from the article:
"Tell me if this sounds familiar: a grid of processor "tiles" arranged in a mesh network, where each tile houses a general purpose processor, cache, and a non-blocking router that the tile uses to communicate with the other tiles on the chip."
Makes that Intel Core Duo in my ThinkPad seem pretty tame now, doesn't it? But seriously, the question has been raised on slashdot already - how do we program this, and efficiently? The company is Tilera, a small player, but maybe the first of many?
Aug 20 2007, 09:33:18 PM EDT
Permalink
|