SemWebDev Weblog

eRDF as Query and Report language

Tue Mar 6 2007 at 22:58

I was doing some Smarty templating, marking up the HTML with eRDF so that I could easily pull the data into the database as triples later. Then I thought about having to write SPARQL queries afterwards, populate and assign variables, and embed those variables in the template.

It struck me that there would be a great duplication of effort here - I would specify some variables and triples in the SPARQL query to define the data I would get back, and then I’d be specifying triples with variables - the same triples - again in the eRDF HTML.

How can you cut out the repetition? Is it possible that the eRDF template can already contain all the information you need to query the datastore, and fill the template with the results? I hacked up a quick proof of concept to find out:


<p id="me"> My name is <span class="foaf-name">?name</span> </p>

can generate:


<#me> foaf:name ?name

from which you can derive the SPARQL:


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE
{
    <#me> foaf:name ?name
}
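
The two steps above (template to triple pattern, triple pattern to SPARQL) can be sketched roughly as follows. This is a minimal illustration in Python, not the PHP proof of concept itself; the names `TripleExtractor`, `build_query` and the `PREFIXES` table are hypothetical, and it assumes the subject is the nearest ancestor element with an id and the predicate comes from a `prefix-localname` class.

```python
import re
from html.parser import HTMLParser

# Assumed prefix table; foaf is the only vocabulary used in the example.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


class TripleExtractor(HTMLParser):
    """Collect (subject, predicate, object) patterns from an eRDF-style template."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one (tag, subject, predicate) entry per open element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = "<#%s>" % attrs["id"] if attrs.get("id") else None
        predicate = None
        for token in (attrs.get("class") or "").split():
            prefix, _, local = token.partition("-")
            if prefix in PREFIXES and local:
                predicate = "%s:%s" % (prefix, local)
        self.stack.append((tag, subject, predicate))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray or void end tags.
        if tag in [entry[0] for entry in self.stack]:
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not self.stack or not re.fullmatch(r"\?\w+", text):
            return
        predicate = self.stack[-1][2]
        # Subject: nearest ancestor element that carries an id.
        subject = next((s for _, s, _ in reversed(self.stack[:-1]) if s), None)
        if subject and predicate:
            self.triples.append((subject, predicate, text))


def build_query(template):
    """Derive a SELECT query from the triple patterns found in the template."""
    parser = TripleExtractor()
    parser.feed(template)
    parser.close()
    variables = sorted({obj for _, _, obj in parser.triples})
    lines = ["PREFIX %s: <%s>" % item for item in PREFIXES.items()]
    lines.append("SELECT " + " ".join(variables))
    lines.append("WHERE {")
    lines += ["    %s %s %s ." % triple for triple in parser.triples]
    lines.append("}")
    return "\n".join(lines)


print(build_query('<p id="me"> My name is <span class="foaf-name">?name</span> </p>'))
```

Only object variables are handled here, which matches the example; variable subjects would need a further convention.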

from which you can get the result:


+-----------------+
| name            |
+-----------------+
| Keith Alexander |
+-----------------+

Push the value into the right place in the template (where the name of the variable is the name of the result column), and you get:


<p id="me"> My name is <span class="foaf-name">Keith Alexander</span> </p>
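
Pushing a result row back into the template could look something like this. It is a sketch under assumptions: `fill_template` is a hypothetical name, a result row is modelled as a plain dict keyed by column name, and a placeholder is taken to be a `?var` that makes up the whole text content of an element.

```python
import html
import re


def fill_template(template, row):
    """Replace each ?var placeholder with the value of the matching result column."""

    def substitute(match):
        before, name, after = match.groups()
        if name not in row:
            return match.group(0)  # no such column: leave the placeholder alone
        return before + html.escape(str(row[name])) + after

    # Only match ?var sitting directly between tags, so that a '?' inside
    # an attribute (e.g. a URL query string) is not touched.
    return re.sub(r"(>\s*)\?(\w+)(\s*<)", substitute, template)


template = '<p id="me"> My name is <span class="foaf-name">?name</span> </p>'
print(fill_template(template, {"name": "Keith Alexander"}))
```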

So far so good.

But what about multiple results? It seemed to me that if I wanted (as you generally do) a page with both one-to-one relations (e.g. for a CD description: price, artist, record company, release date) and one-to-many relations (e.g. the tracks on the album), I ought to run more than one query: one for the top-level single relations, and one for each list. I figured that I could use a pseudo ‘list item’ property (as well as rdf:li), from which an algorithm could determine that everything relating to that list item would feature in a multi-row result set and should be moved to a different query.

I spent a while this afternoon scratching my head trying to hit upon a decent way to do that, until danja and shellac in the #swig IRC channel pointed out that the single relations could simply ignore the extra rows. So that made it easier, at least for now.

I put together a template using a Smarty foreach loop, with some code that fills the appropriate variables with either an array or a single value depending on a variable naming convention, and makes a few changes to the template so that it is ready to be passed to Smarty, which does the rest. I even put in support for OPTIONAL clauses.
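
The "array or single value" step might be sketched like this. The actual naming convention isn't spelled out here, so the `_list` suffix is purely an assumption, as are the function name `assign_variables` and the made-up CD rows; unbound OPTIONAL values are modelled as `None`.

```python
def assign_variables(rows, list_suffix="_list"):
    """Turn the rows of one big result table into template variables.

    Variables whose name ends with list_suffix collect their distinct values
    in row order; every other variable keeps the first value bound to it, so
    single relations simply ignore the extra rows.
    """
    assigned = {}
    for row in rows:
        for name, value in row.items():
            if value is None:  # unbound, e.g. from an OPTIONAL clause
                continue
            if name.endswith(list_suffix):
                values = assigned.setdefault(name, [])
                if value not in values:  # drops repeats caused by row multiplication
                    values.append(value)
            else:
                assigned.setdefault(name, value)
    return assigned


# Illustrative rows only: one CD with two tracks gives two result rows.
rows = [
    {"artist": "Example Band", "label": None, "track_list": "Intro"},
    {"artist": "Example Band", "label": None, "track_list": "Outro"},
]
print(assign_variables(rows))
```

Deduplicating list values is a design choice that would also drop genuine repeats (two tracks with the same title, say), so a real implementation might want to key on the list item's identifier instead.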

How far can you go with this though?

Even as it stands, this system could be pretty powerful: coupled with a regular templating engine, you can process and format the variables in the template before the page is served.

I think it will be necessary to break it up into multiple queries to allow things like nested loops and a COUNT or AVG feature. That's one of the nice things about using ARC's MySQL-based store: if SPARQL isn't powerful enough yet, you can use the SQL underlying it instead, and since we are writing eRDF rather than SPARQL, it doesn't really matter how we get the results. On the SPARQL side, it wouldn't be too hard to implement a FILTER, perhaps with a variable naming convention like $foo_filter_regex_BAR, where BAR is a constant defined in a file for such things. Or perhaps a pseudo-property, like sparql-filter:

	<p id="{$person_id}">
		<span class="foaf-name">{$name}</span>
		<br class="sparql-filter" title="FILTER regex(str($name), 'John')"/>
	</p>
	
(the eRDF template processor would know how to translate that into a query).
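
One way the processor could do that translation is sketched below, assuming the FILTER expression is copied verbatim from the title attribute of any element with the sparql-filter class. `FilterExtractor` and `add_filters` are hypothetical names, and the triple pattern passed in is written by hand for the example. Note that `$name` is already a legal SPARQL variable (the `$` and `?` forms are interchangeable), so the Smarty-style names need no rewriting inside the filter.

```python
from html.parser import HTMLParser


class FilterExtractor(HTMLParser):
    """Collect the title attribute of every element with class 'sparql-filter'."""

    def __init__(self):
        super().__init__()
        self.filters = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br .../>.
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "sparql-filter" in classes and attrs.get("title"):
            self.filters.append(attrs["title"])


def add_filters(where_patterns, template):
    """Build a WHERE block from triple patterns plus any filters in the template."""
    parser = FilterExtractor()
    parser.feed(template)
    parser.close()
    body = list(where_patterns) + parser.filters
    return "WHERE {\n" + "\n".join("    " + line for line in body) + "\n}"


template = (
    '<p id="{$person_id}">'
    '<span class="foaf-name">{$name}</span>'
    '<br class="sparql-filter" title="FILTER regex(str($name), \'John\')"/>'
    "</p>"
)
print(add_filters(["$person_id foaf:name $name ."], template))
```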

What about Forms?

A similar approach could be used for dealing with forms. By using input names that return associative arrays, you can easily generate arbitrary RDF/XML from the post data, e.g.:


	<form action="erdft_submit" method="post" accept-charset="utf-8">
		<label for="name">Name: <input type="text" id="name" name="rdf:RDF[foaf:Person][foaf:name]"/></label>
		<label for="mbox">Email: <input type="text" id="mbox" name="rdf:RDF[foaf:Person][foaf:mbox]"/></label>
		<input type="submit" value="Submit"/>
	</form>
	

can generate:

	
	<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
		<foaf:Person>
			<foaf:name>John Smith</foaf:name>
			<foaf:mbox>john@example.com</foaf:mbox>
		</foaf:Person>
	</rdf:RDF>
	

That can then be saved to your data store.
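
As a rough sketch of that conversion: PHP would hand you the bracketed input names as nested arrays, whereas this Python version splits the flat field names itself. `post_to_rdfxml`, `qname` and the `NAMESPACES` table are hypothetical, and only prefixes listed in the table are accepted, so arbitrary post data cannot introduce other vocabularies.

```python
import re
import xml.etree.ElementTree as ET

# Assumed prefix-to-namespace table; unknown prefixes raise KeyError.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def qname(prefixed):
    """Convert 'foaf:name' into ElementTree's '{namespace}name' form."""
    prefix, local = prefixed.split(":", 1)
    return "{%s}%s" % (NAMESPACES[prefix], local)


def post_to_rdfxml(post):
    """Build RDF/XML from field names like 'rdf:RDF[foaf:Person][foaf:name]'."""
    for prefix, uri in NAMESPACES.items():
        ET.register_namespace(prefix, uri)
    root = None
    for field, value in post.items():
        # 'rdf:RDF[foaf:Person][foaf:name]' -> ['rdf:RDF', 'foaf:Person', 'foaf:name']
        path = re.findall(r"[^\[\]]+", field)
        if root is None:
            root = ET.Element(qname(path[0]))
        node = root
        for step in path[1:]:
            child = node.find(qname(step))
            if child is None:
                child = ET.SubElement(node, qname(step))
            node = child
        node.text = value  # ElementTree escapes the text on serialisation
    return ET.tostring(root, encoding="unicode")


post = {
    "rdf:RDF[foaf:Person][foaf:name]": "John Smith",
    "rdf:RDF[foaf:Person][foaf:mbox]": "john@example.com",
}
print(post_to_rdfxml(post))
```

Fields that share a path prefix are merged into the same element, which is how both inputs end up inside a single foaf:Person.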

For validation and processing, the controller would parse the form page so that it knows what inputs to expect. It would also pull class names from the form's input, select, textarea and button elements to tell it which function to run each widget's value through. The naming could follow a convention like f_valid_email, for which the controller would choose the valid_email() function stored in a separate file. Personally, I think this might be preferable to defining models with validation criteria for each property (as CakePHP does, for example), as that can get ugly if you want different validation criteria for different circumstances.
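
The class-name dispatch could be sketched along these lines. Only the f_valid_email / valid_email() pairing comes from the convention described above; the `VALIDATORS` registry, the `not_empty` check and the `validate` function are hypothetical stand-ins for the "separate file" of functions.

```python
import re


def valid_email(value):
    """Deliberately loose check: something@something.tld."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


def not_empty(value):
    return bool(value.strip())


# Only functions registered here may be named from the markup.
VALIDATORS = {"valid_email": valid_email, "not_empty": not_empty}


def validate(class_attr, value, prefix="f_"):
    """Run value through every validator named in the widget's class attribute.

    Returns the names of the checks that failed; an unknown name counts as a failure.
    """
    failed = []
    for token in class_attr.split():
        if token.startswith(prefix):
            name = token[len(prefix):]
            check = VALIDATORS.get(name)
            if check is None or not check(value):
                failed.append(name)
    return failed


print(validate("text f_not_empty f_valid_email", "john@example.com"))
print(validate("text f_not_empty f_valid_email", "not an address"))
```

Looking names up in an explicit registry, rather than calling whatever function the class name spells, keeps the markup from invoking arbitrary code.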

Is it worth it?

Ultimately, I suppose, you can invent workarounds and fixes for just about anything if you want.
But should you?
Would this be a good method of web development?

Advantages

Disadvantages

Try it out

If you want, you can have a go with the code I've written so far. It's still pretty basic alpha stuff, but if you want to look it over, I'd be delighted to hear comments, suggestions and improvements. For that code, you'll need Smarty and the ARC PHP RDF classes. If you don't want to use Smarty, you can rewrite the regular expressions that define the syntax for the SPARQL variables (these can be passed as arguments to the class) to use native PHP syntax instead - just make the first match be a valid SPARQL variable.

You'll also need an ARC MySQL triplestore set up, with data in it, if the code is to do anything. Again, you pass the path to your datastore config file as a parameter to the eRDFT class (see erdft-smarty.phps).

Limitations

Currently, it only generates one big query, which gets you one big data table as a result. Also, properties can't be variables - only subjects and objects. I couldn't think of a way that you could express properties in eRDF so that they would be visible to the user anyway. Still, it would be nice to have variable/unknown properties. Perhaps a syntax like 'erdft-?foo'.
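
If the 'erdft-?foo' idea were adopted, mapping a class token to a predicate might look roughly like this. Nothing of the sort exists in the current code; `predicate_from_class` and its `known_prefixes` parameter are hypothetical.

```python
import re


def predicate_from_class(token, known_prefixes=("foaf",)):
    """Map a class token to a SPARQL predicate, or None if it is plain styling.

    'erdft-?foo' -> '?foo'      (variable property, the proposed syntax)
    'foaf-name'  -> 'foaf:name' (ordinary eRDF property)
    """
    variable = re.fullmatch(r"erdft-(\?\w+)", token)
    if variable:
        return variable.group(1)
    prefix, _, local = token.partition("-")
    if prefix in known_prefixes and local:
        return prefix + ":" + local
    return None


print(predicate_from_class("erdft-?foo"))
print(predicate_from_class("foaf-name"))
```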

Future Ideas for eRDF-to-SPARQL-and-back-again

I'll probably knock up a Resource class so it can be used with the Tonic PHP REST library/framework, and do the nice automagic form processing things I described above. Then I guess I'll see if I can actually use it for more than "hello world" stuff.

RDF SPARQL eRDF-T