Graphite

Quick and Dirty PHP Library for hacking with Linked Data.

©2010 Christopher Gutteridge, University of Southampton.

Version 1.4 2010-07-28

Tools built with Graphite & sparqllib

What is Graphite?

Graphite is an open source PHP Library, built on top of ARC2, to make it easy to do stuff with RDF data really quickly, without having to naff around with databases. It is not intended to be scalable, or a way of authoring RDF data. There's plenty of stuff out there already to do that.

It is very similar to EasyRDF and SimpleGraph. I've lifted ideas from both, and each has features that Graphite doesn't. Use what works for you.

sparqllib.php

Also hosted on this site is sparqllib.php, a simple PHP library for querying SPARQL.

Hello World

Here's the minimal example to get started with. Please note, by the way, that the "Output" section of all these examples is being generated on-the-fly from the PHP.

Code
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");
 
$graph = new Graphite();
$graph->load( "http://webscience.org/person/7" );
print $graph->resource( "http://webscience.org/person/7" )->get( "foaf:name" );
?>
 
Output
Christopher Gutteridge

Installation

Really quick install

Type this while in the directory you want to write PHP scripts in:

curl -s http://graphite.ecs.soton.ac.uk/download.php/Graphite_and_ARC2.tgz | tar xzvf -

The version of ARC which comes with the above already has my patch applied and is distributed with permission of the author. (Although it's GPL, so asking was just me being polite.)

Longer install

You'll need to install ARC2, then download the single Graphite.php library and put it in the same directory as /arc/.

Patch to ARC2 (optional)

The ARC2 loader does not follow relative redirects, which many servers use in their 303 headers. It also confuses some sites by including the port in the HTTP "Host:" header. I've made a slightly patched version of the ARC2_Reader.php library which works around this. I suggest you use it if you want to use the "sameAs" Graphite method to its full potential. My added/altered lines are marked with #cjg.

Update policy

This library is updated (for now) when I have time or inspiration. If you're using it, let me know (email at bottom of page).

Bugs

  • Solaris
    • ARC has some issues under Solaris; I have a workaround if you need it.
    • The curl | tar installer line above won't work on Solaris.
  • Windows
    • I've had a report of Graphite not working under Windows. Confirmation and suggestions are encouraged.

Version 1.4 (2010-07-28)

  • New options to dump function: one to show label and rdf:type of each resource, another to show related resources by label (if available) rather than URI.
  • Methods such as all() now accept a resourcelist, an array, or a comma separated list of URIs or resources.
  • Added more methods for working with resource lists: append,union,except,intersection and sort.
  • Cache option to save RDF/XML documents locally to save doing an HTTP request every time.

Version 1.3

  • Improvement to dump function. Now shows tabs, \r \t and runs of more than one space.

Version 1.2

  • Added title='' to all links in the dump so you can hover to see the full URL.
  • The empty URI <> now displays as "* This Document *" for clarity.

Version 1.1 (2010-03-23)

New features:

  • $resource->loadDataGovUKBackLinks()
  • The ResourceList class, which makes Graphite work a bit more like jQuery, which is nice.

Namespaces

By default, Graphite defines a bunch of common namespaces: foaf, dc, dct, rdf, rdfs, owl, xsd, cc, bibo, skos, geo, sioc. But you can add your own to make your code more readable.

Code
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");
 
$graph = new Graphite();
$graph->ns( "wsperson", "http://webscience.org/person/" );
$graph->load( "wsperson:6" );
print $graph->resource( "wsperson:6" )->get( "foaf:name" );
?>
 
Output
Tim Berners-Lee

You may notice that this would get confused if you made namespaces like "urn" or "http". You're right, don't do it. I considered making a distinction between full URIs and those shortened with an alias, and decided that the point of the library was speed of hacking not semantic perfection.

Inspecting a resource

To see all the incoming and outgoing relations from a URI, use the dump() method. The left arrows indicate incoming relations. You can also dump the Graphite object to see everything.

Code
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");
 
$graph = new Graphite();
$graph->ns( "wsperson", "http://webscience.org/person/" );
$graph->load( "wsperson:2" );
print $graph->resource( "wsperson:2" )->dump();
?>
 
Output

Working with Lists

The all() method returns a resource list of matching values, which can be treated as an array. Prefixing a property with "-" uses the inverse of the relationship, so "-foo:Child" would match the things which this was a child of, rather than the things that were a child of it.

Code
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");
 
$graph = new Graphite();
$graph->load( "http://webscience.org/people" );
print $graph->allOfType( "foaf:Person" )->sort( "foaf:name" )->get( "foaf:name" )->join( ", " ).".\n";
?>
 
Output
Wendy Hall, Tim Berners-Lee, Christopher Gutteridge, John Taylor, Susan Davies, Harold (Hal) Abelson, David De Roure, Stefan Decker, Manuel Castells, James Hendler, Daniel Weitzner, Nigel Shadbolt, Noshir Contractor, Bebo White, Michael L. Brodie, Harith Alani, Leslie Carr, Richard Cyganiak, Jennifer Golbeck, Lalana Kagal, Deborah L. McGuinness, Kieron O'Hara, mc schraefel, Amy van der Hiel, Mark Weal, Nichola Need, Hugh Glaser, Samantha Collins, Joyce Lewis, Peter Monge, Helen Margetts, Brian Uzzi, Jianping Wu, Hans Akkermans, Steffen Staab, Sudarshan Murthy, Craig Gallen.
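
The example above does not exercise the "-" prefix, so here is a minimal sketch of it. It avoids the network by feeding a few triples in with addTurtle(). The ex: namespace, the example.org URIs and the ex:child property are all made up for this illustration.

```php
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");

$graph = new Graphite();

# Hypothetical namespace and data, invented purely for this example.
$graph->ns( "ex", "http://example.org/ns/" );
$turtle = '
@prefix ex: <http://example.org/ns/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> foaf:name "Alice" .
<http://example.org/alice> ex:child <http://example.org/bob> .
<http://example.org/bob> foaf:name "Bob" .
';
$graph->addTurtle( "http://example.org/", $turtle );

$alice = $graph->resource( "http://example.org/alice" );
$bob = $graph->resource( "http://example.org/bob" );

# Forward: the things which alice points at with ex:child.
print "Children of Alice: ".$alice->all( "ex:child" )->get( "foaf:name" )->join( ", " )."\n";

# Inverse: the things which point at bob with ex:child.
print "Parents of Bob: ".$bob->all( "-ex:child" )->get( "foaf:name" )->join( ", " )."\n";
?>
```

The only difference between the two lookups is the leading "-", which flips the direction in which the relationship is followed.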

Linked data

To follow a link to another datasource is really easy, thanks to the ARC2 library papering over many of the cracks! However, see my note in Installation for a tiny patch to ARC2 to make it follow non-absolute redirects.

Code
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");
 
$person_uri = "http://eprints.ecs.soton.ac.uk/id/person/ext-1248";
 
$graph = new Graphite();
 
# this must be a directory the webserver can write to.
$graph->cacheDir( "/usr/local/apache/sites/ecs.soton.ac.uk/graphite/htdocs/cache" );
 
$graph->load( $person_uri );
 
$person = $graph->resource( $person_uri );
 
print "<h3>".$person->link()."</h3>";
 
# Show sameAs properties
foreach( $person->all( "owl:sameAs" ) as $sameas ) { print "<div>sameAs: ".$sameas->link()."</div>"; }
 
showPersonInfo("Before",$person);
 
# follow the sameAs links and load them into our graph
$person->loadSameAs();
 
showPersonInfo("After",$person);
 
function showPersonInfo($title,$person)
{
	print "<h4>$title</h4>";
	print "<div><b>name:</b> ".$person->all( "foaf:name" )->join( ", ")."</div>";
	print "<div><b>phone:</b> ".$person->all( "foaf:phone" )->join( ", ")."</div>";
	print "<div><b>homepage:</b> ".$person->all( "foaf:homepage" )->join( ", ")."</div>";
}
?>
 
Output

http://eprints.ecs.soton.ac.uk/id/person/ext-1248

Before

name: Christopher Gutteridge, Christopher J. Gutteridge
phone:
homepage:

After

name: Christopher Gutteridge, Christopher J. Gutteridge, Christopher Gutteridge
phone: tel:+442380594833
homepage: http://users.ecs.soton.ac.uk/cjg

This rather funky example has gone and got information from Twitter (via http://semantictweet.com/), from Wikipedia (via http://dbpedia.org) and from the FOAF provided by Wendy's school. It uses the cacheDir() method so it does not have to make a bunch of HTTP requests every single time the page is loaded.
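
If you want to follow more than one generation of sameAs links, the loadSameAs() documentation below suggests looping until the returned count stops increasing, rather than until it hits zero. A rough sketch of such a loop follows. The $max_rounds limit is my own safety net for the example, not a Graphite option, and the loop needs network access to do anything useful.

```php
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");

$person_uri = "http://eprints.ecs.soton.ac.uk/id/person/ext-1248";

$graph = new Graphite();
$graph->load( $person_uri );
$person = $graph->resource( $person_uri );

# $max_rounds is a safety limit invented for this sketch.
$max_rounds = 5;
$rounds = 0;
$previous = -1;
while( $rounds < $max_rounds )
{
	$count = $person->loadSameAs();
	$rounds++;
	# Stop once a call reports no more triples than the call before it.
	if( $count <= $previous ) { break; }
	$previous = $count;
}
print "<div>Called loadSameAs() $rounds time(s)</div>";
?>
```

In practice one or two calls are probably plenty; each extra generation can mean a lot more HTTP requests, so setting cacheDir() first is a good idea.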

Documentation

Graphite Class

This object represents a set of zero or more bits of RDF data and gives you some functions to start poking them.

$graph = new Graphite();
$graph = new Graphite( $namespace_map );
Create a new instance of Graphite. See the Namespaces example above for how to specify a namespace map and a list of pre-declared namespaces.
$graph->ns( $alias, $namespace );
Add an additional namespace alias to the Graphite instance.
$count = $graph->load( $uri_to_load );
Load the RDF from the given URI or URL. Return the number of triples loaded.
$count = $graph->addTurtle( $base, $data );
Take a base URI and a string of turtle RDF and load the new triples into the graph. Return the number of triples loaded.
$count = $graph->addRDFXML( $base, $data );
As for addTurtle but load a string of RDF XML.
$resource = $graph->resource( $uri );
Get the resource with given URI. $uri may be abbreviated to "namespace:value".
$graph->cacheDir( $dir, [$age] );
$dir should be a directory the webserver has permission to read from and write to. Any RDF/XML documents which Graphite downloads will be saved here. If a cached copy exists and is newer than $age seconds then load() will use the document in the cache directory in preference to doing an HTTP request. $age defaults to 24*60*60 seconds (24 hours). Calling this function can seriously improve Graphite performance! If you want to always load certain documents fresh, load them before setting the cache.
print $graph->dump( [$options] );
Returns the entire RDF in memory in nicely formatted HTML so you can see what's going on. Options is an optional array, same parameters as $options on a dump() of an individual resource.
print_r( $graph->toArcTriples() );
Returns the entire RDF in memory in Arc2's internal triples format.
print $graph->serialize( [$serializer="RDFXML"] );
Returns the serialization of the entire RDF graph in memory using one of Arc2's serializers. By default the RDF/XML serializer is used, but others (try passing "Turtle" or "NTriples") can be used - see the Arc2 documentation.
$resource = $graph->primaryTopic();
Utility method (shamelessly ripped off from EasyRDF). Returns the primary topic of the first URL that was loaded. Handy when working with FOAF.
$resources = $graph->allOfType( $type_uri );
Return a list of all resources loaded, with the rdf:type given. eg. $graph->allOfType( "foaf:Person" )
$resources = $graph->allSubjects();
Return a list of all resources in the graph which are the subject of at least one triple.
$resources = $graph->allObjects();
Return a list of all resources in the graph which are the object of at least one triple.
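
To tie a few of these methods together, here is a small sketch that loads triples from a string with addTurtle(), lists them with allOfType() and allSubjects(), and writes the graph back out with serialize(). The example.org data is invented for the illustration, and I have not shown output since the exact serialization depends on Arc2.

```php
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");

$graph = new Graphite();

# Hypothetical data, invented for this example.
$turtle = '
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> a foaf:Person .
<http://example.org/alice> foaf:name "Alice" .
<http://example.org/bob> a foaf:Person .
<http://example.org/bob> foaf:name "Bob" .
';
$count = $graph->addTurtle( "http://example.org/", $turtle );
print "Loaded $count triples\n";

# Every resource with rdf:type foaf:Person, as a resource list.
print $graph->allOfType( "foaf:Person" )->get( "foaf:name" )->join( ", " )."\n";

# Every resource which is the subject of at least one triple.
foreach( $graph->allSubjects() as $subject ) { print "Subject: ".$subject."\n"; }

# Write the whole graph back out using Arc2's Turtle serializer.
print $graph->serialize( "Turtle" );
?>
```

Loading from a string like this is also a handy way to test code without hitting the network.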

Resource Class

This represents a single RDF thing, with a URI, or a literal value. If you treat it as a string it will return the URI or the literal, but there are some useful functions you can call on it, too.

$new_resource = $resource->get( $property );
$new_resource = $resource->get( *resource list* );
Get a single resource, related to the current resource by the given property. The property may be a URI or "namespace:value". It may be a literal or a normal resource. If there's no value, it returns a null resource object (not a null value), so your code will muddle through. If you specify multiple properties, then this returns the first resource it finds. To get a value from the inverse of a relationship, prefix the property string with "-". For example, to get the mayor of a city you may need to do $city->get( "-somens:isMayorOf" );
$string = $resource->getString( $property );
$string = $resource->getString( *resource list* );
As for get() but return the result's string value.
$boolean = $resource->has( $property );
$boolean = $resource->has( *resource list* );
Returns true if this resource has at least one relationship with the given property (or with any of the given properties).
$resource_list = $resource->all( $property );
$resource_list = $resource->all( *resource list* );
As for get(), but returns a resource list of zero or more matching resources. Resource lists may be treated as arrays, but see below for additional properties.
$resource_list = $resource->allString( $property );
$resource_list = $resource->allString( *resource list* );
As for all(), but returns a resource list of the string values of each result.
$relation_list = $resource->relations();
Returns a resource list of the properties relating to this resource, including inverse ones.
$count = $resource->load();
Try and resolve this resource via the web and load the RDF found there into the current Graphite object. Returns the number of triples loaded (zero probably indicates failure).
$count = $resource->loadSameAs( [$prefix] );
Iterate over any owl:sameAs properties this resource has and load them from the web. Any loaded triples with the URI of the resource being loaded will be created with both that URI and the URI of the current resource, causing the linked data to be immediately available via the current resource. A single Graphite instance won't load the same URI more than once, so you can call this function twice to follow first, then second generation sameAs relations, if you want. The 2nd call will return the number of triples it loaded the first time, so don't try looping until it returns zero, rather loop until the number of triples returned does not increase. If $prefix is set then only follow sameAs links to URIs with that prefix.
$resource = $resource->type()
$resources_list = $resource->types()
Handy methods for getting the rdf:type or types of the resource. Use for code readability.
$boolean = $resource->isType( $type );
$boolean = $resource->isType( *resource list* );
Return true if the resource is of the given type, or any of the given list of types.
$boolean = $resource->isNull();
Return true if the resource is a null node, e.g. as the result of $resource->get("something:nonexistent").
$label = $resource->label()
Get a valid label for this resource, tries all of skos:prefLabel, rdfs:label, foaf:name, dct:title, dc:title, sioc:name.
$htmllink = $resource->link()
Returns <a href='THIS'>THIS</a> - saves time when wanting to link homepages, etc.
$dump = $resource->dump( [$options] );
Create a pretty HTML dump of the current resource. Handy for debugging halfway through hacking something. $options is an optional array of flags to modify how dump() renders HTML.
"label"=>1 - add a label for the URI, and the rdf:type, to the top of each resource box, if the information is in the current graph.
"labeluris"=>1 - when listing the resources to which this URI relates, show them as a label, if possible, rather than a URI. Hovering the mouse will still show the URI.
"internallinks"=>1 - instead of linking directly to the URI, link to that resource's dump on the current page (which may or may not be present). This can, for example, make bnode nests easier to figure out.
print_r( $resource->toArcTriples() );
Returns all triples of which this resource is the subject in Arc2's internal triples format.
print $resource->serialize( [$serializer="RDFXML"] );
Returns the serialization of the resource including any bnodes it points to (and any those point to). See the serialize method of the Graphite class above.
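
Here is a short sketch of the null resource behaviour described under get(), along with has(), isNull(), isType() and label(). The data is made up and loaded from a string, so nothing here touches the web.

```php
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");

$graph = new Graphite();

# Hypothetical data, invented for this example. Note there is no foaf:phone.
$graph->addTurtle( "http://example.org/", '
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> a foaf:Person .
<http://example.org/alice> foaf:name "Alice" .
<http://example.org/alice> foaf:homepage <http://example.org/~alice/> .
' );
$alice = $graph->resource( "http://example.org/alice" );

# has() tests for a property before using it.
if( $alice->has( "foaf:homepage" ) )
{
	print "<div>homepage: ".$alice->get( "foaf:homepage" )->link()."</div>";
}

# get() on a missing property gives a null resource, not a PHP null,
# so chained calls keep working. isNull() tells the two cases apart.
$phone = $alice->get( "foaf:phone" );
if( $phone->isNull() ) { print "<div>no phone number in the graph</div>"; }

# isType() and label() are readability helpers.
if( $alice->isType( "foaf:Person" ) ) { print "<div>person: ".$alice->label()."</div>"; }
?>
```

Whether you check with has() first or call get() and test isNull() afterwards is mostly a matter of taste.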

Experimental Resource Class methods

These do some funky stuff relating to our research at Southampton. They may be modified or removed in later versions of Graphite.

$count = $resource->loadSameAsOrg( $prefix );
Look up this URI in http://sameas.org and follow any sameAs URIs from there which match the prefix. $prefix is enforced as sameAs.org may return hundreds of URIs.
$count = $resource->loadDataGovUKBackLinks()
Takes the current resource and attempts to find backlinks via The EnAKTing project. This works rather nicely with dbpedia URIs of British regions, cities etc.

ResourceList Class

This represents a list of RDF things, URIs or literal values. You can just treat it like an array, or you can call methods on it.

$new_resourcelist = $resourcelist->get( $property );
$new_resourcelist = $resourcelist->get( *resource list* );
Call $resource->get(...) on every item in this list and return a resourcelist of the results.
$new_resourcelist = $resourcelist->getString( $property );
$new_resourcelist = $resourcelist->getString( *resource list* );
Call $resource->getString(...) on every item in this list and return a resourcelist of the results.
$new_resourcelist = $resourcelist->all( $property );
$new_resourcelist = $resourcelist->all( *resource list* );
Call $resource->all(...) on every item in this list and return a resourcelist of all the results. Duplicate resources are eliminated.
$new_resourcelist = $resourcelist->allString( $property );
$new_resourcelist = $resourcelist->allString( *resource list* );
Call $resource->allString(...) on every item in this list and return a resourcelist of all the results. As with all(), duplicate resources are eliminated.
$new_resourcelist = $resourcelist->allOfType( $type_uri );
Create a new resource list containing all resources in the current list of the given type.
$new_resourcelist = $resourcelist->label();
Call $resource->label() on every item in this list and return a resourcelist of the results.
$n = $resourcelist->load();
Call $resource->load() on every item in this list and return the total triples from these resources. Careful, this could cause a large number of requests at one go!
$str = $resourcelist->join( $joinstr );
Returns a string of all the items in the resource list, joined with the given string.
$new_resourcelist = $resourcelist->sort( $property );
$new_resourcelist = $resourcelist->sort( *resource list* );
Return a copy of this resource list sorted by the given property or properties. If a resource has multiple values for a property then one will be used, as with $resource->get().
$new_resourcelist = $resourcelist->distinct();
Return a list with any duplicates removed, otherwise preserving current order.
$new_resourcelist = $resourcelist->append( $resource );
$new_resourcelist = $resourcelist->append( *resource list* );
Create a new resource list with the given resource or list of resources appended on the end of the current resource list.
$new_resourcelist = $resourcelist->union( $resource );
$new_resourcelist = $resourcelist->union( *resource list* );
Create a new resource list with the given resource or list of resources merged with the current list. Functionally the same as calling $resourcelist->append( ... )->distinct()
$new_resourcelist = $resourcelist->intersection( $resource );
$new_resourcelist = $resourcelist->intersection( *resource list* );
Create a new resource list containing only the resources which are in both lists. Only returns one instance of each resource no matter how many duplicates were in either list.
$new_resourcelist = $resourcelist->except( $resource );
$new_resourcelist = $resourcelist->except( *resource list* );
Create a new resource list containing only the resources which are in $resourcelist but not in the list being passed in. Only returns one instance of each resource no matter how many duplicates were in either list.
$dump = $resourcelist->dump( [$options] );
Returns a string containing a dump of all the resources in the list. Options is an optional array, same parameters as $options on a dump() of an individual resource.
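
The set-style methods are easiest to see side by side. This sketch builds two overlapping lists with allOfType() and combines them with intersection(), except() and union(). The ex:Staff type and the example.org people are invented for the illustration.

```php
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");

$graph = new Graphite();

# Hypothetical namespace and data, invented for this example.
$graph->ns( "ex", "http://example.org/ns/" );
$graph->addTurtle( "http://example.org/", '
@prefix ex: <http://example.org/ns/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> a foaf:Person .
<http://example.org/alice> a ex:Staff .
<http://example.org/alice> foaf:name "Alice" .
<http://example.org/bob> a foaf:Person .
<http://example.org/bob> foaf:name "Bob" .
<http://example.org/carol> a ex:Staff .
<http://example.org/carol> foaf:name "Carol" .
' );

$people = $graph->allOfType( "foaf:Person" );
$staff = $graph->allOfType( "ex:Staff" );

# In both lists.
print "People who are staff: ".$people->intersection( $staff )->get( "foaf:name" )->join( ", " )."\n";

# In the first list but not the second.
print "People who are not staff: ".$people->except( $staff )->get( "foaf:name" )->join( ", " )."\n";

# In either list, with duplicates removed, sorted by name.
print "Everyone: ".$people->union( $staff )->sort( "foaf:name" )->get( "foaf:name" )->join( ", " )."\n";
?>
```

Each method returns a new resource list, so the calls chain in the same way as the allOfType()->sort()->get()->join() example earlier on.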

* Note about Graphite methods which can take a list of resources

These methods work in a pretty cool way. To make life easier for you they can take a list of resources in any of the following ways.

$resource->get() is used as an example, it applies to many other methods.

$resource->get( $uri_string )
Such as "http://xmlns.com/foaf/0.1/name".
$resource->get( $short_uri_string )
Using any namespace defined with $graph->ns() or built in, e.g. "foaf:name".
$resource->get( $resource )
An instance of Graphite_resource.
$resource->get( $thing, $thing, $thing, ... )
$resource->get( array( $thing, $thing, $thing, ... ) )
Where each thing is any of $uri_string, $short_uri_string or $resource.
$resource->get( $resourcelist )
An instance of Graphite_resourceList.

This should make it quick and easy to write readable code!
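
As a quick sketch of the calling styles listed above, each of the following get() calls asks the same made-up resource for its name. The data is invented and loaded from a string. The last two calls pass more than one property, so they return the first one found.

```php
<?php
include_once("arc/ARC2.php");
include_once("Graphite.php");

$graph = new Graphite();

# Hypothetical data, invented for this example. Note there is no rdfs:label.
$graph->addTurtle( "http://example.org/", '
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> foaf:name "Alice" .
' );
$alice = $graph->resource( "http://example.org/alice" );

# A full URI string.
print $alice->get( "http://xmlns.com/foaf/0.1/name" )."\n";

# A short URI string using a built-in namespace.
print $alice->get( "foaf:name" )."\n";

# A resource object standing for the property.
$name_property = $graph->resource( "foaf:name" );
print $alice->get( $name_property )."\n";

# Several things as separate arguments: the first match is returned.
print $alice->get( "rdfs:label", "foaf:name" )."\n";

# The same things wrapped in an array.
print $alice->get( array( "rdfs:label", "foaf:name" ) )."\n";
?>
```

Listing a few candidate properties in one call is a cheap way to cope with data sources that disagree about which vocabulary to use.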

Contact

Get in touch with me at cjg@ecs.soton.ac.uk and you could have a look at our web team blog.