A crash course in SPARQL

From Data-gov Wiki

Infobox (How-To)
  • name: A crash course in SPARQL

  • description: A very fast course to start using SPARQL
  • keyword(s): sparql
  • creator(s): Alvaro Graves
  • created: 2010/08/23
  • modified: 2010-8-24


1. Introduction

SPARQL is a query language for the Semantic Web. It was designed to resemble SQL, the query language for relational databases, so it is relatively easy to learn. An example of a query is:

 SELECT ?node ?title
 WHERE{
   ?node <http://purl.org/dc/elements/1.1/title> ?title .
 }
 LIMIT 1

You can go to http://data-gov.tw.rpi.edu/sparql, copy and paste the previous code into the box and click on "run query". It will show the following results:

node | title
<http://xmlns.com/foaf/0.1/> | "Friend of a Friend (FOAF) vocabulary"

The results are displayed in HTML form (there are other formats we will look at later). But in order to understand what all this means, we need to understand the concept of a triple.

2. What is a Triple?

A triple is the smallest unit of information that can be expressed in the Semantic Web. It is composed of 3 elements:

  1. A subject which is a URI (e.g., a "web address") that represents something.
  2. A predicate which is another URI that represents a certain property of the subject.
  3. An object which can be a URI or a literal (a string) that is related to the subject through the predicate.

Thus, for example, a few triples can be

 <http://graves.cl/foaf.rdf#me> <http://xmlns.com/foaf/0.1/givenname> "Alvaro" .
 <http://graves.cl/foaf.rdf#me> <http://xmlns.com/foaf/0.1/schoolHomepage> <http://www.rpi.edu> .
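
As a rough illustration outside SPARQL, the two triples above can be modeled in Python as plain 3-tuples. The `Triple` class, the `is_uri` and `to_line` helpers, and the convention of telling URIs from literals by their angle brackets are assumptions made for this sketch; they are not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a URI, written in angle brackets
    predicate: str  # a URI, written in angle brackets
    obj: str        # a URI or a quoted literal


ME = "<http://graves.cl/foaf.rdf#me>"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    Triple(ME, f"<{FOAF}givenname>", '"Alvaro"'),
    Triple(ME, f"<{FOAF}schoolHomepage>", "<http://www.rpi.edu>"),
]


def is_uri(term: str) -> bool:
    """Sketch-level check: URIs are the terms wrapped in angle brackets."""
    return term.startswith("<") and term.endswith(">")


def to_line(triple: Triple) -> str:
    """Write the triple the way it appears in the text, ending with ' .'."""
    return f"{triple.subject} {triple.predicate} {triple.obj} ."


for t in triples:
    kind = "URI" if is_uri(t.obj) else "literal"
    print(to_line(t), "# object is a", kind)
```

Both triples share the same subject, which is what lets a query ask for several properties of one resource at once.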


3. Understanding basic SPARQL

Back to our first example, we can now see what it does. We request two variables that we call ?node and ?title (variables start with a question mark). Lines two to four (the WHERE clause) contain the graph pattern that states how these variables should relate. In this case we say that ?node should have a ?title related through the predicate <http://purl.org/dc/elements/1.1/title>, which is defined to represent that something has a title. Finally, the LIMIT 1 line tells the system to return at most 1 result (in case there are several).

 SELECT ?node ?title
 WHERE{
   ?node <http://purl.org/dc/elements/1.1/title> ?title .
 }
 LIMIT 1

In English, we are asking "Give me some resource that has a title".
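
To make the idea of a graph pattern more concrete, here is a minimal Python sketch of how a single triple pattern could be matched against a list of triples. It is only an illustration of the concept, not how a real triple store works. The function name `match_pattern` and the two `example.org` triples are hypothetical and were added for this sketch.

```python
DC_TITLE = "<http://purl.org/dc/elements/1.1/title>"

triples = [
    ("<http://xmlns.com/foaf/0.1/>", DC_TITLE,
     '"Friend of a Friend (FOAF) vocabulary"'),
    # The two triples below are made up for illustration.
    ("<http://example.org/doc>", DC_TITLE, '"Another title"'),
    ("<http://example.org/doc>", "<http://example.org/other>", '"x"'),
]


def match_pattern(pattern, triples, limit=None):
    """Return one dict of variable bindings per triple that fits the pattern."""
    results = []
    for triple in triples:
        if limit is not None and len(results) >= limit:
            break  # plays the role of LIMIT
        bindings = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable: bind it, or check it agrees with an earlier binding.
                if bindings.setdefault(want, have) != have:
                    break
            elif want != have:
                break  # a fixed term that does not match
        else:
            results.append(bindings)
    return results


# Same shape as: SELECT ?node ?title WHERE { ?node dc:title ?title . } LIMIT 1
rows = match_pattern(("?node", DC_TITLE, "?title"), triples, limit=1)
print(rows)
```

Without the `limit` argument the same call returns every triple whose predicate is the title URI.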

4. Prefixes

One of the problems with managing URIs is that they are too long. For example, if we are looking for the names of different people we can ask

 SELECT ?node ?name
 WHERE{
   ?node <http://xmlns.com/foaf/0.1/givenname> ?name .
   ?node <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
 }
 LIMIT 10

In this case we are asking "Give me up to 10 resources that are of type Person and have a given name". The results of this query will be

node | name
<http://dbpedia.org/resource/David_L._Boren> | "David L."@en
<http://dbpedia.org/resource/James_L._Jones> | "James L."@en
<http://dbpedia.org/resource/Joe_Biden> | "Joe"@en
<http://dbpedia.org/resource/Lawrence_Summers> | "Lawrence"@en
<http://dbpedia.org/resource/Michelle_Obama> | "Michelle"@en
<http://dbpedia.org/resource/Nancy-Ann_DeParle> | "Nancy-Ann"@en
<http://dbpedia.org/resource/Rahm_Emanuel> | "Rahm"@en
<http://dbpedia.org/resource/Valerie_Jarrett> | "Valerie B."@en
<http://dbpedia.org/resource/David_L._Boren> | "David L."@en
<http://dbpedia.org/resource/James_L._Jones> | "James L."@en

It is easy to see that adding more and more restrictions makes these queries really long and hard to manage. More importantly, it becomes very likely that we make typos and syntax mistakes.
To solve this we can use PREFIX at the beginning of the query, which allows us to specify a namespace for the URIs. For example, the same query from above would look like


 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 SELECT ?node ?name
 WHERE{
   ?node foaf:givenname ?name .
   ?node rdf:type foaf:Person .
 }
 LIMIT 10


Now, instead of adding the whole URI, we use the prefix (in this case foaf or rdf) and add only the last part of the URI.
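
The expansion that a SPARQL processor performs on prefixed names can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `expand` function and the `PREFIXES` dictionary are names invented for this sketch, and only the simple `prefix:local` form is handled.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(term, prefixes):
    """Turn a prefixed name such as foaf:Person into a full <URI>."""
    if term.startswith(("?", "<", '"')):
        return term  # variables, full URIs and literals are left alone
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    return f"<{prefixes[prefix]}{local}>"


pattern = ("?node", "rdf:type", "foaf:Person")
expanded = tuple(expand(term, PREFIXES) for term in pattern)
print(expanded)
```

The expanded pattern is exactly the long form used in the first version of the query, so both queries ask for the same thing.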

5. Shortcuts

As we have seen, sometimes we define the graph we want to retrieve by describing several properties of the same node. A way to simplify this is to end a triple with a semicolon instead of a period and omit the subject in the next one. Thus, our previous example would look like


 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 SELECT ?node ?name
 WHERE{
   ?node foaf:givenname ?name ;
            rdf:type foaf:Person .
 }
 LIMIT 10

We changed the final period of the first restriction to a semicolon, and thus we don't need to write the subject ?node again.

Another common need is to specify a certain type (using rdf:type). For this predicate only, we can write "a" instead:

 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 SELECT ?node ?name
 WHERE{
   ?node foaf:givenname ?name ;
            a foaf:Person .
 }
 LIMIT 10


Given that rdf:type was the only place where we used the rdf prefix, we can get rid of that PREFIX declaration as well.

6. Graphs

Triple stores support named graphs, which allow people to keep multiple graphs in the same database. Each named graph is identified by a URI (which can also be used for describing other things).

For example, take our first query, now limited to 3 results:


 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT ?node ?title
 WHERE{
   ?node dc:title ?title .
 }
 LIMIT 3


We obtain three identical results, as can be seen in the next table.

node | title
<http://xmlns.com/foaf/0.1/> | "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/> | "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/> | "Friend of a Friend (FOAF) vocabulary"

Actually what is happening is that there are three identical triples in different graphs. We can see this using the following query.


 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT ?graph ?node ?title
 WHERE{
   GRAPH ?graph{
     ?node dc:title ?title .
   }
 }
 LIMIT 3

graph | node | title
<http://xmlns.com/foaf/0.1/Document> | <http://xmlns.com/foaf/0.1/> | "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/homepage> | <http://xmlns.com/foaf/0.1/> | "Friend of a Friend (FOAF) vocabulary"
<http://xmlns.com/foaf/0.1/> | <http://xmlns.com/foaf/0.1/> | "Friend of a Friend (FOAF) vocabulary"
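
One way to picture named graphs is to add a fourth element, the graph URI, to every triple. The Python sketch below uses the three results from the table above; the function names `select_without_graph` and `select_with_graph` are hypothetical and only mimic the two queries.

```python
FOAF = "<http://xmlns.com/foaf/0.1/>"
DC_TITLE = "<http://purl.org/dc/elements/1.1/title>"
TITLE = '"Friend of a Friend (FOAF) vocabulary"'

# Each entry is (graph, subject, predicate, object).
quads = [
    ("<http://xmlns.com/foaf/0.1/Document>", FOAF, DC_TITLE, TITLE),
    ("<http://xmlns.com/foaf/0.1/homepage>", FOAF, DC_TITLE, TITLE),
    (FOAF, FOAF, DC_TITLE, TITLE),
]


def select_without_graph(quads):
    """Like SELECT ?node ?title: the graph each triple lives in is not shown."""
    return [(s, o) for _graph, s, _p, o in quads]


def select_with_graph(quads):
    """Like SELECT ?graph ?node ?title with a GRAPH ?graph { ... } block."""
    return [(g, s, o) for g, s, _p, o in quads]


print(len(set(select_without_graph(quads))))  # distinct rows without ?graph
print(len(set(select_with_graph(quads))))     # distinct rows with ?graph
```

Without the graph column the three rows are indistinguishable; once the graph is included, each row is different.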


In this case we added the variable ?graph to find out which graph each result is located in. Finally, we can include several graphs in a SPARQL query:

 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT  ?node8 ?desc8 ?node401 ?desc401
 WHERE{
   GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_401>{
     ?node401 dc:description ?desc401 .
   }
   GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_8>{
     ?node8 dc:description ?desc8 .
   }
 }
 LIMIT 3

node8 | desc8 | node401 | desc401
<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. " | <http://data-gov.tw.rpi.edu/raw/401/data-401.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. " | <http://data-gov.tw.rpi.edu/raw/401/index.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
<http://data-gov.tw.rpi.edu/raw/8/data-8-00005.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. " | <http://data-gov.tw.rpi.edu/raw/401/data-401.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "

7. Union

It is also possible to obtain the UNION of different graph patterns, for example:

 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT  ?node8 ?desc8 ?node401 ?desc401
 WHERE{
   {
     GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_401>{
       ?node401 dc:description ?desc401 .
     }
   }UNION{
     GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_8>{
       ?node8 dc:description ?desc8 .
     }
   }
 }
 LIMIT 3

node8 | desc8 | node401 | desc401
 |  | <http://data-gov.tw.rpi.edu/raw/401/data-401.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
 |  | <http://data-gov.tw.rpi.edu/raw/401/index.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. "
<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf> | "generated by csv2rdf VER 0.2 (2009-09-19), http://data-gov.tw.rpi.edu/wiki/Source_Code_of_Data.gov_Wiki#Source_Code_of_Demos_and_Applications. " |  | 

Compare this with the result of the previous example: there, every row has values for all four variables, whereas with UNION each row comes from only one of the two patterns, so the other two variables are left empty.
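
The difference between putting two patterns side by side and combining them with UNION can be sketched in Python. Each list below holds the rows one GRAPH block would produce on its own (descriptions are omitted to keep the rows short); the `join` and `union` functions are simplified stand-ins written for this sketch, not the algorithm of any particular SPARQL engine.

```python
from itertools import product

rows_401 = [
    {"?node401": "<http://data-gov.tw.rpi.edu/raw/401/data-401.rdf>"},
    {"?node401": "<http://data-gov.tw.rpi.edu/raw/401/index.rdf>"},
]
rows_8 = [
    {"?node8": "<http://data-gov.tw.rpi.edu/raw/8/data-8-00003.rdf>"},
    {"?node8": "<http://data-gov.tw.rpi.edu/raw/8/data-8-00005.rdf>"},
]


def join(left, right):
    """Two patterns in the same group: merge every compatible pair of rows."""
    merged = []
    for a, b in product(left, right):
        shared = a.keys() & b.keys()
        if all(a[k] == b[k] for k in shared):
            merged.append({**a, **b})
    return merged


def union(left, right):
    """UNION: keep the rows of each alternative as they are."""
    return left + right


joined = join(rows_401, rows_8)
either = union(rows_401, rows_8)
print(len(joined), all(len(row) == 2 for row in joined))
print(len(either), all(len(row) == 1 for row in either))
```

Because the two patterns share no variables, the join pairs every row of one graph with every row of the other, while the union simply lists the rows of both.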

8. Optional

Sometimes we find that certain patterns are desired but not mandatory. For that case, we can use OPTIONAL in our query. The query then matches the required patterns and, where possible, also the ones inside OPTIONAL:


 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 
 SELECT   ?node ?name ?givenname
 WHERE{
     ?node foaf:name ?name .
     OPTIONAL{
       ?node foaf:givenname ?givenname .
     }
 }


node | name | givenname
<http://data-gov.tw.rpi.edu/vocab/Department_of_Commerce> | "Department of Commerce" | 
<http://data-gov.tw.rpi.edu/vocab/Environmental_Protection_Agency> | "Environmental Protection Agency" | 
<http://dbpedia.org/resource/David_L._Boren> | "David L. Boren"@en | "David L."@en
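
The behaviour of OPTIONAL resembles a left join: every row of the required pattern is kept, and it is extended only when the optional pattern has a compatible row. The Python sketch below illustrates this with two of the rows from the table above; the function name `optional` is an assumption of this sketch.

```python
COMMERCE = "<http://data-gov.tw.rpi.edu/vocab/Department_of_Commerce>"
BOREN = "<http://dbpedia.org/resource/David_L._Boren>"

# Rows matched by the required pattern: ?node foaf:name ?name
name_rows = [
    {"?node": COMMERCE, "?name": '"Department of Commerce"'},
    {"?node": BOREN, "?name": '"David L. Boren"@en'},
]
# Rows matched by the optional pattern: ?node foaf:givenname ?givenname
givenname_rows = [
    {"?node": BOREN, "?givenname": '"David L."@en'},
]


def optional(rows, optional_rows):
    """Keep every row; extend it when a compatible optional row exists."""
    out = []
    for row in rows:
        extended = [
            {**row, **extra}
            for extra in optional_rows
            if all(row[k] == extra[k] for k in row.keys() & extra.keys())
        ]
        out.extend(extended or [row])  # no match: keep the row unextended
    return out


for row in optional(name_rows, givenname_rows):
    print(row.get("?name"), "|", row.get("?givenname", ""))
```

The Department of Commerce row survives even though it has no given name, which is the point of OPTIONAL.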

9. Filters

A very useful component in SPARQL is the FILTER operator, which allows users to create specific restrictions based on arithmetic operators, regular expressions, etc.


 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 
 SELECT   ?node ?name ?givenname
 WHERE{
     ?node foaf:name ?name .
     ?node foaf:givenname ?givenname .
     FILTER regex(?name, "Biden") .
 }


node | name | givenname
<http://dbpedia.org/resource/Joe_Biden> | "Joe Biden"@en | "Joe"@en
<http://dbpedia.org/resource/Joe_Biden> | "Joseph Biden"@en | "Joe"@en
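
Conceptually, FILTER discards the rows for which its condition does not hold. The Python sketch below mimics the regex filter above. It makes two simplifying assumptions: names are stored as plain strings without quotes or language tags, and Python's `re` module stands in for SPARQL's regex function, whose dialect differs in some details. The function name `filter_regex` is hypothetical.

```python
import re

rows = [
    {"?node": "<http://dbpedia.org/resource/Joe_Biden>", "?name": "Joe Biden"},
    {"?node": "<http://dbpedia.org/resource/Joe_Biden>", "?name": "Joseph Biden"},
    {"?node": "<http://dbpedia.org/resource/David_L._Boren>", "?name": "David L. Boren"},
]


def filter_regex(rows, variable, pattern):
    """Keep only the rows where the variable is bound and matches the pattern."""
    compiled = re.compile(pattern)
    return [
        row for row in rows
        if variable in row and compiled.search(row[variable])
    ]


# Same idea as: FILTER regex(?name, "Biden")
kept = filter_regex(rows, "?name", "Biden")
for row in kept:
    print(row["?name"])
```

The pattern may match anywhere inside the name, so both "Joe Biden" and "Joseph Biden" pass the filter.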
