How to use SPARQL

From Data-gov Wiki

Infobox (How-To)
  • name: How to use SPARQL

  • description: A tutorial about SPARQL.
  • creator(s): Devin Gaffney
  • created: 2010/02/25
  • modified: 2010/04/07


About SPARQL

Very beginning

SPARQL is, most essentially, an RDF query language. If you're brand new to this sort of thing, it's probably best to think of it as something akin to SQL: the data has a specific form, and we access it through a structured set of commands that define, narrow down, parameterize, or otherwise condition the results. The most basic SQL commands are shown in the sample query below:

SELECT * 
    FROM Book
    WHERE price > 100.00
    ORDER BY title;

More or less, this is the basic structure of querying for data. We'll get to the SPARQL definitions later on, but notice the role each command plays: SELECT determines what is returned, FROM names the source it comes from, WHERE restricts it by a condition, and ORDER BY defines the order in which results are returned. This may all be familiar, but it's important to start from the beginning when talking about something entirely new.

Why SPARQL?

SPARQL is the only semantic query language that is an official W3C Recommendation, and as such has the greatest chance of actual standardization as the Semantic Web grows. Although there's room for others to come along, its early-adopter status, combined with official W3C endorsement, is likely to keep it the main tool for searching semantic data sets.

How is it set up?

There are some basic terms to be familiar with when we are talking about using SPARQL:

  • SPARQL Service - The actual available service for handling SPARQL queries
  • SPARQL Service Endpoint - A site, such as the Virtuoso SPARQL Query Demo, that accepts SPARQL queries and returns their results over HTTP. An endpoint can be either "generic" or "specific": a generic endpoint accepts any RDF dataset URI specified for the query, while a specific endpoint is hardwired to a single RDF dataset. The Virtuoso endpoint is generic, whereas the Redland Rasqal endpoint is specific.
  • SPARQL Query URI/RDF Dataset - The location of the data that the SPARQL query runs against. In many cases, the service endpoint requires this URI to be passed in a field with a name such as "query-uri".
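
As a rough illustration of how a query and a dataset URI reach an endpoint over HTTP, the sketch below builds a GET request URL in Python. The parameter names "query" and "default-graph-uri" are the ones defined by the W3C SPARQL Protocol; the endpoint address and the dataset URI are placeholders rather than real services, and an individual endpoint may expect different field names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and dataset locations, used only for illustration.
ENDPOINT = "http://example.org/sparql"
DATASET = "http://example.org/data/books.rdf"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o . }"


def build_request_url(endpoint, query, default_graph=None):
    """Return a GET URL that carries the query and, optionally, a dataset URI."""
    params = [("query", query)]
    if default_graph is not None:
        # A "generic" endpoint loads the RDF found at this URI before querying.
        params.append(("default-graph-uri", default_graph))
    # urlencode percent-encodes characters such as "?" and "{" for us.
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY, DATASET)
print(url)

# Decoding the URL again recovers the original dataset URI.
print(parse_qs(urlsplit(url).query)["default-graph-uri"])
```

For a "specific" endpoint the dataset argument would simply be left out, since the data is fixed on the server side.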

A simple query

Perhaps the best way to explain SPARQL is by looking at a query itself. Let's take the simplest version of a query, and dissect it:

SELECT ?s ?p ?o
WHERE
{
  ?s ?p ?o .
}

(Here s, p, and o stand for the subject, predicate, and object of a triple.)

In the above case, assuming a dataset has been specified (the query names no graph with FROM, so the dataset is most likely supplied through the separate dataset URI field that endpoints usually provide), we are selecting every subject, predicate, and object in the set, without any conditions. Although this is the simplest possible query to write, a more specific example makes it easier to see how the language works.

SELECT ?title
WHERE
{
  <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title .
}

In the above case, we are specifying much more information. We want, specifically, the title of one subject. The subject is the URI http://example.org/book/book1 (which stands for the book and everything stated about it), the predicate is http://purl.org/dc/elements/1.1/title, the Dublin Core term that defines what a title is (it should resolve to http://dublincore.org/2008/01/14/dcelements.rdf#title), and the object is the variable "?title" that we are searching for. This example is taken from Section 2.1 of the W3C Recommendation.
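
To make the idea of matching a triple pattern concrete, here is a minimal Python sketch of what a query engine does conceptually. It is not how any real SPARQL engine is implemented: the graph is a plain list of tuples, literals are plain strings, and the creator triple is made-up sample data added only so that the two queries return different results.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
BOOK1 = "http://example.org/book/book1"

# Sample graph: each triple is (subject, predicate, object).
GRAPH = [
    (BOOK1, DC_TITLE, "SPARQL Tutorial"),
    (BOOK1, DC_CREATOR, "A. N. Author"),  # invented for this illustration
]


def is_variable(term):
    """Terms that start with "?" are variables; everything else is fixed."""
    return term.startswith("?")


def match(pattern, graph):
    """Yield one dict of variable bindings per triple that fits the pattern."""
    for triple in graph:
        bindings = {}
        for want, have in zip(pattern, triple):
            if is_variable(want):
                # A variable matches anything, but must stay consistent
                # if it appears more than once in the pattern.
                if bindings.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield bindings


# SELECT ?s ?p ?o WHERE { ?s ?p ?o . }
everything = list(match(("?s", "?p", "?o"), GRAPH))
print(len(everything))

# SELECT ?title WHERE { <book1> <dc:title> ?title . }
titles = [b["?title"] for b in match((BOOK1, DC_TITLE, "?title"), GRAPH)]
print(titles)
```

The first pattern has no fixed terms, so every triple matches; the second fixes the subject and predicate, so only the title triple binds ?title.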

Query Structure

Seaborne's presentation on the basics of SPARQL, titled "SPARQL by Example," is probably the most thorough tutorial that currently exists. In this presentation, he lays out the structure of the language and defines each section of a query as follows:

A SPARQL query comprises, in order:
  • Prefix declarations, for abbreviating URIs
  • Dataset definition, stating what RDF graph(s) are being queried
  • A result clause, identifying what information to return from the query
  • The query pattern, specifying what to query for in the underlying dataset
  • Query modifiers, slicing, ordering, and otherwise rearranging query results
--From Seaborne's presentation


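Note that the SPARQL grammar itself places the result clause before the dataset definition, so a skeleton query looks like this:

# prefix declarations
PREFIX foo: <http://example.com/resources/>
...
# result clause
SELECT ...
# dataset definition
FROM ...
# query pattern
WHERE {
    ...
}
# query modifiers
ORDER BY ...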
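
As a hedged sketch of how these parts fit together, the Python helper below joins them into a complete SELECT query. It writes the result clause before the dataset definition, which is the order the SPARQL grammar requires. The function name, its parameters, and the dataset URI are all invented for this illustration.

```python
def build_query(prefixes, variables, patterns, graph=None, order_by=None, limit=None):
    """Join the parts of a SELECT query in the order the grammar expects."""
    # prefix declarations
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    # result clause
    lines.append("SELECT " + " ".join(variables))
    # dataset definition (optional)
    if graph is not None:
        lines.append(f"FROM <{graph}>")
    # query pattern
    lines.append("WHERE {")
    lines.extend(f"    {pattern} ." for pattern in patterns)
    lines.append("}")
    # query modifiers (optional)
    if order_by is not None:
        lines.append(f"ORDER BY {order_by}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = build_query(
    prefixes={"foaf": "http://xmlns.com/foaf/0.1/"},
    variables=["?name"],
    patterns=["?person foaf:name ?name"],
    graph="http://example.org/people.rdf",  # placeholder dataset URI
    order_by="?name",
    limit=10,
)
print(query)
```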

List of SPARQL vocabulary:

Prefix Declarations:

  • BASE
  • PREFIX

Dataset Definition:

  • FROM
  • FROM NAMED

Result Clause:

  • SELECT
  • CONSTRUCT
  • DESCRIBE
  • ASK

Query Pattern:

  • WHERE

Query Modifiers:

  • ORDER BY
  • LIMIT
  • OFFSET
  • DISTINCT
  • REDUCED
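
To give a feel for what the modifiers do to a result set, the following Python sketch applies ORDER BY, DISTINCT, OFFSET, and LIMIT to a list of result rows, in the order in which SPARQL applies them. The rows and the function are invented for this illustration, and REDUCED is not modeled, since it only permits (rather than requires) duplicates to be dropped.

```python
def apply_modifiers(rows, order_key=None, distinct=False, offset=0, limit=None):
    """Apply ORDER BY, DISTINCT, OFFSET and LIMIT, in that order, to result rows."""
    if order_key is not None:
        # ORDER BY: sort the solutions on one variable.
        rows = sorted(rows, key=lambda row: row[order_key])
    if distinct:
        # DISTINCT: drop rows whose bindings are identical to an earlier row.
        seen = set()
        unique = []
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                unique.append(row)
        rows = unique
    # OFFSET: skip the first rows; LIMIT: cap how many are returned.
    rows = rows[offset:]
    if limit is not None:
        rows = rows[:limit]
    return rows


rows = [{"name": "Carol"}, {"name": "Alice"}, {"name": "Bob"}, {"name": "Alice"}]
page = apply_modifiers(rows, order_key="name", distinct=True, offset=1, limit=2)
print(page)
```

Combining OFFSET and LIMIT in this way is what makes it possible to page through a large result set a slice at a time.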

Query Vocabulary (different ways of evaluating/altering/etc query pattern)

Query syntax

The syntax follows the same basic principles as other query languages; you will notice that right away if you're coming from something like SQL, which reads in a similar way even though the symbols differ. Let's walk through an example to get some of the core syntax down. Given this data set [1], we want to find the name of every person. (From now on we'll call data sets graphs, since that is the name the community uses: they are RDF Graphs, not RDF Datasets or, even worse, RDF Tables, which would be completely wrong.)


PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
    ?person foaf:name ?name .
}

Let's dig through what this says. PREFIX takes two parts: the label "foaf:" and the URL <http://xmlns.com/foaf/0.1/>. Basically, this is a definition: wherever we write foaf:, that URL is substituted, so in the query pattern foaf:name expands to <http://xmlns.com/foaf/0.1/name>. The URL identifies the FOAF vocabulary and where it's located. Prefixes are used to shorten and "clean" the query; they are optional, since the full URI can always be written out instead.
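
Prefix expansion is plain string concatenation: the local part after the colon is appended to the URL that the prefix stands for. The Python sketch below illustrates this; the function and the dictionary of declared prefixes are invented for the example and are not part of any SPARQL library.

```python
# Prefixes declared for the query, as in: PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as foaf:name into a full URI in angle brackets."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # The local name is appended directly to the prefix URL.
    return "<" + prefixes[prefix] + local + ">"


print(expand("foaf:name"))
```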

A variable in SPARQL is marked by a prepended "?". In this instance we have two variables: ?person, which matches any subject in the dataset that has a foaf:name predicate and is otherwise unused, and ?name, which is the variable we are SELECTing, to be returned for every triple that matches the pattern. The period at the end of the line ends that triple pattern. A semicolon or a comma can be used instead to continue it: a semicolon starts another predicate and object for the same subject, and a comma adds another object for the same subject and predicate.
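
The semicolon and comma are only abbreviations: a pattern written with them means the same as a list of full triple patterns, each ended by a period. The Python sketch below writes both forms from the same input. The helper names and the extra variables (?friend1, ?friend2) are invented for illustration; foaf:knows is a term from the FOAF vocabulary.

```python
def expand_abbreviated(subject, predicate_objects):
    """Write out one full triple pattern per (predicate, object) pair."""
    patterns = []
    for predicate, objects in predicate_objects:
        for obj in objects:
            patterns.append(f"{subject} {predicate} {obj} .")
    return patterns


def abbreviate(subject, predicate_objects):
    """Write the same patterns with ';' (same subject) and ',' (same subject and predicate)."""
    parts = [f"{predicate} {', '.join(objects)}" for predicate, objects in predicate_objects]
    return f"{subject} " + " ; ".join(parts) + " ."


shape = [("foaf:name", ["?name"]), ("foaf:knows", ["?friend1", "?friend2"])]

print(abbreviate("?person", shape))
for line in expand_abbreviated("?person", shape):
    print(line)
```

Both outputs describe the same three triple patterns; the abbreviated form just avoids repeating ?person and foaf:knows.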

This really is the most basic syntax of a SPARQL query. To learn more about the syntax, read Section 4 of the W3C Recommendation. The discussion in 4.1.2 is of particular interest; understanding how literals are written is key to making queries that produce reliable and exact results.

Specifying a data set

For more on this, please read the article titled "How to specify RDF Dataset in SPARQL queries".

Examples

Resource Roundup

Official Sources

Knowledge Bases

Planet RDF, a site all about the RDF format, can help you learn the structure of RDF files alongside SPARQL.

Cheat Sheets

Slideshows

Tutorials

Service Endpoints
