How to install virtuoso sparql endpoint

From Data-gov Wiki

NOTE: See http://tw.rpi.edu/web/inside/endpoints/installing-virtuoso for an updated version.






Infobox (How-To)
  • name: How to install virtuoso sparql endpoint

  • description: instructions for installing virtuoso SPARQL endpoint
  • creator(s): Zhenning Shangguan
  • created: 2010-04-21
  • modified: 2012-01-29


Overview

This tutorial covers installing Virtuoso Open Source Edition (VOSE) on 64-bit Linux servers. For installation instructions on other platforms, please refer to the "Downloading, Building and Using" section at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/.


Installation

Packages and source code can be downloaded at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSDownload. For this tutorial, we use the archived package available at http://sf.net/projects/virtuoso/files. Checking out the source code from Virtuoso's CVS server is also possible; the download page above has more detailed information.


After downloading the archived package (virtuoso-opensource-6.1.1.tar.gz), copy it to the server on which you want Virtuoso installed.

A detailed guide to compiling and installing Virtuoso is available online at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSMake. The following is a step-by-step walk-through.

  • Make sure you have all the Package Dependencies.
  • Set the compiler flags according to the hardware processor and OS of your machine.
  • Unpack the downloaded VOSE package and navigate to that folder.
  • In the command prompt, enter
    ./autogen.sh
    • This checks that the required packages are present and have the right versions.
  • Enter
    ./configure
    • By default, the target installation directories are under /usr/local, but you can specify your desired directory using:
      ./configure --prefix=/path/to/dir
  • Enter
    make
  • Enter
    make install
    • If you use the default target directory /usr/local, you need root privileges.
    • You can also specify the target directory at this step using
      make install prefix=/home/virtuoso
      Installing to a directory to which the current user has write access does not require root privileges.

If none of the above steps reports an error, the installation is complete.
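
The walk-through above can be collected into one shell function. This is only a sketch: the function names, the DRY_RUN switch, and the example directories are our own illustrations and not part of the Virtuoso distribution. With DRY_RUN=1 (the default here) each command is printed rather than executed, so the sequence can be reviewed first.

```shell
# Sketch only: build_virtuoso, run, and DRY_RUN are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: directory of the unpacked source, $2: installation prefix.
# Each step runs only if the previous one succeeded.
build_virtuoso() {
    src_dir="$1"
    prefix="$2"
    run cd "$src_dir" &&
    run ./autogen.sh &&
    run ./configure --prefix="$prefix" &&
    run make &&
    run make install
}
```

For example, `build_virtuoso ./virtuoso-opensource-6.1.1 "$HOME/virtuoso"` prints the five commands; the source directory name is an assumption based on the archive name. Setting DRY_RUN=0 before the call executes them.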

Administrative Tasks

Manual Start-up

The Virtuoso server instance can be started by calling
/usr/local/virtuoso-opensource/bin/virtuoso-t -f &
from the directory where virtuoso.ini is located. The default location of virtuoso.ini is
/usr/local/virtuoso-opensource/var/lib/virtuoso/db
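
Because the server must be started from the directory that holds virtuoso.ini, a small wrapper can check for the file before starting. The sketch below uses the default paths from this page; the function names and the VIRTUOSO_BIN and VIRTUOSO_DB variables are hypothetical, and the paths need adjusting if a custom prefix was used.

```shell
# Default paths from this tutorial; override through the environment if needed.
VIRTUOSO_BIN="${VIRTUOSO_BIN:-/usr/local/virtuoso-opensource/bin/virtuoso-t}"
VIRTUOSO_DB="${VIRTUOSO_DB:-/usr/local/virtuoso-opensource/var/lib/virtuoso/db}"

# Succeeds only if the given directory contains a readable virtuoso.ini.
has_ini() {
    [ -r "$1/virtuoso.ini" ]
}

# Start the server from the database directory, in the background.
start_virtuoso() {
    if ! has_ini "$VIRTUOSO_DB"; then
        echo "no virtuoso.ini in $VIRTUOSO_DB" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$VIRTUOSO_DB" && "$VIRTUOSO_BIN" -f & )
}
```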

Manual Shutdown

The Virtuoso server instance can be shut down using the following steps:

  • Log in to the isql interactive SQL command line environment.
    /usr/local/virtuoso-opensource/bin/isql 1111 dba <password>
  • Execute the shutdown() function.
    SQL> shutdown();
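
The two steps can also be done in one non-interactive call by passing the statement to isql through its exec= argument. A minimal sketch, assuming the default isql path and port 1111 used above; the helper names and the VIRTUOSO_DBA_PASSWORD variable are hypothetical.

```shell
ISQL="${ISQL:-/usr/local/virtuoso-opensource/bin/isql}"

# Print the isql command line that runs one statement and exits.
# Arguments: port, user, password, statement.
isql_exec_cmd() {
    printf '%s %s %s %s exec="%s"\n' "$ISQL" "$1" "$2" "$3" "$4"
}

# Run shutdown() without entering the interactive prompt.
# The password is taken from VIRTUOSO_DBA_PASSWORD (our own convention).
shutdown_virtuoso() {
    "$ISQL" 1111 dba "${VIRTUOSO_DBA_PASSWORD:?set VIRTUOSO_DBA_PASSWORD}" "exec=shutdown();"
}
```

Note that a password given on the command line is visible to other users in the process list while the command runs.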

Start-up/Shutdown Scripts

We have written command-line scripts for 64-bit Linux (CentOS 5) that start, shut down, or restart the Virtuoso server instance and SPARQL endpoint with a single command.

  • To start the Virtuoso instance:
    sudo /etc/init.d/virtuosod start
  • To stop the Virtuoso instance:
    sudo /etc/init.d/virtuosod stop
  • To restart the Virtuoso instance:
    sudo /etc/init.d/virtuosod restart

Please note that:

  • All commands require a user account with sudo privileges.
  • Once the Virtuoso server instance has started successfully, the SPARQL endpoint immediately becomes accessible at
    http://<host>:<port>/sparql
  • To start the Virtuoso instance correctly, make sure that no other live Virtuoso instance is running on the directory /usr/local/virtuoso-opensource/var/lib/virtuoso/db. Otherwise, the startup command will fail because of the file locking mechanism that Virtuoso uses.
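
To confirm that the endpoint is reachable after a start, a trivial query can be sent with curl. This is a sketch under assumptions: the function names are ours, and the host and port must match your installation.

```shell
# Build the endpoint URL from host and port, following the pattern above.
sparql_endpoint_url() {
    printf 'http://%s:%s/sparql\n' "$1" "$2"
}

# Send a trivial ASK query. --get with --data-urlencode appends the
# URL-encoded query to the URL; --fail makes curl return non-zero on
# HTTP error responses.
check_endpoint() {
    curl --silent --fail --get \
         --data-urlencode 'query=ASK { ?s ?p ?o }' \
         "$(sparql_endpoint_url "$1" "$2")" > /dev/null
}
```

A call such as `check_endpoint localhost 8890 && echo "endpoint is up"` assumes the HTTP port is 8890; check the HTTP server settings in your virtuoso.ini for the actual value.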

Loading Triples

We have written command-line utility scripts for loading triples in different formats into a named graph in the Virtuoso triple store. The scripts are located at
/opt/scripts

The formats supported are:

  • RDF/XML
  • Turtle
  • N-Triples
  • N-Quads

Please follow these steps to load a data file (in any of the formats above) into a named graph:

  • Change to /opt/scripts.
    cd /opt/scripts
  • Run the script vload with exactly three arguments:
    • format: one of rdf, ttl, nt, or nq, corresponding to RDF/XML, Turtle, N-Triples, and N-Quads respectively.
    • data_file: path to the raw data file.
    • graph_uri: URI of the named graph into which the triples should be loaded.
    For example:
    sudo ./vload nt /path/to/data/file/data-1554.nt http://data-gov.tw.rpi.edu/vocab/Dataset_1554
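
Since vload expects exactly three arguments and one of four format codes, it can help to validate the arguments before calling it. The wrapper below is a hypothetical sketch; it only assumes the vload interface described above and does not reproduce what vload does internally.

```shell
# Succeeds if the argument is one of the four format codes vload accepts.
is_vload_format() {
    case "$1" in
        rdf|ttl|nt|nq) return 0 ;;
        *) return 1 ;;
    esac
}

# Validate the arguments, then call vload from /opt/scripts.
# data_file should be an absolute path because of the directory change.
vload_checked() {
    if [ "$#" -ne 3 ]; then
        echo "usage: vload_checked format data_file graph_uri" >&2
        return 2
    fi
    if ! is_vload_format "$1"; then
        echo "unknown format: $1" >&2
        return 2
    fi
    if [ ! -r "$2" ]; then
        echo "cannot read: $2" >&2
        return 2
    fi
    ( cd /opt/scripts && sudo ./vload "$1" "$2" "$3" )
}
```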

Deleting Named Graphs

There is a utility command, vdelete, for deleting a specific named graph from the triple store. It is located in
/opt/scripts

It takes exactly one argument, the URI of the named graph to be deleted. For example, to delete all the triples in the named graph <http://data-gov.tw.rpi.edu/vocab/Dataset_1554>, run the following command from /opt/scripts:

sudo ./vdelete http://data-gov.tw.rpi.edu/vocab/Dataset_1554
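
If the vdelete script is not available, a similar effect can be obtained by running a SPARQL CLEAR GRAPH statement through isql. We do not claim that this is how vdelete works internally; the helper below is a hypothetical sketch that only builds the statement.

```shell
# Print the isql statement that removes all triples from one named graph.
clear_graph_stmt() {
    printf 'SPARQL CLEAR GRAPH <%s>;\n' "$1"
}
```

The statement can be entered at the SQL> prompt, or passed to isql as in `isql 1111 dba <password> "exec=$(clear_graph_stmt http://data-gov.tw.rpi.edu/vocab/Dataset_1554)"`.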

Performance Tuning

Online documentation on tuning VOSE for better performance is available at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning and http://plato.cs.rpi.edu:8890/doc/html/rdfperformancetuning.html.

Generally, setting some of the parameters in the virtuoso.ini file to suitable values improves performance, both when loading big datasets and when evaluating queries. The following parameters in virtuoso.ini are worth reviewing:

  • ServerThreads
    • Maximum number of threads used by the server. It should be set close to the number of concurrent connections if heavy usage is expected. A value of 100 should work on most systems.
  • O_DIRECT
    • This may be useful if a large fraction of RAM is configured as database buffers. If this is on, the file system cache will not grow at the expense of the database process, so the OS is less likely to swap out memory that Virtuoso uses for its own database buffers.
  • NumberOfBuffers
    • This controls the amount of RAM used by Virtuoso to cache database files. This has a critical performance impact and thus the value should be fairly high for large databases. Exceeding physical memory in this setting will have a significant negative impact. For a database-only server about 65% of available RAM could be configured for database buffers. Please also note that each buffer takes about 8700 Bytes (or roughly 9KB).
  • CompileProceduresOnStartup
    • Setting this to 0 speeds up Virtuoso startup, because stored procedures are not loaded until the first time they are called.
  • FDsPerFile
    • Number of file descriptors per file to be obtained from the OS. This parameter only affects databases that use striping. Having multiple FDs per file means that as many concurrent I/O operations may be pending per file. This gives the OS more flexibility to schedule the operations, potentially improving file I/O throughput.
  • ResultSetMaxRows
    • This setting limits the number of rows in a result. Adjusting it can help to mitigate denial-of-service (DoS) attacks.
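
Before changing any of these parameters, it is useful to see their current values. The sketch below reads them from an ini-style file with sed; the function names are hypothetical, and the code relies only on the key = value layout of virtuoso.ini, where lines starting with ";" are comments.

```shell
# Print the value of key $2 from ini-style file $1 (first match,
# surrounding whitespace trimmed). Commented-out lines do not match
# because the key must start the line.
ini_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
        sed 's/[[:space:]]*$//' |
        head -n 1
}

# Print the tuning parameters discussed above for the given file.
# A parameter that is not set in the file is shown with an empty value.
show_tuning() {
    for key in ServerThreads O_DIRECT NumberOfBuffers \
               CompileProceduresOnStartup FDsPerFile ResultSetMaxRows; do
        printf '%s = %s\n' "$key" "$(ini_get "$1" "$key")"
    done
}
```

For example: `show_tuning /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.ini`. The lookup ignores ini sections, so it assumes each key appears only once in the file.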

In our experience, on a 64-bit Linux machine with 8 CPU cores (two quad-core processors) and 32 GB of memory, setting the NumberOfBuffers parameter to about 2,400,000 improves performance significantly. This value is roughly 60% of total memory divided by 8 KB per buffer: 32959832 KB × 0.6 / 8 KB ≈ 2,470,000, rounded down to 2,400,000.
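
The same estimate can be computed for any machine. The sketch below assumes, as in the calculation above, 8 KB per buffer and a chosen percentage of total memory; the function names are ours, and /proc/meminfo is the standard Linux source for the memory total in KB.

```shell
# Estimate NumberOfBuffers: $2 percent of $1 KB of memory, at 8 KB per buffer.
# Uses integer arithmetic, so the result is rounded down.
number_of_buffers() {
    echo $(( $1 * $2 / 100 / 8 ))
}

# Total memory in KB as reported by Linux.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}
```

For the machine described above, `number_of_buffers 32959832 60` prints 2471987. On the local machine, `number_of_buffers "$(mem_total_kb)" 60` gives the corresponding starting point, which can then be rounded down.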
