Getting started with Stanbol

What is Stanbol?

In simple terms, Stanbol is an application that lets you obtain additional information from a document.

For example, if I pass a text about the Roman emperors through Stanbol, it will detect the language of the text (e.g. English) and the entities mentioned in it (Rome, Caesar, Italy, Nero, etc.). By entities we mean things such as places and persons.

Setting up Stanbol

There are many ways to start using Stanbol; here are some of them:

GitHub

The most traditional way is to clone the source code with git and then compile it with Maven:

  git clone https://github.com/apache/stanbol.git
  cd stanbol
  mvn install -Dmaven.test.skip=true

See the README for the different build options.

The ./launchers/ folder contains several packages.
One of them is a standalone version of Stanbol, which you can start with:

  java -jar ./launchers/full/target/org.apache.stanbol.launchers.full-1.0.0-SNAPSHOT.jar

This starts a Stanbol server on port 8080.

The same folder also contains a WAR version of Stanbol that you can deploy in a Java EE servlet container such as Tomcat.

Compiled File

You can download precompiled packages from the IKS project server (dev.iks-project.eu).

The jar version:

  wget http://dev.iks-project.eu/downloads/stanbol-launchers/1.0.0/org.apache.stanbol.launchers.stable-1.0.0-SNAPSHOT.jar
  java -jar org.apache.stanbol.launchers.stable-1.0.0-SNAPSHOT.jar

The war version:

  wget http://dev.iks-project.eu/downloads/stanbol-launchers/0.12.1/stanbol-0.12.1-SNAPSHOT.war

Docker

The third option is to run Stanbol in a Docker container:

  sudo docker run -i -t --rm -p 8080:8080 --name stanbol mxr576/stanbol

This command starts a Stanbol server inside a container and publishes it on port 8080 of your machine.

This is arguably the easiest and quickest way to start using Stanbol.
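Whichever option you choose, the server needs some time to start all of its components before the enhancer answers. The following Python sketch polls the enhancer page until it loads successfully. The helper name `wait_for_stanbol`, the default URL and the timeout values are illustrative assumptions, not part of Stanbol.

```python
import http.client
import time
import urllib.request


def wait_for_stanbol(url="http://localhost:8080/enhancer", timeout=120.0, interval=2.0):
    """Poll `url` until it answers successfully or `timeout` seconds pass.

    Hypothetical helper: returns True once the page loads, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen raises HTTPError for 4xx/5xx answers, so reaching the
            # body of this block means the enhancer page was served
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout or error status (HTTPError is an
            # OSError subclass): the server is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_stanbol()` returns `True` as soon as `http://localhost:8080/enhancer` loads, or `False` if it still does not after two minutes.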

Using the web interface

To get a feel for how Stanbol enhances content and to start extracting semantic data from your documents, make sure a local instance of Stanbol is running on your machine and browse to this page:

http://localhost:8080/enhancer

There you can enter any text and extract the entities it mentions.

Stanbol Web Screenshot

For example, for the text:

Argentina is a big country in South America

Stanbol detects the following entities:

Stanbol Web Screenshot 2

It found two entities of type “place”, “Argentina” and “South America”, and it detected the language of the text as “English”.

You can get the same information from the command line with curl:

  curl -X POST -H "Content-type: text/plain" --data "Argentina is a big country in South America." \
      http://localhost:8080/enhancer
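The same request can be sent from a program. The following is a minimal sketch using only Python's standard library, assuming a local instance at `http://localhost:8080/enhancer`. The function names are hypothetical, and the `Accept: text/turtle` header, which asks Stanbol to return the RDF enhancement results serialized as Turtle, is an assumption you can change or drop.

```python
import urllib.request

# Assumption: a local Stanbol instance listening on port 8080
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhance_request(text, url=ENHANCER_URL, accept="text/turtle"):
    """Build a POST request equivalent to the curl command above (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # the request body is the raw text
        headers={
            "Content-Type": "text/plain",  # same header as in the curl example
            "Accept": accept,  # assumed: desired RDF serialization of the results
        },
        method="POST",
    )


def enhance_text(text, url=ENHANCER_URL, accept="text/turtle"):
    """Send `text` to the enhancer and return the response body as a string."""
    request = build_enhance_request(text, url, accept)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, `enhance_text("Argentina is a big country in South America.")` returns the enhancement results as text. Building the request separately from sending it makes the request easy to inspect without a running server.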

Instead of passing the text inline, you can also upload a whole file to Stanbol with curl.
If you have a file “text.txt” in your current folder, the following command uploads it and returns the enhancement results:

  curl -X POST -H "Content-type: text/plain" -T text.txt \
      http://localhost:8080/enhancer
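A Python equivalent of the upload, again a sketch with hypothetical function names: `curl -T` combined with `-X POST` sends the raw bytes of the file as the body of a POST request, so the code below does the same. The `text/plain` content type mirrors the curl command; for other kinds of files it is an assumption you would need to adjust.

```python
import urllib.request
from pathlib import Path


def build_file_request(path, url="http://localhost:8080/enhancer", content_type="text/plain"):
    """Build a POST request whose body is the raw content of `path` (hypothetical helper)."""
    body = Path(path).read_bytes()  # like curl -T, send the file bytes unchanged
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def enhance_file(path, url="http://localhost:8080/enhancer", content_type="text/plain"):
    """Upload the file to the enhancer and return the response body as a string."""
    with urllib.request.urlopen(build_file_request(path, url, content_type)) as response:
        return response.read().decode("utf-8")
```

With a server running, `enhance_file("text.txt")` corresponds to the curl command above.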