6th Open Challenge on Question Answering over Linked Data (QALD-6)
Motivation and Objectives
The past years have seen a growing amount of research on question answering over Semantic Web data, shaping an interaction paradigm that allows end users to profit from the expressive power of Semantic Web standards while at the same time hiding their complexity behind an intuitive and easy-to-use interface. The Question Answering over Linked Data challenge provides an up-to-date benchmark for assessing and comparing systems that mediate between a user, expressing his or her information need in natural language, and RDF data.
The key challenge for question answering over linked data is to translate a user's information need into a form such that it can be evaluated using standard Semantic Web query processing and inferencing techniques. In order to focus on specific aspects and involved challenges, QALD comprises three tasks: multilingual question answering over RDF data, hybrid question answering over both RDF and free text data, and question answering over statistical data in RDF data cubes.
The main goal is to gain insights into the strengths and shortcomings of different approaches and into possible solutions for coping with the heterogeneous and distributed nature of Semantic Web data.
Target Audience
QALD targets all researchers and practitioners working on querying linked data, natural language processing for question answering, multilingual information retrieval and related topics.
Tasks
Task 1: Multilingual Question Answering
Given the diversity of languages used on the web, there is a pressing need to facilitate multilingual access to semantic data. The core task of QALD is thus to retrieve answers from an RDF data repository given an information need expressed in a variety of natural languages. The underlying RDF dataset is DBpedia 2015. The training data consists of 350 questions available in eight different languages (English, Spanish, German, Italian, French, Dutch, Romanian, and Farsi). These questions are general, open-domain factual questions that vary in complexity. Each question is annotated with a manually specified SPARQL query and answers. The test dataset will consist of 100 similar questions. Training data: http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/6/data/qald-6-train-multilingual.json
Task 2: Hybrid Question Answering
A lot of information is still available only in textual form, both on the web and in the form of labels and abstracts in linked data sources. Approaches are therefore needed that can deal not only with the specific character of structured data but also with finding information in several sources, processing both structured and unstructured information, and combining the gathered information into one answer. QALD thus includes a task on hybrid question answering, asking systems to retrieve answers for questions that require the integration of data from both RDF and textual sources. The task builds on DBpedia 2015 as the RDF knowledge base, together with its abstracts and optionally English Wikipedia as a textual data source. Training data comprises 50 English questions, annotated with answers as well as a pseudo query that indicates which information can be obtained from RDF data and which from free text. As test questions, we will provide 50 similar questions. Training data: http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/6/data/qald-6-train-hybrid.json
Task 3: Statistical Question Answering over RDF Data Cubes
As a new task, QALD provides a benchmark focusing on multi-dimensional, statistical data, comprising several datasets from LinkedSpending, which provides government spending as linked data modeled according to the RDF data cube vocabulary. Question answering over this kind of data poses challenges that are different from general, open-domain question answering as represented by the above two tasks, with respect to both the structure of the data and the amount of aggregation necessary to answer information needs. The training question set consists of 100 questions compiled in the CubeQA project, annotated with SPARQL queries and answers. As test data, we will provide 50 additional questions. Training data: http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/6/data/qald-6-train-datacube.json
Evaluation
Participating systems will be evaluated with respect to precision and recall. Globally, the evaluation considers the macro and micro F-measure of a system, both over all test questions and over those questions that the system provided an answer for.
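To illustrate the measures named above, here is a minimal Python sketch. It is not the official evaluation tool, and the exact formulas that tool uses are not stated in this call. The sketch assumes the common definitions: the macro F-measure is the mean of the per-question F-measures, and the micro F-measure is computed from answer counts pooled over all questions. The function names (prf, evaluate), the answered_only flag, and the dictionary layout mapping question ids to answer sets are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def prf(correct: int, returned: int, expected: int) -> Dict[str, float]:
    """Precision, recall and F-measure from raw counts (0.0 when undefined)."""
    precision = correct / returned if returned else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f": f_measure}


def evaluate(
    gold: Dict[str, Set[str]],
    system: Dict[str, Set[str]],
    answered_only: bool = False,
) -> Dict[str, float]:
    """Macro and micro F-measure over all questions, or (assumption) only
    over questions for which the system returned a non-empty answer set."""
    question_ids: List[str] = [
        q for q in gold if not answered_only or system.get(q)
    ]
    if not question_ids:
        return {"macro_f": 0.0, "micro_f": 0.0}

    per_question_f: List[float] = []
    correct = returned = expected = 0
    for q in question_ids:
        answers = system.get(q, set())
        hits = len(gold[q] & answers)
        per_question_f.append(prf(hits, len(answers), len(gold[q]))["f"])
        # Pool the counts for the micro-averaged score.
        correct += hits
        returned += len(answers)
        expected += len(gold[q])

    return {
        "macro_f": sum(per_question_f) / len(per_question_f),
        "micro_f": prf(correct, returned, expected)["f"],
    }


# Hypothetical toy data: question id -> set of answer strings.
gold_answers = {"q1": {"a", "b"}, "q2": {"c"}, "q3": {"d"}}
system_answers = {"q1": {"a"}, "q2": {"c", "x"}}  # q3 left unanswered

print(evaluate(gold_answers, system_answers))
print(evaluate(gold_answers, system_answers, answered_only=True))
```

On this toy data, the unanswered question q3 lowers both scores when all questions are counted and is ignored when only answered questions are considered, which is why the two views can differ for the same system.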
Answers generated by systems can be uploaded online: http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=evaltool&q=6
Paper Submission
Papers have to be formatted according to the LNCS style requirements and submitted via EasyChair no later than March 11, 2016.
https://easychair.org/conferences/?conf=qald6
Please note that you can participate in the challenge without submitting a paper (and also submit a paper without participating in the challenge, although your paper should have some connection to QALD).
All accepted challenge papers will be published by Springer in the CCIS series. In addition, ESWC will publish a selection of the best challenge papers in the Satellite Event proceedings (a separate Springer LNCS volume) along with a selection of the best workshop, poster and demo papers.
The best challenge paper authors may be asked to resubmit a somewhat extended version of their paper. Both the challenge proceedings and the Satellite Events proceedings will be compiled after the conference.
Important Dates
- Paper submission deadline: March 11, 2016
- Release of test data: April 8, 2016
- Deadline for submission of system answers: April 15, 2016
- Release of evaluation results: April 18, 2016
- Submission of camera-ready papers: April 24, 2016
Organizing Committee
- Christina Unger
- Axel-Cyrille Ngonga Ngomo
- Elena Cabrio