Schema.org - Bonus Challenge

Schema.org is a project established in 2011 by Google, Microsoft/Bing, Yahoo and Yandex, focussed on the wide-scale deployment of structured data on the Web.
It defines a large collection of schemas using a simple variation of the W3C RDFS language, and its markup is typically published within existing HTML content using Microdata, JSON-LD or RDFa. Millions of sites now publish this markup, primarily for use in Web search; however, the markup is also used in non-public settings such as email. In practice, Schema.org's data model corresponds roughly to that of "generalized RDF" (http://www.w3.org/TR/rdf11-concepts/#section-generalized-rdf), although it can be deployed in more strictly RDF-oriented settings.

The purpose of this "additional challenge" is to explore ways of closing the gap between Schema.org and the other tools, vocabularies and datasets of the Linked Data and Semantic Web community. First we offer a brief background on schema.org as it relates to Semantic Web efforts; then we outline how it relates to this year's set challenges.

 

Background

Schema.org's approach was designed to encourage mass adoption through simplification. It has had significant success by making choices that differ slightly from those common in the Linked Data and Semantic Web community:

Fundamentally, schema.org prioritizes ease of publication over ease of consumption. This shows itself in several ways.

Publishers/Webmasters are rarely interested in picking and choosing from diverse and disconnected collections of independent schemas. The schema.org approach was to create a single unified collection of schema definitions that are more carefully integrated, and to expose this as a single evolving vocabulary/namespace for publishers to adopt.

This also informs schema.org's approach to identifiers. It does not expect publishers to link carefully to precisely correct URIs for entities that may be further described elsewhere on the Web. Instead, "reference by description" is encouraged: publishers can write markup whose form is "the Person whose name is ____" or "the airport whose code is ____", rather than being required to supply URIs that reference well-formed RDF descriptions. Entity disambiguation can (optionally) be aided through use of schema.org's 'sameAs' property, which references well-known URLs whose primary topic is the thing being described.
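
As a rough illustration of reference by description, the following Python sketch builds JSON-LD for "the Person whose name is ____" with no identifier of its own, plus an optional sameAs link. The name and the example.org URL are placeholder assumptions rather than real data, and the helper function to_jsonld_script is hypothetical.

```python
import json

# Placeholder values for illustration only (assumptions, not real data).
person = {
    "@context": "http://schema.org",
    "@type": "Person",
    "name": "Jane Example",
    # Optional disambiguation: a well-known URL whose primary topic is this person.
    "sameAs": "http://example.org/people/jane-example",
}


def to_jsonld_script(data):
    """Wrap a JSON-LD dict in the script element used to embed it in HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_jsonld_script(person))
```

Note that the description carries no "@id": the entity is identified only by its type and properties, with sameAs as an optional hint for consumers.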

Syntactically, schema.org has emphasized markup in which structured data lives within existing "mainstream Web" pages, such as Microdata, RDFa and JSON-LD-within-email. This reflects both its origins in Web search and a concern that URIs/URLs, if they are to be widely shared and linked, ought to be useful to humans as well as to RDF parsers.

Formally, schema.org has not focussed on rule-oriented knowledge representation. Instead it has emphasized textual definitions and a large collection of illustrative examples. Its definitions often evolve over time, as new vocabularies are added, and as opportunities for clarification, simplification and integration are identified.

See also http://queue.acm.org/detail.cfm?id=2857276 for some background on the project including observations about the relationship to Linked Data.

 

Challenge

Rather than create a separate schema.org challenge, we encourage submissions to other ESWC2016 challenges to also explore, where appropriate, schema.org's relationship with Linked Data and Semantic Web tools, technologies, vocabularies and datasets. We do not solicit proposals that "fix" schema.org by having it adopt the rules and conventions of Linked Data, OWL etc. Nor do we encourage projects to simply replace their use of non-schema.org vocabularies with schema.org, purely in pursuit of prizes. To ensure this, our prizes will not be very valuable! Instead we will reward thoughtful, innovative and practical submissions that make interesting steps towards bringing together the best of these different but related approaches to structured data on the Web.

Topics of concern (an indicative rather than exclusive list)

  • Identify, explore and illustrate opportunities for RDF-oriented tooling to do interesting and useful things with schema.org-based data.
  • Showcase ways of using RDF datasets alongside, or as extensions to, schema.org data. For example, how could DBpedia's or Wikidata's vocabularies be used to enrich schema.org markup around topics such as sports, where a "long tail" of terms and definitions makes it impossible for schema.org itself to cover everything? Can such large sets of terms be exposed as schema.org extensions (see http://schema.org/docs/extension.html) in a manner that continues to be accessible to mainstream publishers and webmasters?
  • Improvements to various levels of RDF tooling (parsers, databases, apps), in particular open source software that makes schema.org data (via Microdata, RDFa 1.1, JSON-LD within HTML) easier to access within RDF/SPARQL tools. Consider for example what a site already publishing schema.org would need to do to load that data into a SPARQL database: can such a process be radically simplified? Can such a database be made useful for publishers that have no prior interest in or experience with Semantic Web or RDF-oriented tooling? How do "Cloud" services fit in this picture?
  • Visualization and analytics: can the tools, techniques and standards of the Semantic Web be used to help publishers understand the schema.org structured data they are publishing, or help potential consumers understand the datasets that are available?

Between 0 and 8 (modest) prizes will be awarded to Challenge submissions in cases where the submission is judged as making a particular contribution to bridging Schema.org with Semantic Web and Linked Data along the lines outlined above. The guiding concern in judging will be to consider how the submission could fit into the existing world of everyday publishers on the Web: "what difference could this submission make to any of the millions of sites currently publishing schema.org data?".

If we are overwhelmed with excellent and relevant submissions, 8 is not a hard upper limit on the number of modest prizes. The prizes are likely to be (topically relevant) books, which winners will be invited to pick from a list.
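
As one possible starting point for the tooling topic listed above (making schema.org data embedded in HTML easier to access), the following Python sketch pulls JSON-LD blocks out of a page using only the standard library. The sample page, the class name JsonLdExtractor and the function name extract_jsonld are illustrative assumptions. Microdata and RDFa are not handled, and loading the result into an RDF or SPARQL store would be a further step that is not shown.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return a list of JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page content, used only to exercise the sketch.
sample = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Airport", "iataCode": "AMS"}
</script>
</head><body><p>Hello</p></body></html>"""

for item in extract_jsonld(sample):
    print(item["@type"], item.get("iataCode"))
```

The sketch assumes each script element holds well-formed JSON; a more robust tool would catch parse errors and report them to the publisher instead of failing.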

 

Submission process

Send a URL of a public page to (questions and requests for clarification etc. to the same address). Please also send a copy of any related paper submitted to the more formal ESWC challenges.

 

Judging

This 'bonus challenge' will be judged by the W3C Schema.org Community Group chair (Dan Brickley), in consultation with the members of schema.org's Steering Group (see http://schema.org/docs/about.html). Members of the Steering Group who choose to participate in challenges will not be involved in the judging. During judging, members of the Steering Group are expected to declare any collaborations, affiliations or other factors that could be interpreted as biasing their view. Schema.org SG members may decline to participate in the judging of the Challenge. Public discussions in the Schema.org W3C Community Group (CG) will also be taken into account. CG members are encouraged to participate in challenges.

Organizing Committee:

  • Dan Brickley