Our partners Ontotext talked with Gene Loh, Director Software Development at Synaptica, and Vassil Momtchev, Ontotext CTO, about the RDF-star extension to the RDF graph data model, its value and how it was implemented in Ontotext’s GraphDB and used in Synaptica’s Graphite.
Technically speaking, RDF-star is syntactic sugar that makes it easier to attach metadata to edges in the graph or, in other words, to make a statement about another statement. This was already possible in the very first RDF 1.0 specification using the reification mechanism. Sadly, reification had a reputation for being overly slow, due to the number of additional statements needed to identify the referenced triple, and for being too verbose to use in SPARQL.
The authors of RDF-star proposed a new short syntax and, thanks to its elegance, GraphDB was able to optimize its persistence and nearly double the loading speed for datasets with lots of statement-level metadata. The feature also immediately attracted strong interest from ontology modelers who had struggled to express complex relationships concisely.
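To make the difference in verbosity concrete, here is a minimal Python sketch that prints the same annotated fact in both forms. The `ex:` names, the `ex:certainty` property and the helper functions are illustrative assumptions, not part of any product; only the `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms come from the RDF reification vocabulary.

```python
# Toy comparison of standard RDF reification with the RDF-star short form.
# All ex: names are made up for illustration.

def reify(subject, predicate, obj, metadata, stmt_id="ex:stmt1"):
    """Return Turtle lines describing a triple via standard reification."""
    lines = [
        f"{stmt_id} rdf:type rdf:Statement .",
        f"{stmt_id} rdf:subject {subject} .",
        f"{stmt_id} rdf:predicate {predicate} .",
        f"{stmt_id} rdf:object {obj} .",
    ]
    # Every piece of metadata hangs off the extra statement node.
    lines += [f"{stmt_id} {p} {o} ." for p, o in metadata]
    return lines


def rdf_star(subject, predicate, obj, metadata):
    """Return Turtle-star lines that annotate the embedded triple directly."""
    embedded = f"<< {subject} {predicate} {obj} >>"
    return [f"{embedded} {p} {o} ." for p, o in metadata]


metadata = [("ex:certainty", '"0.9"')]
reified = reify("ex:alice", "ex:knows", "ex:bob", metadata)
starred = rdf_star("ex:alice", "ex:knows", "ex:bob", metadata)

print(f"{len(reified)} statements with reification:")
print("\n".join(reified))
print(f"{len(starred)} statement with RDF-star:")
print("\n".join(starred))
```

In this sketch, reification needs four structural statements before any metadata can be attached, while the RDF-star form needs only the annotation itself.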
What is the significance of RDF-star?
Vassil Momtchev: The practical significance of RDF-star is that it increases the modeling expressivity with a new RDF resource type, the embedded triple, which works as a pointer to an RDF statement. This also fully matches the theoretical expressivity of the property graph (PG) model without the need to use reification, i.e., an abstract construct built with the existing methods supported by the language.
Now every PG can be efficiently represented as an RDF model. The opposite direction is not true, because RDF is more expressive in various ways. In particular, with RDF-star one can attach arbitrarily complex descriptions to an edge in the graph, while in PG one can attach only key-value pairs.
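The mapping from a property graph edge to RDF-star can be sketched in a few lines of Python. This is only an illustration under assumed names (the `ex:` vocabulary and both helper functions are hypothetical), and it handles only string and integer property values.

```python
def literal(value):
    """Format a Python value as a Turtle literal (strings quoted, integers bare)."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def pg_edge_to_rdf_star(source, label, target, properties):
    """Map one property-graph edge with key-value properties to Turtle-star lines."""
    edge = f"{source} {label} {target}"
    lines = [f"{edge} ."]  # the edge itself, asserted as an ordinary triple
    for key, value in properties.items():
        # Each key-value pair becomes an annotation on the embedded triple.
        lines.append(f"<< {edge} >> ex:{key} {literal(value)} .")
    return lines


lines = pg_edge_to_rdf_star(
    "ex:alice", "ex:worksFor", "ex:acme", {"since": 2019, "role": "engineer"}
)

# Something a flat key-value pair cannot express directly: the annotation
# points at another resource, which carries a description of its own.
richer = [
    "<< ex:alice ex:worksFor ex:acme >> ex:source ex:report42 .",
    "ex:report42 ex:author ex:carol .",
]

print("\n".join(lines + richer))
```

The last two lines of output show the direction that does not map back: the value of `ex:source` is itself a node in the graph rather than a scalar.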
How did Ontotext go about implementing RDF-star in GraphDB?
Vassil Momtchev: In the past, we have struggled with the same modeling challenge. More than 10 years ago we even had experimental support of what we called triplesets, which allowed the association of metadata to sets of statements. The major challenge with our internal implementation was that it was not standard and it was not exposed through SPARQL, but a specific API. What triggered us to implement RDF-star was an RDF community group meeting, where multiple users and vendors expressed their desire to standardize it.
What were the challenges and how did you overcome them?
Vassil Momtchev: The major challenge in any new standard, introducing a data model with formal semantics (a computational model that can be represented as a sequence of mathematical calculations), is to cover all corner cases. Yet, not all vendors fully agree on whether we should support nesting of embedded triples, wildcards in specific positions during querying or if it should be allowed to reference a non-existing triple.
The main guideline in implementing this feature was to stay fully compliant with all existing standards like the RDF 1.1 specification. We expanded the work of Olaf Hartig, the scientist who proposed RDF-star, to support referencing non-existing triples, because this was already supported by RDF reification, where deleting a triple does not automatically retract its metadata. Probably the most important use case we had in mind was the support of metadata for deleted statements.
We believe that we have the most comprehensive RDF-star implementation because we are fully compliant with all pre-existing standards.
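The behavior described in this answer, where an embedded triple can be referenced even if it is not (or no longer) asserted, can be modeled with a toy in-memory store. This is a conceptual sketch only; the class, its methods and the `ex:` names are assumptions and say nothing about how GraphDB stores data internally.

```python
class ToyStarStore:
    """Tiny in-memory model of asserted triples plus statement-level metadata."""

    def __init__(self):
        self.asserted = set()     # ordinary triples: (s, p, o)
        self.annotations = set()  # (embedded_triple, predicate, object)

    def add(self, triple):
        self.asserted.add(triple)

    def annotate(self, triple, predicate, obj):
        # The embedded triple is only referenced; it does not have to be asserted.
        self.annotations.add((triple, predicate, obj))

    def remove(self, triple):
        # Deleting the triple leaves its metadata in place.
        self.asserted.discard(triple)

    def metadata_for(self, triple):
        return {(p, o) for t, p, o in self.annotations if t == triple}


store = ToyStarStore()
fact = ("ex:alice", "ex:worksFor", "ex:acme")

store.add(fact)
store.annotate(fact, "ex:addedBy", "ex:bob")

store.remove(fact)                               # the fact is retracted ...
store.annotate(fact, "ex:deletedBy", "ex:carol")  # ... but can still be described

print(fact in store.asserted)
print(sorted(store.metadata_for(fact)))
```

After the deletion, the fact is gone from the asserted set while both annotations remain, which is the "metadata for deleted statements" use case.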
Why was RDF-star adopted by the Graphite team as a solution for managing complex Access Control Lists (ACLs)?
Gene Loh: The ACL case fits exactly in scenarios where metadata needs to be associated with a given statement. Graphite provides users the ability to define access permissions at the property level. One of the challenges faced by our design team when attempting to extend the Graphite permissions data model pertained to the limitations of native RDF triple constructs, so that extending beyond three-element tuples became a cumbersome exercise.
For example, the statement "User A has edit permissions to Property B for the concepts in Scheme C in Project D" requires a tuple with a minimum of five elements to express the relationships semantically in the dataset.
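One way to picture such a five-element statement is as nested embedded triples. The sketch below builds a Turtle-star string for the example above; the `ex:` vocabulary, the nesting order and the function names are assumptions made for illustration and are not Graphite's actual permissions model.

```python
def embed(subject, predicate, obj):
    """Wrap a triple in the RDF-star embedded-triple brackets."""
    return f"<< {subject} {predicate} {obj} >>"


def permission_statement(user, permission, prop, scheme, project):
    """Express a five-element permission tuple as nested embedded triples."""
    grant = embed(user, permission, prop)          # who may do what to which property
    scoped = embed(grant, "ex:inScheme", scheme)   # narrowed to one scheme
    return f"{scoped} ex:inProject {project} ."    # narrowed to one project


statement = permission_statement(
    "ex:userA", "ex:canEdit", "ex:propertyB", "ex:schemeC", "ex:projectD"
)
print(statement)
```

Each level of nesting adds one element to the tuple while the inner triple stays addressable on its own.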
How did RDF-star help you overcome the challenges with property-level permissions?
Gene Loh: In the context of the Graphite permissions data model, the workaround of adding RDF triples in a conventional setting was inefficient and greatly reduced human readability of the data structures and SPARQL queries.
After consolidating the new user and group access control and permissions model in Graphite, there were no issues remapping functionalities from the old model to the new. It is a testament to the design of RDF-star and SPARQL-star that migrating existing data from the previous RDF model to RDF-star can be performed in a straightforward series of SPARQL-star statements.
In addition, the syntax of embedded triples is intuitive, which shortened the learning curve for the Graphite developers. Beyond its use in the Graphite model, RDF-star would invariably simplify the representation of ontology structures like SKOS-XL. While the schematic representation in the data model is relatively straightforward, there are unique challenges in designing a user interface that is sufficiently intuitive for data entry and editing.
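The kind of migration mentioned in this answer can be illustrated outside a triplestore. The following Python sketch rewrites reified statements, held as plain tuples, into annotations on embedded triples. It assumes the old data used standard RDF reification, which the interview does not confirm, and it is not the set of SPARQL-star statements Synaptica actually ran; all names are hypothetical.

```python
RDF_TYPE, RDF_STATEMENT = "rdf:type", "rdf:Statement"
PARTS = ("rdf:subject", "rdf:predicate", "rdf:object")


def is_structural(predicate, obj):
    """True for the four triples that only exist to identify the reified triple."""
    return predicate in PARTS or (predicate == RDF_TYPE and obj == RDF_STATEMENT)


def migrate(triples):
    """Rewrite reified statements as annotations on embedded triples."""
    stmt_nodes = {s for s, p, o in triples if p == RDF_TYPE and o == RDF_STATEMENT}

    # Collect subject, predicate and object for every statement node.
    parts = {node: {} for node in stmt_nodes}
    for s, p, o in triples:
        if s in stmt_nodes and p in PARTS:
            parts[s][p] = o

    migrated = []
    for s, p, o in triples:
        if s not in stmt_nodes:
            migrated.append((s, p, o))           # unrelated triples pass through
        elif not is_structural(p, o):
            embedded = tuple(parts[s][part] for part in PARTS)
            migrated.append((embedded, p, o))    # metadata moves to the embedded triple
    return migrated


old = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", "ex:alice"),
    ("ex:stmt1", "rdf:predicate", "ex:worksFor"),
    ("ex:stmt1", "rdf:object", "ex:acme"),
    ("ex:stmt1", "ex:since", "2019"),
]
new = migrate(old)
for triple in new:
    print(triple)
```

Six triples shrink to two: the original fact and a single annotation whose subject is the embedded triple.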
How did Ontotext and Synaptica engineering teams work together?
Gene Loh: In the quest to solve the problem of efficient representation of n-tuple data, Synaptica considered using a quad with named graphs, but this still constituted a workaround.
Coincidentally, Synaptica was introduced to Ontotext’s RDF-star functionality at an opportune time. We quickly realized that the GraphDB RDF-star and SPARQL-star implementation with rdf4j could be a solution to the n-tuple problem and began our investigation and proof-of-concept on an initial design. The results were promising.
Our resulting architecture includes nested embedded triples, which is the cornerstone of a much cleaner n-tuple structure while retaining requisite interconnectivity between data elements of each tuple.
Vassil Momtchev: We were very pleased to have a strong technical team, like the one of Synaptica, to test drive our RDF-star implementation on a serious task and provide feedback. This helped us iron out some glitches.
What are the benefits of the RDF-star solution?
Vassil Momtchev: The Ontotext team did extensive research on the various options available to model data. With the help of Daniel Hernandez and the other authors of the paper "Reifying RDF: What works well with Wikidata?", we did practical modeling of various approaches and measured their performance.
The results for a given Wikidata dataset are very conclusive and do not need further explanation as to why RDF-star is a superior approach to modeling RDF statements associated with complex metadata.
What is Ontotext’s vision for RDF-star going forwards?
Vassil Momtchev: We look forward to seeing how W3C will standardize the RDF-star specification and integrate it as part of RDF 1.2. This will be a huge success for all users, who will benefit from the evolution of the RDF data model, incorporating a huge number of use cases and scientific efforts to formalize how we should efficiently represent complex knowledge.
What is Synaptica’s vision for RDF-star going forwards?
Gene Loh: Reification is where RDF-star would bring about major architectural improvements in Graphite; from annotating predicates, to provenance and audit logs. Prior to considering RDF-star, a property graph model would have been difficult to model and implement since native RDF reification is unwieldy.
It is our hope that RDF-star is widely adopted and will be the definitive standard. The Graphite knowledge graph has already taken on a new dimension.
Vassil Momtchev
CTO, Ontotext
Vassil has more than 15 years of experience in software development in various domains like life sciences, pharmaceutical, health care and telecommunication. In the past 10 years he has mostly been engaged with the development of complex enterprise knowledge management solutions that feature natural language processing, text analytics, reasoning, semantics, ontology design, linked data, conceptual model design, implementation of formal grammars and graph databases.
Gene Loh
Director Software Development, Synaptica
Gene has over 25 years' experience in software development and leads the Synaptica engineering team focused on research and development. Gene is also founder and Executive Director of Stronium, based in Kuala Lumpur. He has a keen interest in open systems and the adoption of rapidly advancing technologies in building applications for knowledge management. Gene has a degree from the University of Western Australia with majors in Computer Science and Information Technology.
Resources:
Reifying RDF: What works well with Wikidata?