
Bob Kasenchak and Ahren Lehnert, Synaptica

Part three of the Stanford Knowledge Graphs course featured some examples of advanced graphs and graph applications. Video of this session is available here.

What are some advanced Knowledge Graphs?

Mike Tung, CEO, Diffbot

The architecture of the Diffbot Knowledge Graph

Tung’s talk about the Diffbot knowledge graph focused on their efforts to “Build the World’s First Comprehensive Map of Human Knowledge” based on information extracted from the Web.

The Diffbot Knowledge Graph was launched in 2018 and crawls over 60 billion pages on the web — it takes about four days to do a crawl-and-construct operation — and has 10+ billion entities and over 1 trillion triples (“facts”). About 150 million additional entities are added each month.

The result is a “production-level quality service” accessible via a visual dashboard or through APIs for integration with other business tools. The Diffbot KG is built by a fully automated process that combines page-type classification, visual extraction of information, an NLP layer that helps extract information from natural-language content, and resolution of entities found across various sources.

This graph of staggering proportions includes an interface for data cleanup that helps Diffbot develop high-quality facts mapped to a core ontology, and the overall result looks very impressive.

—————————————-

Cogan Shimizu, Kansas State University

Modular Ontology Engineering

Shimizu’s talk discussed the benefits of a well-designed schema for knowledge graphs, accompanied by human-readable documentation that records design decisions; he is concerned with adaptation and reuse (among other things).

Shimizu described his interest in the “praxis of Knowledge Graphs” including how they are used, tools for developing them, reusability, useful semantics, structural and usage patterns, and automatic graph construction and population. These concerns are based around the questions “how can we develop a highly reusable knowledge graph?” and “how can I take my knowledge and experience and add it to YOUR knowledge graph?”

He discussed how knowledge graphs are generally found to be “difficult to reuse” because they lack good schemas and documentation and are therefore difficult to adapt to new use cases. Shimizu consistently compared this issue to the same problem in programming (understanding someone else’s code without good documentation).

His approach to solving this problem adopts a concept from data-sharing principles called FAIR:

Findable

Accessible

Interoperable

Reusable

Again we saw emphasis on using ontology as schema for knowledge graphs. Shimizu’s ideas about “modular ontologies” are concerned with how schemas are used and reused as needs change; he argued that modular ontologies form a natural bridge between human conceptualization and data.

—————————————-

Marie-Laure Mugnier, University of Montpellier

Reasoning on Data with Existential Rules

We won’t post screenshots of talks here as we think that it’s not our intellectual property to share; however, the diagrams in this talk were very helpful (so watch the video if you’re interested!).

Mugnier’s talk took the approach of translating knowledge graphs into first-order logic to reason about the existence of nodes that are implied but not explicit. Using logical set-theory notation, she is interested in answering questions based on complex relationships between objects that are not expressible in RDF but are, perhaps, expressible in a query language based on logical notation.

Using logic rules to define new predicates and to infer the existence of implied nodes, Mugnier demonstrated how to answer questions about relationships that are not directly queryable from the graph by adding knowledge (nodes, edges) to the graph.

Again, this is difficult to describe without recourse to the visual aids, but the essence is that logical rules external to the RDF data are applied to the graph, and the new triples (facts) they infer, which are not asserted in the RDF itself, are then used to answer questions.

For example, we can infer the existence of unknown entities (or predicates): if A and B are siblings, there must be a common parent (node), which can be inferred even if it is unknown.
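
To make the sibling example concrete, here is a minimal sketch in Python. It is our own illustration, not code from Mugnier’s talk or from any particular reasoner. It assumes a graph stored as a set of (subject, predicate, object) tuples, two hypothetical predicate names (siblingOf and parentOf), and an invented labelling scheme (_:parent1, _:parent2, ...) for unknown nodes. The rule it applies can be read as: if siblingOf(a, b) holds, then there exists some p with parentOf(p, a) and parentOf(p, b).

```python
from itertools import count

# A triple is (subject, predicate, object). The predicate names used
# below ("siblingOf", "parentOf") are illustrative assumptions.
Triple = tuple[str, str, str]


def parents_of(graph: set[Triple], child: str) -> set[str]:
    """Return every node x such that (x, parentOf, child) is in the graph."""
    return {s for (s, p, o) in graph if p == "parentOf" and o == child}


def common_parents(graph: set[Triple], a: str, b: str) -> set[str]:
    """Answer the query: which nodes are a parent of both a and b?"""
    return parents_of(graph, a) & parents_of(graph, b)


def apply_sibling_rule(graph: set[Triple]) -> set[Triple]:
    """Apply one pass of the rule
        siblingOf(a, b) -> exists p: parentOf(p, a) and parentOf(p, b)
    and return a new graph that includes the inferred triples."""
    result = set(graph)
    fresh = count(1)  # numbers the invented (unknown) parent nodes
    for s, p, o in sorted(graph):  # sorted only to make labels deterministic
        if p != "siblingOf":
            continue
        if common_parents(result, s, o):
            continue  # a common parent is already present, nothing to add
        unknown = f"_:parent{next(fresh)}"  # placeholder for the implied node
        result.add((unknown, "parentOf", s))
        result.add((unknown, "parentOf", o))
    return result


graph = {("A", "siblingOf", "B"), ("B", "siblingOf", "A")}
extended = apply_sibling_rule(graph)

print(common_parents(graph, "A", "B"))     # no answer in the original graph
print(common_parents(extended, "A", "B"))  # the inferred, unnamed parent
```

The original graph cannot answer “do A and B share a parent?”, while the extended graph can, even though the parent’s identity remains unknown. The check before adding a node keeps the sketch from inventing a second placeholder when a common parent already exists.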

 

As the session title implies, the work being done in the field for creating advanced knowledge graphs is sophisticated and has the potential to solve real-world business problems.

You can read the full series of blogs on the Stanford University Knowledge Graphs course here. You can also follow Bob and Ahren on Twitter.

Bob Kasenchak @TaxoBob

Ahren Lehnert @AhrenLehnert