My takeaway from this year’s ISWC
- Less is sufficient
- Theory and practice are coming together more and more, and it’s getting rewarded
- We need to think bigger
- Semantic and Knowledge Graph technologies are doing well in industry
So let’s start! This was a very exciting and busy ISWC for me.
For the past two years, Claudio Gutierrez and I have been researching the history of knowledge graphs (see http://knowledgegraph.today/). This work culminated in a paper and a tutorial at ISWC. It was very well received:
I also gave the keynote “The socio-technical phenomena of data integration” at the Ontology Matching Workshop
Part of my message was to push the need for the Knowledge Scientist role.
I gave a talk on our in-use paper “A Pay-as-you-go Methodology to Design and Build Enterprise Knowledge Graphs from Relational Databases”
In conjunction with Dave Griffith, I also gave an industry presentation on how we are building a hybrid data cloud at data.world. Finally, I was also on an industry panel.
Oh, and data.world had a table
And our socks were a hit
Let’s not forget about Knowledge
Jerome Euzenat’s keynote was philosophical. His key message was that we have become too focused on data and are forgetting knowledge and how knowledge evolves.
I agree with him. At the beginning of the semantic web, I would argue, the focus was on ontologies (i.e. knowledge). From the mid-2000s, the focus shifted to data (Linked Data, LOD), and that is where we have been. We should not forget about knowledge. And it’s because of this:
I would actually rephrase Jerome’s message: not only should we not forget about knowledge, we should not forget about combining data and knowledge at scale.
And don’t forget:
Outrageous Idea
There was a track called Outrageous Idea, and the outrageous issue was that most of the submissions were rejected because the reviewers did not consider them outrageous enough. This led to an interesting panel discussion.
The semantic web community has a track record of being outrageous:
- The idea of linking data on the web was crazy and many thought it would not happen.
- Even though SPARQL is not a predominant query language, one of the largest repositories of knowledge, Wikidata, is all in RDF and SPARQL.
- Querying the web as if it were a database was envisioned in the early 90s, and given the linked data on the web, it actually became possible (see Olaf Hartig’s PhD dissertation and all the work it spawned; no wonder his 2009 paper received the 10-year award this year. Congrats, my friend!).
- Heck, the semantic web itself is an outrageous idea (that hasn’t yet been fulfilled).
However, there is a sentiment that this community is stuck and focused on incremental advances. Something needs to change. For example, we should have a venue/track to publish work that may lack a bit of scientific rigor because it is visionary, may not have well-defined research questions or a clearly stated hypothesis (because we are dreaming!), or may have a preliminary evaluation because it’s not yet understood how to evaluate it or what to compare against. Rumor has it that there will be some sort of vision track next year. Let’s see!
Pragmatism in Science
It was great to see scientific contributions combining theory and implementation, thus being more pragmatic. A catalyst, in my opinion, was the Reproducibility Initiative. Several papers had a “Reproduced” tag noting that the results were implemented, the code was available, and a third party had reproduced the results. One of the best research paper nominees, “Absorption-Based Query Answering for Expressive Description Logics,” won the Best Reproducibility award. These researchers are well-respected theoreticians, and it’s very interesting to see them bridging their theory with practice.
The best research paper, “Validating SHACL constraints over a SPARQL endpoint,” is highly theoretical but also comes with experimental results and available code: SHACL2SPARQL
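The core intuition of validating SHACL by rewriting constraints into SPARQL can be shown with a toy example. This is my own minimal sketch of the general rewriting idea, not the paper’s algorithm; all prefixes and data below are invented:

```python
# Toy illustration of the SHACL-to-SPARQL idea (a sketch, not the paper's
# algorithm): a sh:minCount 1 constraint ("every Person needs a name") is
# rewritten into a query that selects the violating focus nodes.
def min_count_violations_query(target_class, path):
    return (
        f"SELECT ?focus WHERE {{ ?focus a {target_class} . "
        f"FILTER NOT EXISTS {{ ?focus {path} ?v }} }}"
    )

# Evaluate the same constraint over in-memory triples standing in for
# the SPARQL endpoint; the data is invented.
triples = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),  # bob lacks ex:name -> violation
}

def violations(triples, target_class, path):
    focus = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    named = {s for s, p, o in triples if p == path}
    return sorted(focus - named)

print(min_count_violations_query("ex:Person", "ex:name"))
print(violations(triples, "ex:Person", "ex:name"))  # ['ex:bob']
```

The appeal of this style of approach is that the validation work is pushed to the endpoint itself, so any standards-compliant SPARQL store can act as the validator.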
I’m seeing this trend in the database community too. For example, the Graph Query Language (GQL) standardization process for Property Graphs will be accompanied by a definition of formal semantics, which is being led by theoreticians including Leonid Libkin.
I’m also starting to see this interest in the other direction: researchers who focus on building systems are becoming more rigorous with their theory and experiments. For example, the best student research paper was “RDF Explorer: A Visual SPARQL Query Builder” (see rdfexplorer.org). The computer science team partnered with an HCI researcher to conduct a user study, bringing scientific rigor to their work (and ultimately getting nominated for and winning an award).
Bottom line: my perception is that theoreticians want to make sure their theory is actually used, and systems builders are focusing more and more on the science and not just the engineering. This is FANTASTIC!
Table to Knowledge Graph Matching
One of the big topics at the conference was the “Tabular Data to Knowledge Graph Matching” challenge. The challenge consisted of three tasks:
- CTA: Assigning a class (:Actor) from a Knowledge Graph to a column
- CEA: Matching a cell to an entity (:HarrisonFord) in the Knowledge Graph
- CPA: Assigning a property (:actedIn) from the Knowledge Graph to the relationship between two columns
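To make the three tasks concrete, here is a toy two-column table with the kind of annotation each task produces (the classes, entities, and properties are illustrative, not actual challenge data):

```python
# A toy movie table and the annotation each of the three tasks produces.
# The :Actor/:HarrisonFord/:actedIn identifiers are illustrative only.
table = [
    ["Harrison Ford", "Blade Runner"],
    ["Carrie Fisher", "Star Wars"],
]

cta = {0: ":Actor", 1: ":Film"}           # CTA: a class per column
cea = {                                    # CEA: an entity per cell
    (0, 0): ":HarrisonFord", (0, 1): ":BladeRunner",
    (1, 0): ":CarrieFisher", (1, 1): ":StarWars",
}
cpa = {(0, 1): ":actedIn"}                # CPA: a property per column pair

print(cta[0], cea[(0, 0)], cpa[(0, 1)])  # :Actor :HarrisonFord :actedIn
```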
The matching was to DBpedia. The summary of the challenge in one slide:
For example, at a high level, the team from USC, Tabularisi, created candidate matches using DBpedia Spotlight and ranked them with TF-IDF, and that was sufficient to get decent results.
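The general “candidate lookup + TF-IDF” recipe can be sketched in a few lines. This is my own toy version of the idea, not the actual Tabularisi pipeline; the candidates and descriptions are invented:

```python
# Hedged sketch of cell-to-entity matching via candidate lookup + TF-IDF.
# A real system would get candidates from an index such as DBpedia Spotlight;
# here they are hard-coded, and the descriptions are made up.
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weight vectors (dicts) for a list of token lists."""
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Row context for the cell "Ford", plus two invented candidate descriptions.
context = "ford blade runner 1982".split()
candidates = {
    ":HarrisonFord": "american actor star wars blade runner".split(),
    ":HenryFord": "american industrialist founder ford motor company".split(),
}

vecs = tfidf([context] + list(candidates.values()))
scores = {e: cosine(vecs[0], v) for e, v in zip(candidates, vecs[1:])}
best = max(scores, key=scores.get)
print(best)  # :HarrisonFord
```

The row context (“blade runner”) is what disambiguates the ambiguous surface form “Ford”, which is essentially why such simple pipelines get decent results.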
The winner of the challenge, MTab, in my opinion, over-engineered their approach for DBpedia, which is how they were able to win the challenge.
DAGOBAH, from Orange Labs, had two approaches: a baseline that used DBpedia Spotlight, and a more sophisticated approach using embeddings. The embedding approach was slightly better, but more expensive.
There were other approaches such as CSV2KG and MantisTable:
My takeaway: “less is sufficient.” Seems like we can get sufficient quality by not being too sophisticated. In a way, this is good.
More notes
Olaf Hartig gave a keynote at Workshop on Querying and Benchmarking the Web of Data (QuWeDa). His message:
Albert Meroño presented work on modeling and querying lists in RDF Graphs (Paper, Slides). Really interesting.
I really need to check out VLog, a new rule-based reasoner for Knowledge Graphs (VLog code), and the Java library based on the VLog rule engine (Paper).
SHACL and Validation
OSTRICH is an RDF triple store that allows multiple versions of a dataset to be stored and queried at the same time. Code. Slides.
Interesting to see how the Microsoft Academic Knowledge Graph was created in RDF: http://ma-graph.org/
An interesting dataset, FoodKG: A Semantics-Driven Knowledge Graph for Food Recommendation https://foodkg.github.io/index.html
Translating SPARQL to Spark SQL is getting more attention. Clever stuff in the poster: Exploiting Wide Property Tables Empowered by Inverse Properties for Efficient Distributed SPARQL Query Evaluation
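The wide property table idea behind that line of work can be illustrated with a small sketch. This is my own invented example of the general storage layout, not the poster’s system:

```python
# Sketch of the wide property table layout: instead of one
# subject/predicate/object triple table, where every triple pattern in a
# query needs a self-join, each subject becomes one row with a column per
# property, so a star-shaped SPARQL pattern compiles to a single scan.
rows = [
    {"s": ":BladeRunner", "type": ":Film", "director": ":RidleyScott", "year": 1982},
    {"s": ":StarWars",    "type": ":Film", "director": ":GeorgeLucas", "year": 1977},
]

# SPARQL: SELECT ?s WHERE { ?s a :Film ; :year ?y . FILTER(?y > 1980) }
# becomes a join-free filter over the property table:
result = [r["s"] for r in rows
          if r.get("type") == ":Film" and r.get("year", 0) > 1980]
print(result)  # [':BladeRunner']
```

Avoiding the self-joins is exactly what makes this layout attractive on a distributed engine like Spark, where joins mean expensive shuffles.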
There wasn’t much material on machine learning and embeddings, only one session (not surprising, because I guess that type of work gets sent to machine learning conferences). A few things I saw (not a complete list):
- Ampligraph.org, an “open source library based on TensorFlow that predicts links between concepts in a knowledge graph.”
- The poster Embedding OWL ontologies with OWL2Vec (code) was intriguing.
- The DAGOBAH (see Table to KG Matching) used embeddings for their approach.
- Pre-trained entity embeddings for all publications of the Microsoft Academic Graph using RDF2Vec
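The scoring intuition behind much of this KG embedding work can be shown with a TransE-style toy example. This is a generic sketch of the translation-based scoring idea, with hand-picked (not learned) vectors:

```python
# Minimal sketch of TransE-style scoring, the kind of link prediction idea
# behind many KG embedding libraries: a triple (h, r, t) is plausible when
# head + relation lands near tail. The 3-d vectors are hand-picked toys.
def score(h, r, t):
    # Negative Euclidean distance ||h + r - t||: higher = more plausible.
    return -sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)) ** 0.5

emb = {
    ":HarrisonFord": [1.0, 0.0, 0.0],
    ":actedIn":      [0.0, 1.0, 0.0],
    ":BladeRunner":  [1.0, 1.0, 0.0],
    ":Paris":        [0.0, 0.0, 1.0],
}

plausible   = score(emb[":HarrisonFord"], emb[":actedIn"], emb[":BladeRunner"])
implausible = score(emb[":HarrisonFord"], emb[":actedIn"], emb[":Paris"])
print(plausible > implausible)  # True
```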
Need to check out http://ottr.xyz/
I missed the GraphQL tutorial.
Industry
Even though this is an academic/scientific conference, there were still a number of industry attendees.
Dougal Watt (former IBM NZ Chief Technologist and founder of Meaningful Technology) gave a keynote, where he was preaching Dave McComb’s message of being data centric. I liked how he introduced the phrase “knowledge centric” which is where we should be heading.
Pinterest and Stanford won the best in-use paper award for “Use of OWL and Semantic Web Technologies at Pinterest”
Bosch presented their use case of combining semantics and NLP. They are creating a search engine for materials scientists to find documents.
Google was present:
Joint work between Springer and KMi was presented
The Amazon Neptune team presented a demo, “Enabling an Enterprise Data Management Ecosystem using Change Data Capture with Amazon Neptune,” and an industry talk, “Transactional Guarantees for SPARQL Query Execution with Amazon Neptune.”
I learned about Ampligraph.org from Accenture, an “Open source library based on TensorFlow that predicts links between concepts in a knowledge graph.”
Great to see Orange Labs participating in the table to knowledge graph matching challenge (more above).
Always great to connect with Peter Haase from Metaphacts and meet new folks like Jonas Almeida from NCI.
And that’s a wrap
ISWC is always a lot of fun. In addition to all the scientific and technical content, there is also a sense of community. I always enjoy being part of the mentoring lunch:
We had a fantastic gala dinner
And we even got all the Hispanics together:
Take a look at other trip reports (I can now read them, since I’ve published mine!):
Avijit Thawani: https://medium.com/@avijitthawani/iswc-2019-new-zealand-bd15fe02d3d4
Sven Lieber: https://sven-lieber.org/en/2019/11/05/iswc-2019/
Cogan Shimizu: https://daselab.cs.ksu.edu/blog/iswc-2019
Armin Haller: https://www.linkedin.com/pulse/knowledge-graphs-modelling-took-center-stage-iswc-2019-armin-haller/
With that, see you next year: