It’s always an honor to be invited to a Dagstuhl Seminar (this was my third!). I was extremely lucky to have been invited to a seminar on Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web. The goal of Dagstuhl seminars is to get a group of really smart people together, drink beer and wine, and discuss the future of a particular research area.
It’s @dagstuhl time! Spending the week talking about the future of Knowledge Graphs pic.twitter.com/XvGahvUZHC
— Juan Sequeda (@juansequeda) September 9, 2018
I went to this seminar with the following agenda: 1) share my new research interest in studying the socio-technical phenomena of data integration and see what others think, 2) learn what others think the hard scientific challenges with Knowledge Graphs are, and, most importantly, 3) understand what the big AHA moment with Knowledge Graphs is. After spending the last few days processing the entire week, I feel that I achieved every goal on my agenda.
Given that I wear two hats, industry and academia, let me summarize my takeaways with respect to each of them.
Industry Hat
What is a Knowledge Graph?: There was no consensus from the beginning. Claudio Gutierrez presented historical definitions of the term knowledge graph. I had no idea that the term Knowledge Graph shows up in the PhD dissertation of René Ronald Bakker… in 1987!
Throughout the week, there was a philosophical and political discussion about the definition. Some academics wanted to come up with a precise definition; others wanted it to be loose. Academics will inevitably come up with different (conflicting) definitions (they already have), but given that industry uptake of Knowledge Graphs is already taking place, we shouldn’t do anything that hinders it. For example, we don’t want people searching for “Knowledge Graphs” to find a bunch of papers, open problems, etc., instead of real-world solutions (this is what happened with the Semantic Web). A definition should be open and inclusive. A definition I liked, from Aidan Hogan and Antoine Zimmermann, was to define a knowledge graph “as a graph of data with the intention to encode knowledge”.
FAIR Data and Best Practices: I believe that the notion of Findable, Accessible, Interoperable, Reusable (FAIR) data is brilliant! This doesn’t apply just to life science or research data; every enterprise should strive to have FAIR data. Within life science, Barend Mons is advocating that the best practice for implementing FAIR data is to create Knowledge Graphs in RDF. I agree, because you want to share and reuse vocabularies and have unique identifiers, and these are at the core of RDF. However, we are missing a set of best practices: not just on how to create FAIR data, but on how to create, manage, and maintain Knowledge Graphs in general.
Wikidata: I enjoyed chatting with Lydia Pintscher and learning how Wikidata really works. It is an interesting mix of technology and community. For example, users can create classes and instances but cannot create properties. If they believe a new property is needed, it has to go through a community process.
Knowledge Graphs in the real world: We have all been hearing about the popular ones that have gained press recently (Google, Amazon, Uber, Airbnb, Thomson Reuters), but I started to hear about others I wasn’t aware of, such as Siemens, Elsevier, Zalando, ING, etc. We need to start compiling a list of them. It was also great to hear from Dezhao Song how the Thomson Reuters Knowledge Graph was created (spoiler alert: it’s a typical NLP pipeline). For more info, check out their paper “Building and Querying an Enterprise Knowledge Graph”.
Research Hat
The first two days were spent discussing the interests and challenges we had in common. I believe there were a total of 15 discussions going on. The topics included NLP, Graph Analytics, ML, the Decentralized Web, Reasoning and Semantics, Constrained Access, DBpedia/Wikidata, Human and Social Factors, Data Integration, Evolution, What is a KG, Best Practices, and more. There will be an official report on all these discussions, so stay tuned!
My AHA Moment: Tuesday evening in the wine cellar we had a late (late) night discussion, and some of us believed that the discussions up to that point were definitely interesting but could be considered the natural, incremental next steps. There was a lack of boldness and of thinking outside the box. What are the true grand challenges? Our late-night discussion helped set the atmosphere on Wednesday, with a focus on being bold.
I was extremely lucky to participate in a group discussion on Wednesday with Piero Bonatti, Frank van Harmelen, Valentina Presutti, Heiko Paulheim, Maria Esther Vidal, Sarven Capadisli, Roberto Navigli and Gerald de Melo. We started by asking ourselves a philosophical question: What is the object of our studies? This question sparked fascinating discussions. I felt like a philosopher because we were questioning what we, as scientists, were actually researching. Which natural phenomena are we making observations about? What object are we devising theories about? I’m happy to say that we did come to a conclusion.
In my own words: Knowledge Representation & Reasoning and Knowledge Engineering are fields that study the object of knowledge. Data management is a field that studies the object of data. Each of these fields has independently advanced our understanding of its object and how it scales (where scale refers to the typical Vs: volume, variety, etc.). Furthermore, efforts to study the relationship between these two objects can be traced back to the 1960s. However, what we observe now is a new object: knowledge and data at scale. I would like to be bold and state that studying the phenomenon of knowledge and data at scale is its own field in Computer Science. Knowledge Graphs are a manifestation of this phenomenon.
Attempting to shape the future of knowledge at @dagstuhl with @FrankVanHarmele @RNavigli @vpresutti @juansequeda pic.twitter.com/GZWAHRTE8h
— Heiko Paulheim (@heikopaulheim) September 12, 2018
This was my AHA moment. After we shared this with the rest of the group, it sparked a lot of discussion, including the political and philosophical aspects of defining a Knowledge Graph.
Human and Social Factors: Succinctly, my current research interest is the socio-technical phenomena that occur in data integration. With my industry hat on, I get to work with large enterprises on data integration. Throughout this work, I look at what we do through my research lens and observe situations (phenomena) that I do not know how to deal with. For example, creating mappings is not a problem that can be addressed easily with a technical/software solution. Schemas are extremely large and complex, and you need specific knowledge (e.g., legal, business) that only certain users have, and those users may not even agree with each other. I had many interesting discussions with Valentina Presutti, Paul Groth, Marta Sabou and Elena Simperl, and I’m glad to realize that my research interest has merit and is shared by others. There is a lot of work to be done, especially because we, as computer scientists, need to interact with other communities such as cognitive science, HCI, etc. I’m very excited about this topic because it gets me out of my comfort zone.
Multilinguality: I had never followed this topic, so it was great to learn about it. We were lucky to have Roberto Navigli as part of the crew. I understood the true challenge of multilinguality in knowledge graphs when Roberto talked about the cultural aspects. For example, how do you define Ikigai in a Knowledge Graph, and how do you link it to other entities?
Evolution: How do you represent the evolution of a Knowledge Graph? How do you reason over that evolution?
@juansequeda report working group on #KnowledgeGraphs #evolution @dagstuhl pic.twitter.com/KLYjfNmCIu
— Valentina Presutti (@vpresutti) September 10, 2018
I’m very grateful to be part of this research community and I look forward to all the outcomes. Exciting times!
Fantastic discussions going on at the @dagstuhl seminar: Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web https://t.co/lF3WzXqMEZ pic.twitter.com/bX69SBlrbc
— Juan Sequeda (@juansequeda) September 12, 2018
Some random stuff
I wrote this report without looking at Eva Blomqvist’s and Paul Groth’s trip reports because I didn’t want to bias mine. Now I can read them.
A comment from Frank: It shouldn’t be called Computer Science. It should be called Computing Science, because it’s a science that studies the phenomena of computing (not computers, because those are the tool), in the same way that astronomers don’t study telescopes; they use them as a tool.
I spent an extra day at Dagstuhl. It was nice to relax and reflect on the week. Additionally, I spent almost four hours in the library and found a lot of gems: the first volumes of JACM and many cool books signed by their authors, including Edsger W. Dijkstra. I also found the book of my PhD dissertation in the library.
What an amazing feeling to find the book of my PhD dissertation at the @dagstuhl Library. Also found the proceedings of ISWC 2017 where I appear as an editor.
Let’s see how this picture will be in 20 years 🙂 pic.twitter.com/0QAfqYATMq
— Juan Sequeda (@juansequeda) September 15, 2018
I believe there was a consensus that the best wine at Dagstuhl was Château de Caraguilhes Prestige Corbières 2015.
This is a community of amazing scientists but also amazing musicians! We were lucky that Dagstuhl has a music room with a grand piano and guitars. Can’t wait for the ISWC Jam Session.