International Semantic Web Conference (ISWC) 2020 Trip Report

The International Semantic Web Conference (ISWC) is my “home” conference. I’ve been attending since 2008 (I only missed 2010) and it’s the place where I reconnect with friends and colleagues and get to meet new people. It is always a highlight of my year. ISWC 2017 was in Vienna, ISWC 2018 was in Monterey, California, ISWC 2019 was in Auckland, and this year we were supposed to be in Athens! Oh, COVID!
My takeaways:

  • Realization that we need to understand users!
  • Are we educating the new generation of computer scientists enough? No, they need to learn about knowledge engineering!
  • Creative RDF Graph Data Management
  • Data, data, data
  • Of course… embeddings, neuro-symbolic and explainable AI were hot topics
  • This is an eclectic community!

Users, users, users

My current research interest is in understanding the socio-technical phenomena of data integration. Therefore, my eyes and ears were tuned to topics about users. I know I’m biased here, but for me, one of the strongest themes at ISWC this year was users.

It all started with AnHai Doan‘s keynote at the Ontology Matching workshop. The main takeaway: evaluation is not about how much you can automate, but about how much user productivity increases.

This was music to my ears! In previous conversations I’ve had with AnHai, I was happily surprised to learn that we were both tackling problems in similar ways: break the problem down into several steps, figure out how to solve it manually, let that become the baseline, and improve from there. AnHai has been focusing on entity linking, while I have been focusing on schema/ontology matching. There were many lessons to learn from AnHai’s experience (his startup was acquired by Informatica earlier this year).

Users came up on the topic of ontology/schema matching:

I had a “hallway” chat with Catia Pesquita and she mentioned the need for a “usefulness metric”, which I think is spot on. In the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab), I mentioned the following in our Slack discussion:

Food for thought: how about some sort of user metric that measures the productivity of a user? For example, if System A has lower precision than System B but it takes a lot of effort to set up/maintain/… System B, then maybe I would prefer System A. This is just an example. I’m bringing this up because I’m seeing a trend throughout the conference: the realization that we need to understand more about how users are involved.
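To make that concrete, here is a toy sketch of what such an effort-adjusted metric could look like. Everything in it (the formula, the parameter names, the numbers) is a hypothetical illustration of the idea, not a metric the SemTab organizers have agreed on:

```python
def effort_adjusted_score(precision: float, recall: float, setup_hours: float,
                          hours_per_correction: float, corrections_needed: int) -> float:
    """Hypothetical 'user metric': match quality delivered per hour of human effort.

    Quality is plain F1; effort is setup time plus the time a user spends
    fixing the system's mistakes. All of this is an illustrative assumption.
    """
    f1 = 2 * precision * recall / (precision + recall)
    effort = setup_hours + hours_per_correction * corrections_needed
    return f1 / effort

# System A: lower precision, but almost no setup/maintenance effort.
a = effort_adjusted_score(precision=0.80, recall=0.85, setup_hours=1,
                          hours_per_correction=0.1, corrections_needed=20)
# System B: higher precision, but expensive to set up and maintain.
b = effort_adjusted_score(precision=0.92, recall=0.85, setup_hours=40,
                          hours_per_correction=0.1, corrections_needed=10)
print(a, b)  # A comes out ahead once human effort is part of the denominator
```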

The community is in agreement that we need to expand the SemTab challenge to take users into account. I’m excited to be part of the organizing committee of this challenge for next year.

Cassia Trojahn presented the paper “Generating Expressive Correspondences: An Approach Based on User Knowledge Needs and A-Box Relation Discovery”, which tackles two things that I’m interested in: complex matches (the real world is not about simple 1-1 schema matches) and users’ needs. Unfortunately, there was no user evaluation. This is the next step we need to take as a community: evaluate the cost of involving users and how we can decrease that cost.

Larry Hunter‘s keynote was on using semantic technologies in the life sciences. I was fascinated by the honest discussion of how he integrates the data: by paying experts to manually verify it. Larry clearly acknowledged that we need genuine experts to validate the mappings. In his case, the experts are doctors, and they are expensive. Therefore, he needed a budget in order to hire people to do the expert mappings. And this requires time. We lack methodologies to do this in a systematic way. While listening to his presentation, all I could think about was the need to focus on user productivity.

Users came up from a developer standpoint: one thing that makes me cringe is when I read or hear people making claims about users or developers without any evidence supporting those claims. The work on LDFlex and OBA tackles how to make RDF and Linked Data more accessible to developers. During the discussion of these papers, Ruben Taelman and Daniel Garijo made statements such as “developers like X, they prefer Y”, and so on. These are anecdotes. The funny thing is that the anecdotes of the LDFlex and OBA authors contradicted each other. My takeaway: it’s still not clear what developers would prefer.

Miriam Fernandez gave a splendid vision talk. She asked: “The web we have created… the view is of the creators… is this really a shared conceptualization? How much of the knowledge we have created that is on the web contains alternative facts?”

In his vision talk, Peter Fox asked, “Are we educating enough of the new generation?” IMO, we are not (see the next section). He also reminded us about humans in the loop, a topic that is gaining a lot of traction in many other fields of Computer Science.

My overall takeaway here is that by looking at traditional problems and their incremental solutions from a socio-technical perspective, we can make the science much more interesting. We need to define evaluation metrics for users. As I mentioned in a Slack discussion:

As a community, we need to push ourselves into the uncomfortable position of doing user evaluations. Is this hard? OF COURSE! But heck, that’s what makes it interesting. Life can’t be that easy.

Knowledge Science (a.k.a. Knowledge Engineering 2.0)

Another interest of mine is understanding how to bridge the gap between data producers and data consumers. I’ve argued for the need for Knowledge Scientists and Data Product Managers (listen to our podcast episode on this topic) to fill that gap.

Oh, was I happy to see this topic in Elena Simperl‘s vision talk. She asked: “What do we know about the technical, scientific and social aspects involved in building, maintaining and using knowledge-based systems?” This community has a lot to say because many of us come from the Knowledge Acquisition community. A lot of hard questions are still open: how do we capture common-sense knowledge (there was a tutorial on that), culture, and diversity? There are many HARD questions we need to ask ourselves about modeling knowledge that are outside of our comfort zone (how do we model negative facts or ignorance?). From a tooling perspective, where is the equivalent of Jupyter notebooks for knowledge modeling (Gra.fo is a step in that direction, combining knowledge modeling with collaboration)? Elena stated that the next wave of AI will not succeed unless we study these hard questions, and I fully agree. If we do not understand the knowledge about our data, it is going to continue to be garbage in, garbage out. Similarly, Peter Fox asked: “Are we educating enough of the new generation?” IMO, we are not! My takeaway: we need to teach knowledge engineering to the next generation of students, and we need to research knowledge engineering again, taking into account today’s data disciplines (knowledge engineering was popular in the 1990s) and interdisciplinary methods. Additionally, we need to work with other communities.

Together with Cogan Shimizu, Rafael Gonçalves, and Ruben Verborgh, I organized the PRAXIS workshop: The Semantic Web in Practice: Tools and Pedagogy. Our goal was to have a WORKshop and gather a community focused on the collection, development, and curation of pedagogical best practices, and the tools that support them, for the Semantic Web and Knowledge Graph communities.

We had a successful event discussing the need to create a syllabus for a bachelor’s course on Knowledge Engineering/Science. We acknowledged that courses in a master’s program are already too late. We are going to start cataloging existing semantic web, knowledge graph, and knowledge engineering courses. Stay tuned because we will need your help!

Overall, I believe we are realizing that we need to reinvent the role of Knowledge Engineering for the 2020s: Knowledge Science (a.k.a. Knowledge Engineering 2.0).

Creative RDF Graph Data Management

Around a decade ago, there was a lot of RDF data management work at ISWC that resembled work that could have been published at a database conference. Largely, that type of work has gone away. This year I was happily surprised to see this topic come back with novel and creative approaches.

Tentris is a tensor-based triple store: an RDF graph is mapped to a tensor, SPARQL queries are mapped to Einstein summations, and query evaluation leverages worst-case optimal multi-join algorithms. Juan Reutter gave the keynote “AGM bound and worst-case optimal joins + Semantic Web” at the 4th Workshop on Storing, Querying, and Benchmarking the Web of Data. The AGM bound is “one of the most important results in DB theory this century” and has led to the rise of “worst-case optimal” join algorithms. This is a very popular topic in the database community that the semantic web community should look into. Trident is a new graph storage engine for very large knowledge graphs that aims to improve scalability while maintaining generality, supporting reasoning, and running on cheap hardware.
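To get a feel for the tensor view that Tentris builds on, here is a minimal sketch: a toy graph encoded as a dense order-3 tensor and a hypothetical two-pattern query answered with an Einstein summation. Tentris itself uses compressed sparse hypertries and worst-case optimal joins, not this toy code:

```python
import numpy as np

# Toy vocabulary: every resource gets an integer id.
resources = {"alice": 0, "bob": 1, "carol": 2, "knows": 3, "livesIn": 4, "berlin": 5}
n = len(resources)

# Order-3 boolean tensor: T[s, p, o] = 1 iff the triple (s, p, o) is in the graph.
T = np.zeros((n, n, n), dtype=np.uint8)
triples = [("alice", "knows", "bob"), ("bob", "knows", "carol"), ("bob", "livesIn", "berlin")]
for s, p, o in triples:
    T[resources[s], resources[p], resources[o]] = 1

# BGP { ?x :knows ?y . ?y :livesIn ?z } as an Einstein summation:
# fix the predicate dimension in each operand, then contract over the join variable ?y.
knows = T[:, resources["knows"], :]     # slice: knows[x, y]
lives = T[:, resources["livesIn"], :]   # slice: lives[y, z]
answers = np.einsum("xy,yz->xz", knows, lives)  # answers[x, z] > 0 iff a binding exists

ids = {v: k for k, v in resources.items()}
for x, z in zip(*np.nonzero(answers)):
    print(ids[x], "knows someone who lives in", ids[z])
```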

I was very excited to see the work on HDTCat, a tool to merge the contents of two HDT files with a low memory footprint. RDF HDT is a compressed format for RDF (basically, Parquet files for RDF). This is a problem we encountered at data.world a while back. Every dataset ingested into data.world is a named graph represented in RDF HDT, so when running a SPARQL query over multiple large named graphs we hit exactly this issue: it just used too much memory. It was very nice to see that the solution presented in HDTCat is similar to what we did at data.world to solve this problem.
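The core trick, as I understood it, is to merge already-sorted sections in a streaming fashion instead of loading both files into memory. Here is a toy sketch of that streaming-merge idea only; it is not HDTCat’s actual algorithm, which also merges the compressed dictionaries and remaps triple IDs:

```python
import heapq
from itertools import groupby

def merge_sorted_triples(stream_a, stream_b):
    """Merge two already-sorted triple streams into one sorted, de-duplicated stream.

    Only a handful of triples is held in memory at any time, which is the
    spirit (not the implementation) of a low-memory-footprint merge.
    """
    merged = heapq.merge(stream_a, stream_b)  # lazy two-way merge
    for triple, _ in groupby(merged):         # drop duplicates shared by both graphs
        yield triple

# Toy usage with two lexicographically sorted lists of (s, p, o) tuples.
graph_a = [("a", "knows", "b"), ("a", "name", "Alice")]
graph_b = [("a", "knows", "b"), ("b", "name", "Bob")]
for t in merge_sorted_triples(iter(graph_a), iter(graph_b)):
    print(t)
```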

I enjoyed seeing how the community is looking at extending SPARQL in many different ways:

  • Making SPARQL Turing complete to support graph analytics (e.g., PageRank).
  • Extending SPARQL with similarity joins.
  • Combining graph data with raster and vector data (the GeoSPARQL+ paper won a best student paper award).
  • Studying SPARQL query logs and user sessions to understand user behavior.

Data, data, data

The vision talk speakers had some fantastic insights about data.

Barend Mons reemphasized the need for FAIR data and why we should keep metadata and data separate: for many applications, you need to consume only the metadata initially. Barend made a bold and strong statement: invest 5% of research funds in ensuring data are reusable. Jeni Tennison gave a heartfelt message: don’t use data for negative aspects of life; data and access to data are political; access to data should be the norm; and we need a world where data works for everyone. Fabien Gandon reminded us that the web connects ALL things. Stefan Decker made an important call to take persistent identifiers seriously.

This seems like a small issue but it is CRUCIAL. If we were to think about persistent identifiers correctly from the beginning, I postulate that many of the data integration problems we suffer from would go away.

Oh, and I believe Peter Fox coined the term semantilicious. Is your data semantilicious?

Of Course…

Of course the expected hot topics were present (take a look at the list of accepted papers).

Of course …there was a lot of work presented about embeddings!

Of course… the combination of neuro-symbolic approaches was a hot topic. This was the focus of Uli Sattler‘s vision talk. Take a look at the Common Sense Knowledge Graph tutorial.

Of course… Explainable AI was a topic. In particular, I appreciated Helena Deus‘ vision of incorporating the bias into the model so that the model can be avoided when it is not applicable (if the model was trained on lung images, don’t use it on brain images).
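A toy sketch of that idea (the wrapper, names, and domains below are all hypothetical, not from Helena’s talk): attach training-domain metadata to the model and refuse to predict outside of it.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GuardedModel:
    """Wrap a predictive model with metadata about what it was trained on.

    Hypothetical sketch: the wrapper refuses to predict when the input's
    domain does not match the training domain, instead of failing silently.
    """
    predict_fn: Callable[[List[float]], str]
    trained_on: str  # e.g., "lung CT images"

    def predict(self, features: List[float], domain: str) -> str:
        if domain != self.trained_on:
            raise ValueError(
                f"Model trained on '{self.trained_on}' is not applicable to '{domain}'."
            )
        return self.predict_fn(features)

# Toy usage: a lung model politely refuses brain images.
lung_model = GuardedModel(predict_fn=lambda x: "no anomaly", trained_on="lung CT images")
print(lung_model.predict([0.1, 0.7], domain="lung CT images"))  # ok
# lung_model.predict([0.1, 0.7], domain="brain MRI images")     # raises ValueError
```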

More Notes

We had a lot of great social events: Ask Me Anything sessions with Craig Knoblock, Jim Hendler, Natasha Noy, Elena Simperl, and Mayank Kejriwal. We also had meetups: Women in Semantic Web and Semantic Web Research in Asia. The Remo platform worked very well for “hallway” conversations.

The vision talks and sister conference presentations are awesome. Please keep them!

In an AMA conversation with Jim, he shared Tim Berners-Lee’s pitch for the semantic web: my document can point to your document, but my spreadsheet can’t point to yours. In other words, my data can’t point to your data.

Need to take a look at “G2GML: Graph to Graph Mapping Language for Bridging RDF and Property Graphs”: http://g2gml.fun/

Need to take a look at “FunMap: Efficient Execution of Functional Mappings for Scaled-Up Knowledge Graph Creation”: https://github.com/SDM-TIB/FunMap

Need to take a look at “Tab2Know: Building a Knowledge Base from Scientific Tables”.

From what I heard, the Wikidata workshop was a huge hit.

My friends at UPM gave a Knowledge Graph Construction tutorial. I believe this topic has a lot of interesting scientific challenges when users come into play. A lot of opportunities here!

Chris Mungall gave an interesting keynote at the Ontology Design Patterns workshop on how to use design patterns to align ontologies in the life sciences. What I appreciated about his talk is the practicality of his work. He is putting theory into practice.

How do you represent negative facts in a knowledge graph? See “Enriching Knowledge Bases with Interesting Negative Statements”. Larry Hunter brought up something related in his keynote: how do you represent ignorance?

How can we make RDF, Linked Data, and Knowledge Graphs friendly for developers? See the SPARQL Endpoints and Web API tutorial, “OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs”, and LDFlex.

I realized that I didn’t spend time at the industry/in-use talks (last year I spent most of my time at them). I need to review those papers.

Something new to add to my history of knowledge graph work:

Kavitha Srinivas gave a cool keynote on knowledge graphs for code.

RDF* always seems to come up

Oh, and in case you need some ideas:

My Daily takeaways

Congrats to all the winners

My main takeaway: this is an eclectic community!

The semantic web community is truly an eclectic community. At this conference you can see work and talk to people about Artificial Intelligence, Knowledge Representation and Reasoning, Ontology Engineering, Machine Learning, Explainable AI, Databases, Data Integration, Graphs, Data Mining, Data Visualization, Human-Computer/Data Interaction, Streaming Data, Open Data, Programming Languages, Question Answering, NLP and, of course, the Web! Therefore, if you feel that you don’t fully fit in your research community because you dabble in other areas, the semantic web community may be the place for you!

This is also a diverse community

I’m very proud to be part of the community and to consider it home! I miss hanging out with all my friends, having inspiring conversations, dancing, eating, and making new friends.

Massive THANK YOU to the entire organizing committee for making this an amazing virtual event, especially the general chair, Lalana Kagal!

Hopefully “see” you next year in Albany, NY!