International Semantic Web Conference (ISWC) 2020 Trip Report

The International Semantic Web Conference (ISWC) is my “home” conference. I’ve been attending since 2008 (I only missed 2010) and it’s the place where I reconnect with friends and colleagues and get to meet new people. It is always a highlight of my year. ISWC 2017 was in Vienna, ISWC 2018 was in Monterey, California, ISWC 2019 was in Auckland, and this year we were supposed to be in Athens! Oh, COVID!
My takeaways:

  • Realization that we need to understand users!
  • Are we educating the new generation of computer scientists enough? No, they need to learn about knowledge engineering!
  • Creative RDF Graph Data Management
  • Data, data, data
  • Of course… embeddings, neuro-symbolic and explainable AI were hot topics
  • This is an eclectic community!

Users, users, users

My current research interest is in understanding the socio-technical phenomena of data integration. Therefore, my eyes and ears were tuned to topics about users. I know I’m biased here, but for me, one of the strongest themes at ISWC this year was users.

It all started with AnHai Doan‘s keynote at the Ontology Matching workshop. The main takeaway: evaluation is not about how much you can automate, but about how much user productivity increases.

This was music to my ears! In previous conversations I’ve had with AnHai, I was happily surprised to learn that we were both tackling problems in similar ways: break the problem down into several steps, figure out how to solve it manually, let that become the baseline, and improve from there. AnHai has been focusing on entity linking, while I have been focusing on schema/ontology matching. There were many lessons to learn from AnHai’s experience (his startup was acquired by Informatica earlier this year).

Users came up on the topic of ontology/schema matching:

I had a “hallway” chat with Catia Pesquita and she mentioned the need for a “usefulness metric”, which I think is spot on. In the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab), I mentioned the following in our Slack discussion:

Food for thought: how about some sort of user metric that measures the productivity of a user? For example, if System A has lower precision than System B but it takes a lot of effort to set up/maintain/… System B, then maybe I would prefer System A. This is just an example. I’m bringing this up because I’m seeing a trend throughout the conference: the realization that we need to understand more about how users are involved.
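To make that concrete, here is a toy sketch of what such an effort-adjusted metric could look like. Everything in it (the formula, the parameter names, the numbers) is a hypothetical illustration of the idea, not a metric the SemTab organizers have agreed on:

```python
def effort_adjusted_score(precision: float, recall: float, setup_hours: float,
                          hours_per_correction: float, corrections_needed: int) -> float:
    """Hypothetical 'user metric': match quality delivered per hour of human effort.

    Quality is plain F1; effort is setup time plus the time a user spends
    fixing the system's mistakes. All of this is an illustrative assumption.
    """
    f1 = 2 * precision * recall / (precision + recall)
    effort = setup_hours + hours_per_correction * corrections_needed
    return f1 / effort

# System A: lower precision, but almost no setup/maintenance effort.
a = effort_adjusted_score(precision=0.80, recall=0.85, setup_hours=1,
                          hours_per_correction=0.1, corrections_needed=20)
# System B: higher precision, but expensive to set up and maintain.
b = effort_adjusted_score(precision=0.92, recall=0.85, setup_hours=40,
                          hours_per_correction=0.1, corrections_needed=10)
print(a, b)  # A comes out ahead once human effort is part of the denominator
```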

The community is in agreement that we need to expand the SemTab challenge to take users into account. I’m excited to be part of the organizing committee of this challenge for next year.

Cassia Trojahn presented the paper “Generating Expressive Correspondences: An Approach Based on User Knowledge Needs and A-Box Relation Discovery”, which tackles two things that I’m interested in: complex matches (the real world is not about simple 1-1 schema matches) and users’ needs. Unfortunately, there was no user evaluation. This is the next step we need to take as a community: evaluate the cost of involving users and how we can decrease that cost.

Larry Hunter‘s keynote was on using semantic technologies in the life sciences. I was fascinated by the honest discussion of how he integrates the data: by paying experts to manually verify it. Larry clearly acknowledged that we need genuine experts to validate the mappings. In his case, the experts are doctors, and they are expensive. Therefore, he needed a budget in order to hire people to do the expert mappings. And this requires time. We lack methodologies to do this in a systematic way. While listening to his presentation, all I could think about was the need to focus on user productivity.

Users came up from a developer standpoint: one thing that makes me cringe is when I read or hear people making claims about users or developers without any evidence supporting those claims. The work on LDFlex and OBA tackles how to make RDF and Linked Data more accessible to developers. During the discussion of these papers, Ruben Taelman and Daniel Garijo made statements such as “developers like X, they prefer Y”, and so on. These are anecdotes. The funny thing is that the anecdotes of the LDFlex and OBA authors contradicted each other. My takeaway: it’s still not clear what developers would prefer.

Miriam Fernandez gave a splendid vision talk. She asked: “The web we have created… the view is of the creators… is this really a shared conceptualization? How much of the knowledge we have created that is on the web contains alternative facts?”

In his vision talk, Peter Fox asked, “Are we educating enough of the new generation?” IMO, we are not (see the next section). He also reminded us about humans in the loop, a topic that is gaining a lot of traction in many other fields of Computer Science.

My overall takeaway here is that by looking at traditional problems and their incremental solutions from a socio-technical perspective, we can make the science much more interesting. We need to define evaluation metrics for users. As I mentioned in a Slack discussion:

As a community, we need to push ourselves into the uncomfortable position of doing user evaluations. Is this hard? OF COURSE! But heck, that’s what makes it interesting. Life can’t be that easy.

Knowledge Science (a.k.a. Knowledge Engineering 2.0)

Another interest of mine is understanding how to bridge the gap between data producers and data consumers. I’ve argued for the need for Knowledge Scientists and Data Product Managers (listen to our podcast episode on this topic) to fill that gap.

Oh, was I happy to see this topic in Elena Simperl‘s vision talk. She asked: “What do we know about the technical, scientific and social aspects involved in building, maintaining and using knowledge-based systems?” This community has a lot to say because many of us come from the Knowledge Acquisition community. A lot of hard questions are still open: how do we capture common-sense knowledge (there was a tutorial on that), culture, and diversity? There are many HARD questions we need to ask ourselves about modeling knowledge that are outside of our comfort zone (how do we model negative facts or ignorance?). From a tooling perspective, where is the equivalent of Jupyter notebooks for knowledge modeling (Gra.fo is a step in that direction, combining knowledge modeling with collaboration)? Elena stated that the next wave of AI will not succeed unless we study these hard questions, and I fully agree. If we do not understand the knowledge about our data, it is going to continue to be garbage in, garbage out. Similarly, Peter Fox asked: “Are we educating enough of the new generation?” IMO, we are not! My takeaway: we need to teach knowledge engineering to the next generation of students, and we need to research knowledge engineering again, taking into account today’s data disciplines (knowledge engineering was popular in the 1990s) and interdisciplinary methods. Additionally, we need to work with other communities.

Together with Cogan Shimizu, Rafael Gonçalves, and Ruben Verborgh, I organized the PRAXIS workshop: The Semantic Web in Practice: Tools and Pedagogy. Our goal was to have a WORKshop and gather a community focused on the collection, development, and curation of pedagogical best practices, and the tools that support them, for the Semantic Web and Knowledge Graph communities.

We had a successful event discussing the need to create a syllabus for a bachelor’s course on Knowledge Engineering/Science. We acknowledged that courses in a master’s program are already too late. We are going to start cataloging existing semantic web, knowledge graph, and knowledge engineering courses. Stay tuned because we will need your help!

Overall, I believe we are realizing that we need to reinvent the role of Knowledge Engineering for the 2020s: Knowledge Science (a.k.a. Knowledge Engineering 2.0).

Creative RDF Graph Data Management

Around a decade ago, there was a lot of RDF data management work at ISWC that resembled work that could have been published at a database conference. Largely, that type of work has gone away. This year I was happily surprised to see this topic come back with novel and creative approaches.

Tentris is a tensor-based triple store: an RDF graph is mapped to a tensor, SPARQL queries are mapped to Einstein summations, and query evaluation leverages worst-case optimal multi-join algorithms. Juan Reutter gave the keynote “AGM bound and worst-case optimal joins + Semantic Web” at the 4th Workshop on Storing, Querying, and Benchmarking the Web of Data. The AGM bound is “one of the most important results in DB theory this century” and has led to the rise of “worst-case optimal” join algorithms. This is a very popular topic in the database community that the semantic web community should look into. Trident is a new graph storage engine for very large knowledge graphs that aims to improve scalability while maintaining generality, supporting reasoning, and running on cheap hardware.
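To get a feel for the tensor view that Tentris builds on, here is a minimal sketch: a toy graph encoded as a dense order-3 tensor and a hypothetical two-pattern query answered with an Einstein summation. Tentris itself uses compressed sparse hypertries and worst-case optimal joins, not this toy code:

```python
import numpy as np

# Toy vocabulary: every resource gets an integer id.
resources = {"alice": 0, "bob": 1, "carol": 2, "knows": 3, "livesIn": 4, "berlin": 5}
n = len(resources)

# Order-3 boolean tensor: T[s, p, o] = 1 iff the triple (s, p, o) is in the graph.
T = np.zeros((n, n, n), dtype=np.uint8)
triples = [("alice", "knows", "bob"), ("bob", "knows", "carol"), ("bob", "livesIn", "berlin")]
for s, p, o in triples:
    T[resources[s], resources[p], resources[o]] = 1

# BGP { ?x :knows ?y . ?y :livesIn ?z } as an Einstein summation:
# fix the predicate dimension in each operand, then contract over the join variable ?y.
knows = T[:, resources["knows"], :]     # slice: knows[x, y]
lives = T[:, resources["livesIn"], :]   # slice: lives[y, z]
answers = np.einsum("xy,yz->xz", knows, lives)  # answers[x, z] > 0 iff a binding exists

ids = {v: k for k, v in resources.items()}
for x, z in zip(*np.nonzero(answers)):
    print(ids[x], "knows someone who lives in", ids[z])
```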

I was very excited to see the work on HDTCat, a tool to merge the contents of two HDT files with a low memory footprint. RDF HDT is a compressed format for RDF (basically, Parquet files for RDF). This is a problem we encountered at data.world a while back. Every dataset ingested into data.world is a named graph represented in RDF HDT, so when running a SPARQL query over multiple large named graphs we hit exactly this issue: it just used too much memory. It was very nice to see that the solution presented in HDTCat is similar to what we did at data.world to solve this problem.
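The core trick, as I understood it, is to merge already-sorted sections in a streaming fashion instead of loading both files into memory. Here is a toy sketch of that streaming-merge idea only; it is not HDTCat’s actual algorithm, which also merges the compressed dictionaries and remaps triple IDs:

```python
import heapq
from itertools import groupby

def merge_sorted_triples(stream_a, stream_b):
    """Merge two already-sorted triple streams into one sorted, de-duplicated stream.

    Only a handful of triples is held in memory at any time, which is the
    spirit (not the implementation) of a low-memory-footprint merge.
    """
    merged = heapq.merge(stream_a, stream_b)  # lazy two-way merge
    for triple, _ in groupby(merged):         # drop duplicates shared by both graphs
        yield triple

# Toy usage with two lexicographically sorted lists of (s, p, o) tuples.
graph_a = [("a", "knows", "b"), ("a", "name", "Alice")]
graph_b = [("a", "knows", "b"), ("b", "name", "Bob")]
for t in merge_sorted_triples(iter(graph_a), iter(graph_b)):
    print(t)
```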

I enjoyed seeing how the community is looking at extending SPARQL in many different ways:

  • Making SPARQL Turing complete to support graph analytics (e.g., PageRank).
  • Extending SPARQL with similarity joins.
  • Combining graph data with raster and vector data (the GeoSPARQL+ paper won a best student paper award).
  • Studying SPARQL query logs and user sessions to understand user behavior.

Data, data, data

The vision talk speakers had some fantastic insights about data.

Barend Mons reemphasized the need for FAIR data and why we should keep metadata and data separate: for many applications, you need to consume only the metadata initially. Barend made a bold and strong statement: invest 5% of research funds in ensuring data are reusable. Jeni Tennison gave a heartfelt message: don’t use data for negative aspects of life; data and access to data are political; access to data should be the norm; and we need a world where data works for everyone. Fabien Gandon reminded us that the web connects ALL things. Stefan Decker made an important call to take persistent identifiers seriously.

This seems like a small issue but it is CRUCIAL. If we were to think about persistent identifiers correctly from the beginning, I postulate that many of the data integration problems we suffer from would go away.

Oh, and I believe Peter Fox coined the term semantilicious. Is your data semantilicious?

Of Course…

Of course the expected hot topics were present (take a look at the list of accepted papers).

Of course …there was a lot of work presented about embeddings!

Of course… the combination of neuro-symbolic approaches was a hot topic. This was the focus of Uli Sattler‘s vision talk. Take a look at the Common Sense Knowledge Graph tutorial.

Of course… Explainable AI was a topic. In particular, I appreciated Helena Deus‘ vision of incorporating the bias into the model so that the model can be avoided when it is not applicable (if the model was trained on lung images, don’t use it on brain images).
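A toy sketch of that idea (the wrapper, names, and domains below are all hypothetical, not from Helena’s talk): attach training-domain metadata to the model and refuse to predict outside of it.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GuardedModel:
    """Wrap a predictive model with metadata about what it was trained on.

    Hypothetical sketch: the wrapper refuses to predict when the input's
    domain does not match the training domain, instead of failing silently.
    """
    predict_fn: Callable[[List[float]], str]
    trained_on: str  # e.g., "lung CT images"

    def predict(self, features: List[float], domain: str) -> str:
        if domain != self.trained_on:
            raise ValueError(
                f"Model trained on '{self.trained_on}' is not applicable to '{domain}'."
            )
        return self.predict_fn(features)

# Toy usage: a lung model politely refuses brain images.
lung_model = GuardedModel(predict_fn=lambda x: "no anomaly", trained_on="lung CT images")
print(lung_model.predict([0.1, 0.7], domain="lung CT images"))  # ok
# lung_model.predict([0.1, 0.7], domain="brain MRI images")     # raises ValueError
```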

More Notes

We had a lot of great social events: Ask Me Anything sessions with Craig Knoblock, Jim Hendler, Natasha Noy, Elena Simperl, and Mayank Kejriwal. We also had meetups: Women in Semantic Web and Semantic Web Research in Asia. The Remo platform worked very well for “hallway” conversations.

The vision talks and sister conference presentations are awesome. Please keep them!

In an AMA conversation with Jim, he shared Tim Berners-Lee’s pitch for the semantic web: my document can point to your document, but my spreadsheet can’t point to yours. In other words, my data can’t point to your data.

Need to take a look at “G2GML: Graph to Graph Mapping Language for Bridging RDF and Property Graphs”: http://g2gml.fun/

Need to take a look at “FunMap: Efficient Execution of Functional Mappings for Scaled-Up Knowledge Graph Creation”: https://github.com/SDM-TIB/FunMap

Need to take a look at “Tab2Know: Building a Knowledge Base from Scientific Tables”.

From what I heard, the Wikidata workshop was a huge hit.

My friends at UPM gave a Knowledge Graph Construction tutorial. I believe this topic has a lot of interesting scientific challenges when users come into play. A lot of opportunities here!

Chris Mungall gave an interesting keynote at the Ontology Design Patterns workshop on how to use design patterns to align ontologies in the life sciences. What I appreciated about his talk is the practicality of his work. He is putting theory into practice.

How do you represent negative facts in a knowledge graph? See “Enriching Knowledge Bases with Interesting Negative Statements”. Larry Hunter brought up something related in his keynote: how do you represent ignorance?

How can we make RDF, Linked Data, and Knowledge Graphs friendly for developers? See the SPARQL Endpoints and Web API tutorial, “OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs”, and LDFlex.

I realized that I didn’t spend time at the industry/in-use talks (last year I spent most of my time at them). I need to review those papers.

Something new to add to my history of knowledge graph work:

Kavitha Srinivas gave a cool keynote on knowledge graphs for code.

RDF* always seems to come up

Oh, and in case you need some ideas:

My Daily takeaways

Congrats to all the winners

My main takeaway: this is an eclectic community!

The semantic web community is truly an eclectic community. At this conference you can see work and talk to people about Artificial Intelligence, Knowledge Representation and Reasoning, Ontology Engineering, Machine Learning, Explainable AI, Databases, Data Integration, Graphs, Data Mining, Data Visualization, Human-Computer/Data Interaction, Streaming Data, Open Data, Programming Languages, Question Answering, NLP and, of course, the Web! Therefore, if you feel that you don’t fully fit in your research community because you dabble in other areas, the semantic web community may be the place for you!

This is also a diverse community

I’m very proud to be part of the community and to consider it home! I miss hanging out with all my friends, having inspiring conversations, dancing, eating, and making new friends.

Massive THANK YOU to the entire organizing committee for making this an amazing virtual event, especially the general chair, Lalana Kagal!

Hopefully “see” you next year in Albany, NY!