2019 Knowledge Graph Conference Trip Report

I just got back from the 2019 Knowledge Graph Conference organized by the School of Professional Studies of Columbia University and chaired by François Scharffe. I was honored to be invited by François to be a member of the Program Committee. Our task was to invite speakers, go over the submitted proposals and help shape the program of this event.

As always, I asked myself: what does success look like? If we learn what are the real world problems that various industries are tackling with Knowledge Graphs, and how they are achieving it. Additionally, for the skeptics to leave less skeptical and eager to engage more with Knowledge Graphs.

I can report that, per my definition, this was a successful event! It actually surpassed my expectations. The event was packed and all 200 tickets were sold out.

My main takeaways: 1) Finance is all over Knowledge Graphs, 2) more and more industries are now starting to pay attention, 3) roadblocks are social, not technical (the technology works!) and 4) virtual knowledge graphs are gaining a lot of interest (keep the data where it is).

Kicking off the @KnowledgeGraph2 Conference. Full house! Great job @lechatpito for organizing this.

Let’s use this hashtag #kgc2019 pic.twitter.com/PbxjzeREno
— Juan Sequeda (@juansequeda) May 7, 2019

Before I dive into details of this trip report, I believe it is paramount to highlight a problem that was observed by many: the lack of gender diversity. This was also observed in the past W3C Graph Data Workshop. We have a vibrant graph community, but why are we lacking gender diversity in this graph community? While we were creating the program, we invited many female speakers. Unfortunately, some couldn’t make it and some cancelled at the last minute. It was great to see a larger female representation within the audience. A diverse group brings diverse ideas and fosters increased creativity. As a community, we need to make sure that all voices are included. This lack of diversity worries me tremendously. We need to support the community at large and encourage people from all diverse backgrounds to participate and speak at next year’s Knowledge Graph Conference (yes, this event will take place next year!).

This event did have a diversity of industries attending. The talks and break discussions were very broad. I’m going to organize this report by the following topics: Finance, Other Industries, Unicorns, Virtual Graphs, Vision and Vendors.

Finance

Given that we were in New York, there was a great representation of financial services companies.

Lots of top banks at #kgc2019: Goldman Sachs, JPMC, BoAML, Morgan Stanley, Wells Fargo, BNY Mellon Also Capital One and Master Card. Financial people are finally adopting #KnowledgeGraphs. Few of those already clients of @ontotext #sparql
— Atanas Kiryakov (@kiryakov_ak) May 7, 2019

The day was kicked off with a talk by Christos Boutsidis from Goldman Sachs.

Goldman Sachs is creating a social graph which integrates email (Who emailed whom), telecommunication (Who called whom), trading (Who traded what) & financial (Who sent money to whom) data. Use case is compliance. 100M edges. 2B nodes. 10TB. In house Hadoop cluster. #kgc2019
— Juan Sequeda (@juansequeda) May 7, 2019

It takes 1 week for them to construct the graph (if they are lucky) and 1 day to update the graphs with deltas. They infer and extract knowledge from the graph by running a series of standard graph algorithms: Edge weights to understand how strong is a relation between a client and employee, Vertex centrality (i.e. Pagerank) to identify influencers, Vertex similarity to match Marcus applicants with politically exposed people, All pairs all paths to find Shortest paths to connect people with the firm and Community detection (clustering) to find set of accounts that participate in the same money transfer. The compliance applications are Insider threat, Insider trading, AML, Marcus Lending/Banking and Co-branded credit card.

The next talk was by Patricia Branum and Bethany Sehon from Capital One. Their goal was to attach an ontology to their existing Customer 360 data in order to enhance definitions, standardized metadata and then further improve the metadata.

Successful pilot by Capital One in the risk/data management standards team. Defined Customer Ontology & connected w/ Cust 360 Data. 200% data quality & 20% search results improvement. Promote federation/virtual graph New Requirement: need to auto generate R2RML mappings #kgc2019 pic.twitter.com/Boh0TJ6Qfd
— Juan Sequeda (@juansequeda) May 7, 2019

When asked, how did they get sponsorship and internal buy in, it was an easy sell within Capital One because they see themselves as a data-driven company (shouldn’t everybody be one?!!). Given that their sponsors were in risk management, which deals with a lot of data, it was easy to fund the pilot. Capital One is planning to take this into production. They are also looking into reasoning.

What were their challenges? It wasn’t technical, it was social (I topic that I discussed during my talk)

"What are major challenges for standardizing your vocabularies?"

"There is a lot of ego…" – Bethany Sehon, Capital One#kgc2019
— Denny Vrandečić (@vrandezo) May 7, 2019

I also really liked their definition of an ontology (and see the replies to my tweet to see other interesting discussions)

From Capital One

“What is an Ontology?

Glossary of terms. With rich connections.”

Nice crisp and succinct definition #kgc2019
— Juan Sequeda (@juansequeda) May 7, 2019

David Newman from Wells Fargo, a long timer also presented.

David Newman from Wells Fargo, unfortunately can’t talk about what they are doing internally. Explaining the values of knowledge graphs, preaching to the choir. Big push of ML on Knowledge Graphs. Not using virtual graphs… yet! #kgc2019 pic.twitter.com/GmuREJBCvh
— Juan Sequeda (@juansequeda) May 7, 2019

Tim Baker from Refinitiv (formerly Thomson Reuters Financial & Risk) presented their Knowledge Graph used to track bad actors.

Tim Baker @tim63baker from Refinitiv. presenting their Knowledge Graph which has over 200B triples. All nodes and relationships are represented by a PermID. V2 will (apparently) be available for free. Discussing use case on tracking bad actors #kgc2019 pic.twitter.com/NHzxA0oqOq
— Juan Sequeda (@juansequeda) May 7, 2019

Vivek Khetan from Accenture discussed on combining knowledge graph and NLP to understand regulatory press releases

#kgc2019 @vivekkhetan_ism from Accenture. Goal is to enable financial services company to monitor and respond to regulatory changes by analyzing press releases pic.twitter.com/LoNeZEF4ke
— Juan Sequeda (@juansequeda) May 8, 2019

It’s been known for a while that the financial industry has been using semantic/graph technology for a while. But why has it been taking so long? I think Dean Allemang‘s First mover slide below sums it up:

#kgc2019 Great stories of @WorkingOntology’s experience. Great last slide. pic.twitter.com/XkyAH3WbrW
— Juan Sequeda (@juansequeda) May 8, 2019

More Industry Real World Use cases

Joe Pindell from Pitney Bowes and Colin Puri from Accenture jointly presented a customer service use case. With their knowledge graph they are 1) providing context and guidance, 2) discovering resolutions via relationships and 3) modeling & merging data views.

#kgc2019 PitneyBowes Use case Intelligent Complaint Handling Combine KG, NLP & Customer360 Today static decision tree, no feedback, long calls, incomplete resolutions.
Goal: connect w/ right CSR, understand cust network, discover related complaints Using Graphs since 2012 pic.twitter.com/zVXzggVeTY
— Juan Sequeda (@juansequeda) May 8, 2019

Lambert Hogenhout from the United Nations shared with the audience the reasons why the UN needs knowledge graphs.

#kgc2019 Lambert Hogenhout from UN. Current insights after data analysis gets stored in a spreadsheet and then gets lost. They want to great a graph of insights, link them and generate new insights. Just starting! Super interesting!! pic.twitter.com/33i3WNzR58
— Juan Sequeda (@juansequeda) May 8, 2019

The UN also needs to deal with many multilingual issues. They are just starting out.

Chris Brockmann from Eccenca discussed how Knowledge Graphs are used to integrate data in supply chain and provided a great ROI.

Chris Brockmann from Eccenca. Asked who is from supply chain, nobody raised their hand. Most people are from the Financial Services (we are in NY!). Going through a KG supply chain use case showing great ROI. Another term for Knowledge Graph = Business Digital Twin. #kgc2019 pic.twitter.com/zvrh3ulqdu
— Juan Sequeda (@juansequeda) May 7, 2019

Tom Plasterer from AstraZeneca discussed that their main challenges is that data is all over the place. Their approach is to build a knowledge graph following the FAIR principles.

Great @PistoiaAlliance webinars on FAIR Data, and GO-FAIR from @GOFAIRofficial (https://t.co/X5XSFLKTGj) as well as a panel on FAIR data catalogs (https://t.co/drNJ3hzSjQ) for those wanting go go further…
— Tom Plasterer (@TPlasterer) May 8, 2019

Parsa Mirhaji from Montefiore Hospital discussed on how it is still challenging to do analytics with health data

Parsa Mirhajifrom Montefiore Hospital #kgc2019 pic.twitter.com/CRpFp6JG2K
— Juan Sequeda (@juansequeda) May 8, 2019

Steven Gustafson from MAANA shared their experience of creating knowledge graphs in the oil and gas industry. The popular term in this industry is Digital Transformation and he provided an interesting definition: Knowledge Graph + Function Graph = Digital Transformation, where a function graph is a graph of methods (ie functions) and how they interact between each other.

#kgc2019 @stevengustafson from Maana. Nobody wants to pay for search. Customers want answers to specific questions.
Missing is a representation of analytics in a knowledge graph. pic.twitter.com/VxezfwGPTX
— Juan Sequeda (@juansequeda) May 8, 2019

Unicorns

By unicorns I mean, companies that are very different from the mainstream (not everybody is a Google). It was very excited to have representatives from Airbnb, Amazon, Diffbot, Uber and Wikidata.

Xiaoya Wei from Airbnb presented their knowledge graph:

#kgc2019 Xiaoya Wei @Airbnb KG provides accurate, comprehensive, & updated knowledge about supply and travel destination. Built from scratch.
42M nodes. 705M edges. 30 Node Types. Nodes/Edges from same source are on the same machine. Usage: navigation, recommendation, context
— Juan Sequeda (@juansequeda) May 7, 2019

They built the infrastructure from scratch. From a storage and data partitioning perspective, the nodes and edges are stored separately, by source. Node schema and edge payload are defined by thrift binary. It is horizontally scalable. The goal is to avoid broadcast for queries with large fan out. From a query perspective, the objective is to traverse a subgraph and retrieve nodes and edges from the traversal. Data is being ingested via an asynchronous framework to continuously import data. Diffs are calculated and then published on Kafka. Finally, why did they build the infrastructure from scratch? Because they built upon the infrastructure that they currently support internally (i.e. they don’t want to bring in more software and have to support it).

Three use cases were discussed: 1) Navigation via a taxonomy that describes the inventory, 2) recommendation and 3) provide more context.

Data quality and consistency is a key challenge. A human team checks data quality. That is why access control is important for them because a user can only make changes to the data that they know.

Subhabrata Mukherjee from Amazon (now at Microsoft Research) discussed how the Amazon Product Graph is being built.

S. Mukherjee explaining Amazon Product Graph KG != Product Graph Developed OpenTag to extract knowledge from Product catalog HITL to clean up noisy data Today 1 vertical & few sources Goal: 1Ks to 1Ms of sources, large type hierarchy. Challenge lack of training data. #kgc2019 pic.twitter.com/jrLORAh07M
— Juan Sequeda (@juansequeda) May 7, 2019

Human in the loop techniques are required to clean up noisy training labels. Additionally, the information extraction system return a triples of strings, therefore the strings needs to map to concepts (things not strings!) in order to truly integrate the knowledge.

Even though Diffbot is a startup, I’m putting them in the unicorn category because they are doing something very unique that not everybody needs to do: create a knowledge graph by crawling the web. Effectively, they are competing against Google and offering services that Google doesn’t. Mike Tung, CEO of Diffbot presented:

#kgc2019 @mikektung @diffbot: crawls the web to generate a KG. Approach: Visual layout Classification of webpages into ~20 high level topics Apply NLP & CV on the images then knowledge fusion. 10B entities 1T facts. Adding 150M entities/month KG available for research purposes pic.twitter.com/El9XKC1LXA
— Juan Sequeda (@juansequeda) May 7, 2019

Great quote:

“Knowledge is what makes AI intelligent” – Mike Tung at #kgc2019
— Roberto Shimizu (@robertoshimizu) May 7, 2019

How much a fact costs?

⁦@diffbot⁩ #kgc2019 #KnowledgeGraphs #NLProc pic.twitter.com/hjcQi0Y39l
— Vivek Khetan (@vivekkhetan_ism) May 7, 2019

@diffbot knowledge graph: 10 Billions entities, 1T facts. Example of applications: text analysis and automatic entity annotation. #KGC2019 pic.twitter.com/uygp19muzj
— Freddy Lecue (@freddylecue) May 7, 2019

Josh Shinavier described lessons learned from creating a Knowledge Graph at Uber. Josh also confirmed Airbnb’s comments about why they built their own infrastructure from scratch: they want to reuse the support capabilities that they already have and not bring new software into the mix.

#kgc2019 Fit the tooling to the infrastructure. That’s why they build instead of buy.
OLTP graph based on Cassandra
OLAP graph based on hdfs using CAPS
Graph Embeddings.
Want to go Virtual graphs (NoETL) pic.twitter.com/kuGFSh5KlO
— Juan Sequeda (@juansequeda) May 8, 2019

For more details, check out the article Uber’s graph expert bears the scars of billions of trips.

Finally, Denny Vrandecic from Google AI talked about Wikidata. Check out Vivek Khetan’s twitter thread on Denny’s talk.

#kgc2019 @vrandezo sharing @wikidata: 56M items, 700M statements, 400 lang, 20k active contrib p/month, 900M edits, 8.5M daily SPARQL queries, Healthy community that helps write sparql queries.

How to express redundant NL from wikipedia as triples? Proposal: frame structure pic.twitter.com/cIRD7n7zfi
— Juan Sequeda (@juansequeda) May 7, 2019

Virtual Graphs

Capital One, AstraZenca, Uber and Wells Fargo all publicly stated that they are looking into virtual graphs. This means, they want to be able to keep the data in its original source and have a way to virtualize it as a Knowledge Graph.

This is music to my ears because this is what my PhD was all about and on the premise for which Capsenta was founded: a NoETL (i.e. virtualize) approach to data integration via semantic/graph technology.

I had a lot of discussions during the breaks with other folks about this topic. There is an agreement that moving the data to a centralized location has been the status quo and it’s getting more and more expensive. I’m also glad to see other vendors talking about virtualization such as data.world and Stardog.

Machine Learning

Subhabrata Mukherjee’s talk provided a lot of details into their machine learning process. Take a look at Vivek’s twitter thread.

Alfio Gliozzo from IBM Research discussed how to extend Knowledge Graphs using Distantly Supervised Deep Nets. The challenge: develop hand labeled data. There is an agreement with the ML folks in the audience. Vivek also has a detailed thread on this talk.

Freddy Lecue from Thales discussed explainable AI.

Explainable AI with Knowledge Graphs by @freddylecue #kgc2019 pic.twitter.com/ronLUE7Qvj
— Juan Sequeda (@juansequeda) May 7, 2019

Vision

Given the hype of Machine Learning, Deep Learning, AI, etc, I’ve been asking myself if we will ever automate the creation of Knowledge Graphs. I had a great discussion with Subhabrata Mukherjee on this topic. He thinks that we will get there assuming the source of data is unstructured because there is so much overlapping data within the same domain. On the other hand, when the source is structured data, we both agreed that the future doesn’t look bright. There simply isn’t enough overlapping domain data. As I mentioned in my talk, I never thought that I would be working on methodologies because we need to empower humans and machines to work together.

We were very lucky to have Pierre Haren, a pioneer in AI and rule systems and founder of ILOG. He spoke about the future evolution of knowledge graphs to casual graphs where the relationships (edges) are causal.

Pierre Haren, Starting off with history (love that!) and evolution from lisp all the way to what he calls Causal Graphs. #kgc2019 pic.twitter.com/dyqvzlJ2rs
— Juan Sequeda (@juansequeda) May 7, 2019

Personally, I was thrilled to finally meet him and get his input for our upcoming tutorial on the History of Knowledge Graph’s Main Ideas at ISWC2019.

Vicky Froyen discussed where Collibra is heading.

#kgc2019 Vicky Froyen from Collibra. Introducing the notion of a Context Graph. My understand is that it’s a high level enterprise ontology, just metadata, which helps locate/find data. pic.twitter.com/Rjh2EEDCcr
— Juan Sequeda (@juansequeda) May 8, 2019

Vendors

We had representatives from many vendors: AllegroGraph/Franz, Amazon Neptune, Data Chemists, Datastax, data.world, GraphDB/Ontotext, Neo4j, Stardog, TigerGraph and yours truly, Capsenta!

I gave a 20 min version of my talk Designing and Building Enterprise Knowledge Graphs from Relational Databases in the Real World (which is an evolution of my previous talk on Integrating Semantic Web in the Real World: A Journey between Two Cities ). I’m happy to share that the talk was very well received. Check out this twitter thread.

I was also very thrilled to give demos of Gra.fo, our visual collaborative and real-time knowledge graph schema editor. I love seeing the faces of people when they see Gra.fo for the first time. I am so proud of the entire Capsenta team for developing Gra.fo!

@juansequeda on designing knowledge graphs from relational DB: doing it in real world is challenging. Data and Schema are the weakest links. Great reference on https://t.co/3NVmD5mBpo on history of knowledge graphs btw. #KGC2019
— Freddy Lecue (@freddylecue) May 7, 2019

@juansequeda explaining how KG and ontologies should cross the chasm at #kgc2019. the roles of kn.engineers vs data scientist; how not to boil the ocean in data warehouse-like KGs pic.twitter.com/8LQ7xLa02X
— Atanas Kiryakov (@kiryakov_ak) May 7, 2019

A very engaging @juansequeda on stage at the Knowledge Graph Conference #kgc2019 pic.twitter.com/HAzi8dI5UA
— Christophe Willemsen (@ikwattro) May 7, 2019

.@juansequeda: methodology is as important as semantic tech when it comes to building enterprise knowledge graphs. #KGC2019 pic.twitter.com/HVT2ueyczq
— Joshua Shinavier (@joshsh) May 7, 2019

@juansequeda observations on bridging the chasm and the concepualization gap for #knowledgegraphs. #KGC2019 @Capsenta @UTAustin pic.twitter.com/2uRvIHvy0n
— Thomas Deely (@DeelyThomas) May 7, 2019

Preaching truth – mappings are hard! @juansequeda #kgc2019 pic.twitter.com/aBJP6r5lk8
— DataChemist (@data_chemist) May 7, 2019

Nav Mathur from Neo4j discussed how they build knowledge graphs

The key principles of a Knowledge Graph per @neo4j #kgc2019 pic.twitter.com/mqBUTe29Jv
— Juan Sequeda (@juansequeda) May 7, 2019

Jesús Barrasa shared an objective comparison between RDF Graphs and Property Graphs (more later)

#kgc2019 @BarrasaDV from @neo4j giving a great run down on factual differences between RDF Graphs and Property Graphs. pic.twitter.com/4XCR3m7szi
— Juan Sequeda (@juansequeda) May 8, 2019

Brad Bebee shared lessons learned from Amazon Neptune’s customers

#kgc2019 Brad Beebe from Amazon Neptune. Developers drive to Property Graph. Information Architects drive to RDF. Customers build knowledge graphs with both. Recently launched query explain plan. pic.twitter.com/lX0C1bvfSv
— Juan Sequeda (@juansequeda) May 8, 2019

Bryon Jacob from data.world discussed how they sneak knowledge graphs into the users without them even knowing about it

#kgc2019 @bryonjacob from @datadotworld They designed their knowledge a graph data platform with a user-centric focus in mind. Users can ETL datasets or make a virtual connection. Query the data with SQL or SPARQL. Support for provenance. Everything underneath the hood is RDF. pic.twitter.com/EVqOkWWsLA
— Juan Sequeda (@juansequeda) May 8, 2019

Eloquent talk by Bryan Jacob from https://t.co/NgBYzeswWV — “a knowledge graph is the only way to deal with the complexity of enterprise data at scale” #KGC2019 pic.twitter.com/fUVkMadnFf
— Hanhan (@heipihanhan) May 8, 2019

Nasos Kyriakov from Ontotext shared a marketing intelligence use case

#kgc2019 @kiryakov_ak from @ontotext Discussing a use case for graphs: marketing intelligence to do M&A research (who bought who?) and relationship between companies. Pretty cool demo. Very powerful when you integrate data, create a knowledge graph and run graph algorithms pic.twitter.com/YjLne1qIfR
— Juan Sequeda (@juansequeda) May 8, 2019

The grand finale was a genuine and honest discussion between all the vendors which I had the honor to moderate.

Now THAT’s a panel… @datadotworld @data_chemist @StardogHQ @OntotextGraphDB @TigerGraphDB @Franzinc @neo4j @DataStax @aws @GavinMGleason @jansaasman @bryonjacob @mikegrovesoft @kiryakov_ak @b2ebs @jeromatron @ArthurAKeen JesusBarassa @juansequeda #KGC2019 pic.twitter.com/FFPENmP9ad
— Thomas Deely (@DeelyThomas) May 8, 2019

My takeaway is that there is NOT a RDF graph vs Property Graph “battle”. It was agreed that if your goal is to share data, then use RDF. But that doesn’t stop you from using a property graph. Jesus was very emphatic that you can use Neo4j as their storage model and still support RDF (probably not natively) from Neo4j. Jeremy from Datastax shared that with the upcoming Tinkerpop 4 you can compile anything into the internals of tinkerpop, let it be Cypher or SPARQL. Amazon supports both because their customers want both.

“I don’t care about what infrastructure you are using, as long as you have unique identifiers to entities and adhere to the culture of open data sharing” @jansaasman take on RDF graph v.s. property graph #kgc2019
— Hanhan (@heipihanhan) May 8, 2019

However, some of the RDF folks are more “pedantic” like Stardog and Datachemist. Finally, Datachemist is proposing a new graph language which has features that have been well defined in G-CORE and are going into GQL.

I asked everybody to give a 2 floor elevator pitch to convince the audience that they should spend their time evaluating their technology. Basically everybody’s response was the same: just sign up/download our system and try it out.

My takeaway from the panel: we are turning into a fuzzy open warm comfortable graph community. Confirms my takeaway from the W3C Graph workshop .

Final Thoughts

Word on the street is that people really regret not attending this event.
Congrats to all the organizers: Francois, Thomas, Will and all the students collaborators. You all ran an impeccable event!
Kudos to all the speakers who stayed within the 20 minute slots for their talks.
Even though the majority were new faces, it was great to see old timers like Dieter Fensel, Dean Allemang and Sören Auer, renowned figures in the semantic web community
Check out all the #kgc2019 tweets
Beautiful location and the weather was PERFECT!
Check out Denny’s trip report
Check out Vivek’s trip report
Talks were recorded! It will take a while but they will be made public. So stay tuned!
See you May 2020 back in NY!

Finally, check out what some of the attendees had to say

When you listen to these people, that come from different industries, explain and showcase the power of semantic web technologies it's difficult to understand how and why people are still sceptical. #kgc2019 https://t.co/zPFV8cO1Wn
— violeta 🕊️ (@azraiekv) May 7, 2019

So good to meet so many knowledge graph siblings separated at birth, living in other industries at @KnowledgeGraph2. Great work @DeelyThomas & @lechatpito! Looking forward to the next one, #kgc2019
— Tom Plasterer (@TPlasterer) May 8, 2019

Francois Schaffre @lechatpito closing #kgc2019. Wonderful start! The best event on #KnowledgeGraph. Bravo, Francois and team! @KnowledgeGraph2 pic.twitter.com/aG10ANL0U4
— Atanas Kiryakov (@kiryakov_ak) May 8, 2019

Thanks to the organisers, speakers and sponsors of the @KnowledgeGraph2 , it exceeded my expectations of the conference and I definitely look forward to the 2020 edition! #kgc2019
— Christophe Willemsen (@ikwattro) May 8, 2019

Fantastic conference. Well done @DeelyThomas & @lechatpito! #kgc2019 https://t.co/w16pMVLBim
— DataChemist (@data_chemist) May 8, 2019

5 Replies to “2019 Knowledge Graph Conference Trip Report”

Pingback: Knowledge Graphs are the new Black. The Year of the Graph Newsletter, May 2019 - ExamsWorld
Mark Underwood says:

June 10, 2019 at 7:52 pm

Great summary, thanks

Pingback: What We're Reading - Fundraising Using AI, VC Uses Industry Network to Find Deals, and More
charels says:

July 17, 2019 at 1:26 am

great report, I’d like to know where we can get the slides of these talks, thanks a lot.

1. Juan says:
  
  July 17, 2019 at 3:41 am
  
  You can find the videos on youtube. This is the link to my talk: https://www.youtube.com/watch?v=JohxmsHE4dI