2019 Knowledge Graph Conference Trip Report

I just got back from the 2019 Knowledge Graph Conference organized by the School of Professional Studies of Columbia University and chaired by François Scharffe. I was honored to be invited by François to be a member of the Program Committee. Our task was to invite speakers, go over the submitted proposals and help shape the program of this event. 

As always, I asked myself: what does success look like? For me, success means learning what real-world problems various industries are tackling with Knowledge Graphs, and how they are solving them. Additionally, it means skeptics leaving less skeptical and eager to engage more with Knowledge Graphs.

I can report that, per my definition, this was a successful event! It actually surpassed my expectations. The event was packed and all 200 tickets were sold out. 

My main takeaways: 1) Finance is all over Knowledge Graphs, 2) more and more industries are now starting to pay attention, 3) roadblocks are social, not technical (the technology works!) and 4) virtual knowledge graphs are gaining a lot of interest (keep the data where it is).

Before I dive into the details of this trip report, I believe it is paramount to highlight a problem that was observed by many: the lack of gender diversity. This was also observed at the past W3C Graph Data Workshop. We have a vibrant graph community, so why is it lacking gender diversity? While we were creating the program, we invited many female speakers. Unfortunately, some couldn’t make it and some cancelled at the last minute. It was great to see a larger female representation within the audience. A diverse group brings diverse ideas and fosters increased creativity. As a community, we need to make sure that all voices are included. This lack of diversity worries me tremendously. We need to support the community at large and encourage people from all backgrounds to participate and speak at next year’s Knowledge Graph Conference (yes, this event will take place next year!).

This event did have a diversity of industries attending, and the talks and break discussions were very broad. I’m going to organize this report by the following topics: Finance, More Industry Real-World Use Cases, Unicorns, Virtual Graphs, Machine Learning, Vision and Vendors.

Finance

Given that we were in New York, there was a great representation of financial services companies.

The day was kicked off with a talk by Christos Boutsidis from Goldman Sachs.

It takes 1 week for them to construct the graph (if they are lucky) and 1 day to update it with deltas. They infer and extract knowledge from the graph by running a series of standard graph algorithms:

  • Edge weights, to understand how strong the relationship between a client and an employee is
  • Vertex centrality (e.g. PageRank), to identify influencers
  • Vertex similarity, to match Marcus applicants with politically exposed people
  • All-pairs shortest paths, to connect people with the firm
  • Community detection (clustering), to find sets of accounts that participate in the same money transfers

The compliance applications are insider threat, insider trading, AML, Marcus lending/banking and co-branded credit cards.

The next talk was by Patricia Branum and Bethany Sehon from Capital One. Their goal was to attach an ontology to their existing Customer 360 data in order to enhance definitions, standardize metadata and then further improve that metadata.

When asked how they got sponsorship and internal buy-in, they said it was an easy sell within Capital One because the company sees itself as a data-driven company (shouldn’t everybody be one?!). Given that their sponsors were in risk management, which deals with a lot of data, it was easy to fund the pilot. Capital One is planning to take this into production. They are also looking into reasoning.

What were their challenges? They weren’t technical, they were social (a topic that I discussed during my talk).

I also really liked their definition of an ontology (and see the replies to my tweet for other interesting discussions).

David Newman from Wells Fargo, a long-timer in this space, also presented.

Tim Baker from Refinitiv (formerly Thomson Reuters Financial & Risk) presented their Knowledge Graph used to track bad actors.

Vivek Khetan from Accenture discussed combining knowledge graphs and NLP to understand regulatory press releases.

It’s been known for a while that the financial industry uses semantic/graph technology. But why has adoption taken so long? I think Dean Allemang’s first-mover slide below sums it up:

More Industry Real-World Use Cases

Joe Pindell from Pitney Bowes and Colin Puri from Accenture jointly presented a customer service use case. With their knowledge graph they are 1) providing context and guidance, 2) discovering resolutions via relationships and 3) modeling & merging data views.

Lambert Hogenhout from the United Nations shared with the audience the reasons why the UN needs knowledge graphs.

The UN also needs to deal with many multilingual issues. They are just starting out.

Chris Brockmann from Eccenca discussed how Knowledge Graphs are used to integrate data in supply chains, providing a great ROI.

Tom Plasterer from AstraZeneca explained that their main challenge is that data is all over the place. Their approach is to build a knowledge graph following the FAIR principles.

Parsa Mirhaji from Montefiore Hospital discussed how it is still challenging to do analytics with health data.

Steven Gustafson from MAANA shared their experience of creating knowledge graphs in the oil and gas industry. The popular term in this industry is Digital Transformation, and he provided an interesting definition: Knowledge Graph + Function Graph = Digital Transformation, where a function graph is a graph of methods (i.e. functions) and how they interact with each other.

Unicorns

By unicorns I mean companies that are very different from the mainstream (not everybody is a Google). It was very exciting to have representatives from Airbnb, Amazon, Diffbot, Uber and Wikidata.

Xiaoya Wei from Airbnb presented their knowledge graph:

They built the infrastructure from scratch. From a storage and data partitioning perspective, the nodes and edges are stored separately, by source. Node schemas and edge payloads are defined as Thrift binaries. It is horizontally scalable, and the goal is to avoid broadcasts for queries with large fan-out. From a query perspective, the objective is to traverse a subgraph and retrieve nodes and edges from the traversal. Data is ingested continuously via an asynchronous framework; diffs are calculated and then published on Kafka. Finally, why did they build the infrastructure from scratch? Because they built upon the infrastructure that they already support internally (i.e. they don’t want to bring in more software and have to support it).

Three use cases were discussed: 1) navigation via a taxonomy that describes the inventory, 2) recommendations and 3) providing more context.

Data quality and consistency are key challenges. A human team checks data quality, which is why access control is important for them: a user can only make changes to the data that they know about.

Subhabrata Mukherjee from Amazon (now at Microsoft Research) discussed how the Amazon Product Graph is being built. 

Human-in-the-loop techniques are required to clean up noisy training labels. Additionally, the information extraction system returns triples of strings, so the strings need to be mapped to concepts (things, not strings!) in order to truly integrate the knowledge.
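
To illustrate the “things, not strings” point, here is a minimal hypothetical sketch in Turtle (the IRIs and vocabulary are invented for illustration). Suppose the extractor produces the string triple (“Echo Dot”, “manufactured by”, “Amazon”); to integrate it, each string must be resolved to an identifier in the graph:

@prefix pg:   <http://example.org/product-graph/> .
@prefix rel:  <http://example.org/relations/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The extracted strings, now linked to resolvable graph identifiers.
pg:EchoDot  rdfs:label          "Echo Dot" ;
            rel:manufacturedBy  pg:Amazon .

Once the strings are linked, the new fact can be joined with everything else the graph already knows about pg:EchoDot and pg:Amazon.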

Even though Diffbot is a startup, I’m putting them in the unicorn category because they are doing something very unique that not everybody needs to do: create a knowledge graph by crawling the web. Effectively, they are competing against Google and offering services that Google doesn’t. Mike Tung, CEO of Diffbot, presented:

Josh Shinavier described lessons learned from creating a Knowledge Graph at Uber.  Josh also confirmed Airbnb’s comments about why they built their own infrastructure from scratch: they want to reuse the support capabilities that they already have and not bring new software into the mix.

For more details, check out the article Uber’s graph expert bears the scars of billions of trips.

Finally, Denny Vrandecic from Google AI talked about Wikidata. Check out Vivek Khetan’s twitter thread on Denny’s talk.

Virtual Graphs

Capital One, AstraZeneca, Uber and Wells Fargo all publicly stated that they are looking into virtual graphs. This means they want to be able to keep the data in its original source and have a way to virtualize it as a Knowledge Graph.

This is music to my ears because this is what my PhD was all about and the premise on which Capsenta was founded: a NoETL (i.e. virtualize) approach to data integration via semantic/graph technology.
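
To make “virtual” concrete, here is a minimal hypothetical sketch (the vocabulary, table and column names are invented for illustration). A SPARQL query is posed against the virtual graph, and a virtualization engine rewrites it on the fly, via mappings, into SQL against the source database, so no data is copied or moved:

PREFIX ec: <http://gra.fo/e-commerce/schema/>

SELECT ?order
WHERE { ?order a ec:Order . }

# A virtualization engine might translate this query, using the
# mappings, into SQL over the source system, for example:
#   SELECT OrderId FROM OMS_ORDER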

I had a lot of discussions during the breaks about this topic. There was agreement that moving the data to a centralized location has been the status quo and that it is getting more and more expensive. I’m also glad to see other vendors, such as data.world and Stardog, talking about virtualization.

Machine Learning

Subhabrata Mukherjee’s talk provided a lot of details into their machine learning process. Take a look at Vivek’s twitter thread.

Alfio Gliozzo from IBM Research discussed how to extend Knowledge Graphs using distantly supervised deep nets. The challenge: developing hand-labeled data. The ML folks in the audience agreed. Vivek also has a detailed thread on this talk.

Freddy Lecue from Thales discussed explainable AI. 

Vision

Given the hype around Machine Learning, Deep Learning, AI, etc., I’ve been asking myself if we will ever automate the creation of Knowledge Graphs. I had a great discussion with Subhabrata Mukherjee on this topic. He thinks that we will get there when the source data is unstructured, because there is so much overlapping data within the same domain. On the other hand, when the source is structured data, we both agreed that the future doesn’t look as bright: there simply isn’t enough overlapping domain data. As I mentioned in my talk, I never thought that I would be working on methodologies, but we need to empower humans and machines to work together.

We were very lucky to have Pierre Haren, a pioneer in AI and rule systems and founder of ILOG. He spoke about the future evolution of knowledge graphs to causal graphs, where the relationships (edges) are causal.

Personally, I was thrilled to finally meet him and get his input for our upcoming tutorial on the History of Knowledge Graph’s Main Ideas at ISWC2019.

Vicky Froyen discussed where Collibra is heading.

Vendors

We had representatives from many vendors: AllegroGraph/Franz, Amazon Neptune, DataChemist, Datastax, data.world, GraphDB/Ontotext, Neo4j, Stardog, TigerGraph and yours truly, Capsenta!

I gave a 20 min version of my talk Designing and Building Enterprise Knowledge Graphs from Relational Databases in the Real World (which is an evolution of my previous talk on Integrating Semantic Web in the Real World: A Journey between Two Cities ). I’m happy to share that the talk was very well received. Check out this twitter thread.

I was also very thrilled to give demos of Gra.fo, our visual collaborative and real-time knowledge graph schema editor. I love seeing the faces of people when they see Gra.fo for the first time. I am so proud of the entire Capsenta team for developing Gra.fo!

Nav Mathur from Neo4j discussed how they build knowledge graphs.

Jesús Barrasa shared an objective comparison between RDF graphs and property graphs (more on that later).

Brad Bebee shared lessons learned from Amazon Neptune’s customers.

Bryon Jacob from data.world discussed how they sneak knowledge graphs in front of users without them even knowing about it.

Nasos Kyriakov from Ontotext shared a marketing intelligence use case.

The grand finale was a genuine and honest discussion between all the vendors which I had the honor to moderate.

My takeaway is that there is NOT an RDF graph vs. property graph “battle”. It was agreed that if your goal is to share data, then use RDF, but that doesn’t stop you from also using a property graph. Jesús was very emphatic that you can use Neo4j as your storage model and still support RDF (probably not natively) from Neo4j. Jeremy from Datastax shared that with the upcoming TinkerPop 4 you will be able to compile anything, be it Cypher or SPARQL, onto the internals of TinkerPop. Amazon supports both because their customers want both.

However, some of the RDF folks, such as Stardog and DataChemist, are more “pedantic”. Finally, DataChemist is proposing a new graph query language with features that have been well defined in G-CORE and are going into GQL.

I asked everybody to give a two-floor elevator pitch to convince the audience that they should spend their time evaluating their technology. Basically, everybody’s response was the same: just sign up for/download our system and try it out.

My takeaway from the panel: we are turning into a warm, fuzzy, open and comfortable graph community. This confirms my takeaway from the W3C Graph Workshop.

Final Thoughts

  • Word on the street is that people really regret not attending this event.
  • Congrats to all the organizers: François, Thomas, Will and all the student collaborators. You all ran an impeccable event!
  • Kudos to all the speakers who stayed within their 20-minute talk slots.
  • Even though the majority were new faces, it was great to see old-timers like Dieter Fensel, Dean Allemang and Sören Auer, renowned figures in the semantic web community.
  • Check out all the #kgc2019 tweets
  • Beautiful location and the weather was PERFECT!
  • Check out Denny’s trip report 
  • Check out Vivek’s trip report
  • Talks were recorded! It will take a while but they will be made public. So stay tuned!
  • See you May 2020 back in NY!

Finally, check out what some of the attendees had to say

Gra.fo, six months later! What have we been up to?

I can’t believe that it has already been six months since we first announced Gra.fo. Time flies when you are having fun! I am really excited to share some of the major features we have been working on: New Exports, Graph Schema Documentation, Multi-select and Import Mapping.

New Exports

It was clear to us from the beginning that Gra.fo was in the position to support both RDF Graph and Property Graph communities. We started out by exporting the schemas as OWL ontologies in Turtle and RDF/XML syntaxes. However, we lacked support for Property Graph schemas.

Throughout the past few months, we have been thrilled to see the interest of the Property Graph community in schemas and Gra.fo. (I’m honored to be chairing the Property Graph Schema Working Group within the context of the GQL standardization effort.)

That is why we are excited to announce three new property graph schema export formats:

There is a clear need for a general purpose graph schema modeling tool. We are lucky to have this opportunity where Gra.fo can be a bridge between both graph communities.

Graph Schema Documentation

Exporting the graph schema to a PNG or SVG image sure is pretty, but it may not be sufficient: the image does not show attributes or detailed descriptions.

An important need is to provide documentation about the graph schema in a way that can be easily consumed by humans. This type of documentation can serve as requirements documentation, a project deliverable, etc.

Now you are able to view the documentation of the graph schema in a separate page. Go to File > Graph Documentation.

The documentation has its own URL of the form https://app.gra.fo/documentation/a1b2c3. You can now easily share that link with others who also have permission to view the document.

Need a PDF? Simply print and save as PDF.

Multi-select

What if you need to move multiple concepts at the same time? Before, you would need to move each one independently. That was very annoying.

Not any more! Now you can select multiple concepts at the same time and move them all at once. And it even works in real-time when you have multiple users on the document.

Simply select multiple concepts by pressing shift on your keyboard and clicking on each concept that you want to move. Additionally, you can click on the canvas while pressing shift and then drag to create a bounding box.

In addition to moving multiple concepts at once, you can also change their colors or delete them.

Import Mapping

Designing a graph schema is just the first part; you have to do something with it. Our customers’ common use case is data integration: their need is to map complex source relational databases into the graph schema, which models the business users’ view of the world.

One way of representing these mappings is using the W3C’s R2RML: Relational Databases to RDF Mapping Language. This standard was ratified back in 2012, together with the Direct Mapping standard (I am one of the editors).

R2RML is a declarative language that defines how RDF triples are generated from SQL tables or queries. For example, the following R2RML snippet defines that all the rows of the OMS_ORDER table will be instances of the class ec:Order and that the subjects of the triples are generated from a template that uses the values of the OrderId attribute.

@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix map: <http://capsenta.com/mappings#> .
@prefix ec:  <http://gra.fo/e-commerce/schema/> .

map:Order  a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "OMS_ORDER" ] ;
    rr:subjectMap   [ rr:class ec:Order ;
                      rr:template "http://www.e-commerce.com/data/order/{OrderId}"
                    ] .

These mappings can be created using editors, or by hand if you are an RDF geek 🙂. Capsenta offers Ultrawrap Mapper, our mapping management system.

I’m really excited about this initial feature: importing an existing R2RML mapping into a graph schema document. Go to Mapping > Manage.

Once a mapping has been imported, you will see an icon on the left panel if a mapping exists for a Concept, Attribute or Relationship.

If you click on the icon, you will see the mapping details. In this example, we are showing the previous R2RML snippet.

In real-world, enterprise relational databases, the mappings will consist of complex SQL queries defined in an R2RML mapping. The following is a hypothetical sketch of what such a mapping can look like (the SQL query, table and vocabulary are invented for illustration):
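
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix map: <http://capsenta.com/mappings#> .
@prefix ec:  <http://gra.fo/e-commerce/schema/> .

# Hypothetical: only map shipped orders, and compute each order's
# total by joining the order table with its line items.
map:ShippedOrder  a rr:TriplesMap ;
    rr:logicalTable [ rr:sqlQuery """
        SELECT o.OrderId,
               SUM(l.Quantity * l.UnitPrice) AS OrderTotal
        FROM   OMS_ORDER o
        JOIN   OMS_ORDER_LINE l ON o.OrderId = l.OrderId
        WHERE  o.Status = 'SHIPPED'
        GROUP BY o.OrderId
        """ ] ;
    rr:subjectMap [ rr:class ec:Order ;
                    rr:template "http://www.e-commerce.com/data/order/{OrderId}"
                  ] ;
    rr:predicateObjectMap [
        rr:predicate ec:orderTotal ;
        rr:objectMap [ rr:column "OrderTotal" ]
    ] .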

Once you have the mappings, you have to do something with them. You can use the mappings to physically convert the relational data into graphs (ETL) or to virtualize the relational databases as if they were a graph database (NoETL). Capsenta also offers Ultrawrap Data Integrator, where you can use mappings in ETL mode, NoETL mode, or even a hybrid.

So what’s next?

There are a lot more exciting features coming soon.

  • Gra.fo/Mapper: Importing a mapping is just the beginning. We want you to be able to create your mappings all from within Gra.fo.
  • API: We want to empower users to create their own apps that interact with Gra.fo. Everything that you can do through the frontend, you will also be able to accomplish through an API.
  • Gra.fo Documentation: We have a phenomenal UI/UX team who strive to make Gra.fo very intuitive. Nevertheless, we acknowledge the necessity of having documentation.

We truly appreciate all the feedback that we have been getting from our users. Please keep it coming!