SIGMOD 2020: A Virtual Trip Report

This past week was the SIGMOD conference. I’ve always known that there is a lot of overlapping work between the database and semantic web communities, so I have been attending regularly since 2015. The topics I’m most interested in are data integration and graph databases, which are the focus of this trip report.

I was really looking forward to going to Portland but that was not possible due to Covid. Every time I attend SIGMOD I get to know more and more people, so I was a bit worried that I wouldn’t get the same experience this year. This virtual conference was FANTASTIC. The organizers pulled off a phenomenal event. Everything ran smoothly. Slack enabled deep and thoughtful discussion because people spent time organizing their thoughts. There were social and networking events. Gather was an interesting attempt to simulate real-world hallway conversations. The Zoomside chats were an AMA (ask me anything) with researchers on different topics. The Research Zoomtables were relaxed discussions between researchers about research topics. The panels on Women in DB, The Next 5 Years and Startups generated a lot of discussion afterwards on Slack. Oh, and how can we forget the most publicized and popular event of all: Mohan’s Retirement Party.

Science is a social process. That is why I value conferences so much: they are the opportunity to reconnect with many colleagues, meet new folks, discuss ideas, projects, etc. The downside of a virtual conference is that your normal work week continues (unless you decide to disconnect 100% from work during the week). I truly hope that next year we will be back to some sense of normality and we can meet again f2f.

My takeaways from SIGMOD this year are the following:

1) There is still a gap between real world and academia when it comes to data integration. I personally believe that there is science (not just engineering) that needs to be done to bridge this gap.
2) Academia is starting to study data lakes and data catalogs. There is a huge opportunity (see my previous point).
3) There is interest from academia to come up with novel interfaces to access data. However, it will just be an academic exercise with very little real-world impact if we don’t understand who the user is. To do that, we need to connect more with the real world.
4) Graphs continue to gain more and more traction in industry.

I’m very excited that this community is looking into the needs and features of data catalogs, because this topic is dear to my heart: I am the Principal Scientist at data.world, which is the only enterprise data catalog that is cloud-native SaaS with Virtualization and Federation, powered by a Knowledge Graph.

RESEARCH

There was a very interesting Slack discussion about research and the “customer” that was sparked by the panel “The Next 5 Years: What Opportunities Should the Database Community Seize to Maximize its Impact?”.

AnHai Doan commented that the community understands the artificial problems defined in research papers instead of understanding the real problems that customers face. Therefore, there is a need to identify the common use cases (not corner cases) that address 80% of customers’ needs, and to own those problems.

To that, Raul Castro Fernandez pointed out that systems work is disincentivized because reviewers always come back with “just engineering.” Personally, I believe that if there is a clear hypothesis and research question, with experiments that provide evidence to support the hypothesis, then the engineering is also science. Otherwise, it is just engineering.

Joe Hellerstein chimed in with spot-on comments that I won’t try to summarize, so here they are verbatim:

I would never discourage work that is detached from current industrial use; I think it’s not constructive to suggest that you need customers to start down a line of thinking. Sounds like a broadside against pure curiosity-driven research, and I LOVE the idea of pure curiosity-driven research. In fact, for really promising young thinkers, this seems like THE BEST reason to go into research rather than industry or startups

What I tend to find often leads to less-than-inspiring work is variant n+1 on a hot topic for large n. What Stonebraker calls “polishing a round ball”.

Bottom line, my primary advice to folks is to do research that inspires you.

“...if you are searching for relevance, you don’t need to have a friend who is an executive at a corporation. Find 30-40 professionals on LinkedIn who might use software like you’re considering, and interview them to find out how they spend their time. Don’t ask them “do you think my idea is cool” (because they’ll almost always say yes to be nice). Ask them what they do all day, what bugs them. I learned this from Jeff Heer and Sean Kandel, who did this prior to our Wrangler research, that eventually led to Trifacta. It’s a very repeatable model that simply requires different legwork than we usually do in our community.”

– Joe Hellerstein in a Slack discussion

DATA INTEGRATION

Most of the data integration work continues to be on the topics of data cleaning and data matching/entity matching/entity resolution/…. That makes sense to me: this is an area with a lot of data, and therefore opportunities to continue automating. The following papers are on my to-read list:

Given how data lineage is an important feature of data catalogs, I was keen to attend the Provenance session. At data.world, we represent data lineage as provenance using PROV-O. Unfortunately I missed it and was only able to catch the tail end of the Zoomtable discussion. My biased perception is that the academic discussions on provenance are disconnected from reality when it comes to data integration. I shared the following with a group of folks: “From an industry perspective, data lineage is something that companies ask for from data integration/catalog/governance companies. The state of the art in the industry is to extract lineage from SQL queries, stored procedures, ETL tools and represent this visually. This can now be done. Not much science here IMO. There is a push to get lineage from scripts/code in Java, Python. What is the academic state of the art of reverse engineering Java/Python/… code used for ETLing?“.

Zach Ives responded that there has been progress in incorporating UDFs and certain kinds of ETL operations, with human expertise incorporated, but he wasn’t aware of work that does this automatically from Java/Python code.
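
To make “extract lineage from SQL queries” concrete, here is a minimal sketch of table-level lineage extraction. I’m using the open-source sqlglot parser purely for illustration; the query, table and column names are all invented.

```python
# Minimal sketch: table-level lineage from a single SQL query using the
# open-source sqlglot parser. The query and all names are invented examples.
import sqlglot
from sqlglot import exp

sql = """
SELECT c.region, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
GROUP BY c.region
"""

parsed = sqlglot.parse_one(sql)

# Every table referenced in the statement is an upstream dependency.
sources = sorted({t.name for t in parsed.find_all(exp.Table)})
# The select-list expressions are the downstream (output) columns.
outputs = [e.alias_or_name for e in parsed.selects]

print("source tables:", sources)   # ['customers', 'orders']
print("output columns:", outputs)  # ['region', 'total']
```

This is exactly the “not much science here” part; the open question I raised is doing the same for arbitrary Java/Python ETL code.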

Join this webinar if you are interested in learning about how we are using data lineage to tackle complex business problems.

I was pointed to the following, which I need to dig into:

As AnHai noted in a Slack discussion, there is still a need to bridge the gap between academia and the real world. Quoting him:

For example, you said “we understand the problems in entity matching and provenance”. But the truth is: we understand the artificial problems that we define for our research papers. Not the real problems that customers face and we should solve. For instance, a very simple problem in entity matching is: develop an end-to-end solution that uses supervised ML to match two tables of entities. Amazingly, for this simple problem, our field offers very little. We did not solve all pain points for this solution. We do not have a theory on when it works and when it doesn’t. Nor do we have any system that real users can use. And yet this is the very first problem that most customers will face: I want to apply ML to solve entity matching. Can you help me?

– AnHai Doan in a Slack discussion


DATA ACCESS

I’m observing more work that intends to lower the barrier for accessing data via non-SQL interfaces such as natural language, visual interfaces and even speech! The session on “Usability and Natural Language User Interfaces” was my favorite one because the topics were “out of the box” and “curiosity-driven”. I am very intrigued by the QueryVis work to provide diagrams to understand complicated SQL queries. I think there is an opportunity here, but the devil is in the details. The SpeakSQL paper sparked a lot of discussion. Do we expect people in practice to dictate a SQL query? In the Duoquest paper, the researchers combine the Natural Language Interface approach with Programming-by-example (PBE), where a user provides a sample query result. I’ve seen this PBE approach over and over in the literature, specifically for schema matching. At a glance it seems an interesting approach, but I do not see the real-world applicability… or at least I’ve never been exposed to a use case where the end user has and/or is willing to provide a sample answer. However, I may be wrong about this. This reminds me of AnHai’s comments on corner cases but, at the same time, this is curiosity-driven research.
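
For readers who haven’t run into programming-by-example, the core loop is tiny. Here is a self-contained sketch with an invented table and invented candidate queries: the user supplies one sample result row, and only the candidate queries whose output contains that row survive.

```python
# Minimal programming-by-example sketch: keep the candidate queries whose
# results contain the user's example row. Table and candidates are invented.
rows = [
    {"name": "Ada",   "dept": "CS",   "salary": 120},
    {"name": "Grace", "dept": "CS",   "salary": 130},
    {"name": "Mary",  "dept": "Math", "salary": 110},
]

# Candidate "queries": selection predicates enumerated by the system.
candidates = {
    "dept = 'CS'":   lambda r: r["dept"] == "CS",
    "salary > 115":  lambda r: r["salary"] > 115,
    "dept = 'Math'": lambda r: r["dept"] == "Math",
}

# The sample answer provided by the user.
example = {"name": "Grace", "dept": "CS", "salary": 130}

consistent = [q for q, pred in candidates.items()
              if example in [r for r in rows if pred(r)]]
print(consistent)  # ["dept = 'CS'", "salary > 115"]
```

The research challenge, of course, is ranking and pruning the (huge) candidate space, not this filtering step.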

Papers on my to-read list:

Another topic very dear to me is data catalogs. Over the past couple of years I’ve been seeing research on the topics of dataset search/recommendation, join similarity, etc., that are important features for data catalogs. I’m looking forward to digging into these two papers:

On the topic of data lakes, I’m really, really bummed that I missed several keynotes:

I can’t wait to watch the videos.

GRAPHS

If you look at the SIGMOD research program, there are ~20 papers on the general topic of graphs out of ~140 research papers, plus all the papers from the GRADES-NDA workshop. The graph work that I was most attracted to came from industry: Alibaba, Microsoft, IBM, TigerGraph, SAP, Tencent, Neo4j.

I found it intriguing that Alibaba and Tencent are both creating large-scale knowledge graphs to represent and model the common sense of their users. Cyc has been at it for decades. Many researchers believe that this is the wrong approach. But then 10 years ago schema.org came out as a high-level ontology that web content producers are adhering to. Now we are seeing these large companies creating knowledge bases (i.e. knowledge graphs) that integrate not just knowledge and data at scale, but also common sense. Talk about “what goes around comes around.”

Every year that I attend SIGMOD, it is a reminder that the database and semantic web communities must talk to each other more and more. Case in point: IBM presented DB2 Graph, where they retrofit graph queries (Property Graph model and TinkerPop) on top of relationally-stored data. I need to dig into this work, but I have the suspicion that it overlaps with work from the semantic web community. For example, Ultrawrap, Ontop, Morph, among others, are systems that execute SPARQL graph queries on relational databases (note: Ultrawrap was part of my PhD and the foundation of my company Capsenta, which was acquired by data.world last year). There are even W3C standards to map relational data to RDF graphs (i.e. Direct Mapping, R2RML). Obviously, the focus of the semantic web community has been to study these problems from the perspective of RDF graphs and SPARQL. Nevertheless, it’s all just a graph, so the work is overlapping. In the spirit of cross-communication, I was thrilled to see Katja Hose’s keynote at GRADES-NDA where she presented work from the semantic web community such as SPARQL Federation, Linked Data Fragments, etc.
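
For database readers who haven’t seen the W3C Direct Mapping, the core idea fits in a few lines: each row becomes a subject IRI keyed by its primary key, each column becomes a predicate, and the table name becomes a class. A sketch with an invented table, using rdflib for convenience:

```python
# Sketch of the W3C Direct Mapping idea: each row becomes a subject IRI
# keyed by its primary key, each column becomes a predicate, and the table
# name becomes a class. The table and data are invented.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

BASE = Namespace("http://example.com/")
rows = [{"id": 7, "name": "Acme", "city": "Austin"}]  # table "Company"

g = Graph()
for row in rows:
    subj = URIRef(BASE + f"Company/id={row['id']}")    # <base>/<table>/<pk>=<val>
    g.add((subj, RDF.type, URIRef(BASE + "Company")))  # row type = table IRI
    for col, val in row.items():
        g.add((subj, URIRef(BASE + f"Company#{col}"), Literal(val)))

print(g.serialize(format="turtle"))
```

R2RML is the customizable version of this: instead of the default IRIs above, you declare your own mapping from tables to an ontology of your choice.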

Another topic that was brought up by Semih Salihoglu was the practical uses of graph analytics algorithms. This discussion was sparked by the paper “Graph Based Benchmark Suite.” It was very neat to learn that Neo4j has actually started to categorize the graph algorithms that are used in practice. In their graph data science library, algorithms exist within three tiers: production-quality, beta and alpha. These tiers serve as proxies for what is being used in the real world.
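
If you haven’t seen that library, this is roughly what calling a production-tier algorithm (PageRank) looks like from Python. A sketch only: the connection details and the projected graph name “myGraph” are placeholders.

```python
# Sketch: calling a production-tier algorithm (PageRank) from Neo4j's
# Graph Data Science library via the Python driver. The connection details
# and the projected graph name "myGraph" are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    result = session.run("""
        CALL gds.pageRank.stream('myGraph')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).name AS name, score
        ORDER BY score DESC LIMIT 10
    """)
    for record in result:
        print(record["name"], record["score"])
driver.close()
```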

Papers on my to-read list:

WHO IS THE USER?

A topic that came up in the “Next 5 Years” panel was the need for results to be “used in the real world” and for tools to be “easy to use”. This is inevitable in research, because the opposite would be a falsehood (nobody does research so that it’s used in an artificial world and is hard to use). I believe that a missing link between “used in the real world” and “easy to use” is to understand the USER. I also believe it is paramount that the database research community understands who the users in the real world are. It’s not just data scientists. We have data engineers, data stewards, data analysts, BI developers, knowledge scientists, product managers, business users, etc. I believe that we need to look at data integration and the general data management problem not just from a technical point of view (which is what the database community has been doing for 20+ years), but from a social aspect: understanding the users, the processes and how they are connected using end-to-end technology solutions. This takes us out of our comfort zone, but this is what is going to move the needle in order to maximize the impact.

For the past year, I’ve been advocating for researching the phenomenon of data integration from a socio-technical angle (see my guest lecture at the Stanford Knowledge Graph course), for methodologies to create ontologies and mappings, and for the new role of the Knowledge Scientist.

Joe Hellerstein provided another great comment during our slack discussion:

Building data systems for practitioners who know the data but not programming (current case in point—public health researcher) is a huge challenge that we largely have a blindspot for in SIGMOD. To fix that blindspot we should address it directly. And educate our students about the data analysis needs of users outside of the programmer community.

– Joe Hellerstein in a Slack discussion

While watching the presentations of the “Usability and Natural Language User Interfaces” session, I kept asking myself: who is the user? What are the characteristics that define that user? Sometimes this is well defined, sometimes it is not. Sometimes it is connected with the real world, sometimes it is not.

The HILDA workshop and community is addressing this area and I’m very excited to get involved more. All the HILDA papers are on my to-read list. I’m leaving with a very long list of papers to read and new connections.

Thanks again to the organizers for an amazing event. Can’t wait to see what happens next year.


International Semantic Web Conference (ISWC) 2019 Trip Report

My takeaways from this year’s ISWC:

  • Less is sufficient
  • Theory and practice are being combined more and more, and it’s getting rewarded
  • We need to think bigger
  • Semantic and Knowledge Graph technologies are doing well in industry

So let’s start! This was a very exciting and busy ISWC for me. 

For the past two years, Claudio Gutierrez and I have been researching the history of knowledge graphs (see http://knowledgegraph.today/). We culminated this work with a paper and a tutorial at ISWC. It was very well received.

I also gave the keynote “The socio-technical phenomena of data integration” at the Ontology Matching Workshop.

Part of my message was to push the need for the Knowledge Scientist role.

I gave a talk on our in-use paper “A Pay-as-you-go Methodology to Design and Build Enterprise Knowledge Graphs from Relational Databases”.

In conjunction with Dave Griffith, I also gave an industry presentation on how we are building a hybrid data cloud at data.world. Finally, I was also on an industry panel.

Oh, and data.world had a table

And our socks were a hit

Let’s not forget about Knowledge

Jerome Euzenat’s keynote was philosophical. His key message was that we have gotten too focused on data and we are forgetting knowledge and how knowledge evolves.

I agree with him. In the beginning of the semantic web, I would argue that the focus was on ontologies (i.e. knowledge). From the mid 2000s, the focus shifted to data (Linked Data, LOD) and that is where we have been. We should not forget about knowledge.

I would actually rephrase Jerome’s message: not only should we not forget about knowledge, we should also not forget about combining data and knowledge at scale.


Outrageous Idea

There was a track called Outrageous Idea, and the outrageous issue was that most of the submissions were rejected because they weren’t considered outrageous by the reviewers. This led to an interesting panel discussion.

The semantic web community has a track record of being outrageous:

  • The idea of linking data on the web was crazy and many thought it would not happen. 
  • Even though SPARQL is not a predominant query language, one of the largest repositories of knowledge, Wikidata, is all in RDF and SPARQL (see the query sketch after this list). 
  • Querying the web as if it were a database was envisioned in the early 90s, and given the linked data on the web, it actually became possible (see Olaf Hartig’s PhD dissertation and all the work that it spawned. No wonder his 2009 paper received the 10-year prize this year. Congrats my friend!). 
  • Heck, the semantic web itself is an outrageous idea (that hasn’t yet been fulfilled). 
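
Since Wikidata came up in the list above: querying its public SPARQL endpoint really is a two-minute exercise. A minimal sketch using the SPARQLWrapper Python library; the query (instances of house cat, Q146) is just a stock example.

```python
# Sketch: querying Wikidata's public SPARQL endpoint with SPARQLWrapper.
# Q146 is "house cat" and P31 is "instance of" in Wikidata's vocabulary.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["item"]["value"], binding["itemLabel"]["value"])
```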

However, there is a sentiment that this community is stuck and focused on incremental advances. Something needs to change. For example, we should have a venue/track to publish work that may lack a bit of scientific rigor because it is visionary, may not have well-defined research questions or a clearly stated hypothesis (because we are dreaming!), or the evaluation is preliminary/lacking because it’s still not understood how to evaluate or what to compare to. Rumor has it that there will be some sort of a vision track next year. Let’s see!

Pragmatism in Science

It was great to see scientific contributions combining theory and implementation, thus being more pragmatic. A catalyst, in my opinion, was the Reproducibility Initiative. Several papers had a “Reproduced” tag to note that the results were implemented, the code was available and a third party had reproduced the results. One of the best research paper nominees, “Absorption-Based Query Answering for Expressive Description Logics”, won the Best Reproducibility award. These researchers are well-respected theoreticians, and it’s very interesting to see them bridging their theory with practice.

The best research paper, “Validating SHACL constraints over a SPARQL endpoint”, which is highly theoretical, also has experimental results and the code is available: SHACL2SPARQL.

I’m seeing this trend also in the database community. For example, the Graph Query Language (GQL) for Property Graphs standardization process will be accompanied by a definition of formal semantics, which is being led by theoreticians including Leonid Libkin.

I’m also starting to see this interest in the other direction: researchers who focus more on building systems are being more rigorous with their theory and experiments. For example, the best student research paper was “RDF Explorer: A Visual SPARQL Query Builder” (see rdfexplorer.org). The computer science team partnered with an HCI researcher to run a user study, providing scientific rigor to their work (and ultimately being nominated for and winning an award).

Bottom line: it’s my perception that the theoreticians want to make sure that their theory is actually used, and systems builders are focusing more and more on the science and not just the engineering. This is FANTASTIC!

Table to Knowledge Graph Matching 

One of the big topics at the conference was the “Tabular Data to Knowledge Graph Matching” challenge. The challenge consisted of three tasks:

  • CTA: Assigning a class (:Actor) from a Knowledge Graph to a column
  • CEA: Matching a cell to an entity (:HarrisonFord) in the Knowledge Graph 
  • CPA: Assigning a property (:actedIn) from the Knowledge Graph to the relationship between two columns 

The matching was to DBpedia.

For example, the team from USC, Tabularisi, at a high level, created candidate matches using DBpedia Spotlight and TF-IDF, and that was sufficient to get decent results.
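
A baseline matcher in that spirit really is small. Here is a sketch of the CEA task against DBpedia Spotlight’s public annotation endpoint (the cell value is invented, and a real system would add TF-IDF ranking over the candidates):

```python
# Sketch of a baseline CEA matcher: send a table cell to DBpedia Spotlight's
# public annotation endpoint and take the returned entity URIs as candidates.
import requests

def cea_candidates(cell_text: str):
    resp = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": cell_text, "confidence": 0.5},
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    # "Resources" is absent when nothing was annotated.
    return [r["@URI"] for r in resp.json().get("Resources", [])]

print(cea_candidates("Harrison Ford"))
# e.g. ['http://dbpedia.org/resource/Harrison_Ford']
```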

The winner of the challenge, Mtab, in my opinion, over-engineered their approach for DBpedia, which is how they were able to win the challenge. 

DAGOBAH, from Orange Labs, had two approaches: a baseline that used DBpedia Spotlight, which they compared against a more sophisticated approach using embeddings. The embedding approach was slightly better, but more expensive.

There were other approaches such as CSV2KG and MantisTable:

My takeaway: “less is sufficient.” It seems we can get sufficient quality without being too sophisticated. In a way, this is good.

More notes

Olaf Hartig gave a keynote at Workshop on Querying and Benchmarking the Web of Data (QuWeDa). His message: 

Albert Meroño presented really interesting work on modeling and querying lists in RDF Graphs (Paper. Slides).

I really need to check out VLog, a new rule-based reasoner for Knowledge Graphs (VLog code; there is also a Java library based on the VLog rule engine: Paper).

SHACL and Validation

OSTRICH is an RDF triple store that allows multiple versions of a dataset to be stored and queried at the same time. Code. Slides.

Interesting to see how the Microsoft Academic Knowledge Graph was created in RDF. http://ma-graph.org/

An interesting dataset, FoodKG: A Semantics-Driven Knowledge Graph for Food Recommendation https://foodkg.github.io/index.html

Translating SPARQL to Spark SQL is getting more attention. Clever stuff in the poster: Exploiting Wide Property Tables Empowered by Inverse Properties for Efficient Distributed SPARQL Query Evaluation

There wasn’t much material on machine learning and embeddings, only one session (not surprising, because I guess that type of work gets sent to machine learning conferences). The couple of things I saw (not a complete list):

Need to check out http://ottr.xyz/ 

I missed the GraphQL tutorial

Industry

Even though this is an academic/scientific conference, there was still a good number of industry attendees.

Dougal Watt (former IBM NZ Chief Technologist and founder of Meaningful Technology) gave a keynote where he preached Dave McComb’s message of being data-centric. I liked how he introduced the phrase “knowledge centric”, which is where we should be heading.

Pinterest and Stanford won the best in-use paper award for “Use of OWL and Semantic Web Technologies at Pinterest”.

Bosch presented their use case of combining semantics and NLP. They are creating a search engine for materials scientists to find documents.

Google was present:

Joint work between Springer and KMi was presented

The Amazon Neptune team presented a demo, “Enabling an Enterprise Data Management Ecosystem using Change Data Capture with Amazon Neptune”, and an industry talk, “Transactional Guarantees for SPARQL Query Execution with Amazon Neptune”.

I learned about Ampligraph.org from Accenture, an “Open source library based on TensorFlow that predicts links between concepts in a knowledge graph.”

Great to see Orange Labs participating in the table to knowledge graph matching challenge (more above). 

Always great to connect with Peter Haase from Metaphacts and meet new folks like Jonas Almeida from NCI.

And that’s a wrap

ISWC is always a lot of fun. In addition to all the scientific and technical content, there is also a sense of community. I always enjoy being part of the mentoring lunch:

We had a fantastic gala dinner

And we even got all the hispanics together:

Take a look at other trip reports (I can now read them after I published mine!)

Avijit Thawani: https://medium.com/@avijitthawani/iswc-2019-new-zealand-bd15fe02d3d4

Sven Lieber: https://sven-lieber.org/en/2019/11/05/iswc-2019/

Cogan Shimizu: https://daselab.cs.ksu.edu/blog/iswc-2019

Armin Haller: https://www.linkedin.com/pulse/knowledge-graphs-modelling-took-center-stage-iswc-2019-armin-haller/

With that, see you next year:

2nd U.S. Semantic Technologies Symposium 2019 Trip Report

I would summarize my experience at the 2nd U.S. Semantic Technologies Symposium as follows:

Frustrated but optimistic

First of all, I share my criticism with the utmost respect for everybody in this community. Furthermore, I acknowledge that this is my biased view based on the sessions and hallway conversations I had. Therefore, please take my comments with a grain of salt.

Frustration

Where is the excitement?

I left last year after the 1st US2TS very excited. It actually took me another week to fully process the excitement. This year I simply didn’t feel the excitement. This was also echoed in the town hall meeting we had. I acknowledge that this is my feeling and it may not be shared by others. I was expecting to see people sharing their new research grants (as in the previous year; there is a lot of NSF money), companies sharing what they are doing (there was a bit of this but not much), newcomers asking questions about how to bring semantic technologies into practice. All of this was missing. I think the community is stuck in the same ol’, same ol’.

Same ol’, Same ol’
A common theme I was hearing was the “same ol’, same ol’”: we need better tools, we need to lower the barrier, it’s too hard to use, etc. … Insert here a phrase often used toward someone who states the obvious….

This was also my takeaway from Deborah McGuinness’ keynote.

This was probably my main source of frustration. I’ve been in this community for over a decade and I’ve been hearing the same thing for a decade. 

Where is the community?

Per the US2TS website “The goal of the U.S. Semantic Technologies Symposium series is to bring together the U.S. Semantic Web community and begin forming such a research network.” Given that this was the second edition, I was expecting that we would be seeing a community forming.  I did not feel that this was happening. Again, this is just my personal perception and others may disagree.

I did meet a few newcomers, and based on private conversations, I had the impression (and they also confessed) that a lot of the discussions were way over their heads. Are we being open to the newcomers? Do they feel welcome? I don’t think so.

What is a Knowledge Graph? 

I understand that we need to define our terms in order to make sure that we are talking about the same thing. But we have to be careful not to end up going down a black hole.

All of this discussion was a reminder of the “drama” we went through at Dagstuhl on this same topic of defining what a knowledge graph is. As I mentioned in my Dagstuhl trip report:


Throughout the week, there was a philosophical and political discussion about the definition. Some academics wanted to come up with a precise definition. Others wanted it to be loose. Academics will inevitably come up with different (conflicting) definitions (they already have), but given that industry uptake of Knowledge Graphs is already taking place, we shouldn’t do anything that hinders the current industry uptake. For example, we don’t want people searching for “Knowledge Graphs” and finding a bunch of papers, problems, etc. instead of real-world solutions (this is what happened with the Semantic Web). A definition should be open and inclusive. A definition I liked, from Aidan Hogan and Antoine Zimmermann, was “a graph of data with the intention to encode knowledge.”

Excerpt from Trip Report on Knowledge Graph Dagstuhl Seminar

I had the opportunity to provide “my definition” of knowledge graph, and I did it in a controversial way.

I find it funny/ironic that really smart academics are providing a definition for a marketing term that came up in a blogpost in 2012!

Optimistic

Now that I have shared my frustration, let me share my optimism.

It was confirmed over and over that semantic technologies do work in the real world. This was clearly shown in Deborah’s and Helena’s keynotes.

We do have newcomers who bring in a complete different perspective. 

The newcomers are bringing in lessons learned

This community is full of incredibly smart people. 

What should the 3rd edition of US2TS look like?

We need to provide elements to help form a community:

  • [UPDATE, idea after chatting with Anna Lisa Gentile] Given that there is a cap on attendees, in order to register, people should submit a position statement indicating 1) why they want to attend, 2) what they have to contribute and 3) what they expect to take away. This is what the W3C Workshop on Graph Data did and the conversations were very lively.
  • How about organizing a barcamp, with user-generated content on the fly?
  • Have a wall of ideas where people can post the topics they are interested in.
  • Speed dating, so people can find others that have similar interests.

We need more industry

  • I think we should strive for a 50/50 split between industry and academia (I think it was 60% academia this time).
  • Industry participants should have sessions explaining their pain points. 
  • Startups can share their latest developments and the help they may need.

We need an academic curriculum

  • If we already have a group of academics in the room, why not spend some time organizing an undergrad and post-grad curriculum for semantic technologies that can be shared. 

Even though I left frustrated, I’m optimistic that next year we can have an exciting event.

Final Notes

  • The event was very well organized. Kudos to all the organizers!
  • Duke University is very beautiful and the room we were in was very bright.
  • Should this event be rebranded to Knowledge Graphs?
  • Chris Mungall wrote a report
  • Folks appreciated my call for knowing our history

International Semantic Web Conference (ISWC) 2018 Trip Report

ISWC has been my go-to conference every year. This time it was very special for two reasons. First of all, it was my 10-year anniversary of attending ISWC (the first one was ISWC2008 in Karlsruhe, where I presented a poster that ultimately became the basis of my PhD research and also the foundational software of Capsenta). Too bad my dear friend and partner in crime, Olaf Hartig, missed out (but for good reasons!). I only missed ISWC2010 in Shanghai; other than that, I’ve attended each one and I plan to continue attending them (New Zealand next year!).

The other reason why this was a special ISWC is because we officially launched Gra.fo, a visual, collaborative, real-time ontology and knowledge graph schema editor, which we have been working on for over 2 years in stealth mode.


THE Workshop: This year at ISWC, I co-organized THE Workshop on Open Problems and Emerging New Topics in Semantic Web Technology. The goal was to organize a true workshop where attendees would actually discuss and get work done.

Let’s say that we may have been a bit ambitious, but in the end it turned out very well. In the first part of the morning, everybody was encouraged to stand up and talk on the spot for a minute about their problem. We gathered 19 topics. For the rest of the morning, we self-organized into clusters, and each group continued the discussion, finishing with a wrap-up.

The goal was to submit the problems to THE Workshop website. Looks like the attendees have not done their homework (you know who you are!). We had great feedback about this format and we will consider submitting it again for next year, improving the format.


VOILA: I’ve been attending the Visualization and Interaction for Ontologies and Linked Data (VOILA) Workshop for the past couple of years (guess why 🙂 ) and luckily I was able to catch the last part of it. My takeaway is that there are a lot of cool things going on in this area, but the research problems that are being addressed are not always clear. Furthermore, prototypes are engineered and evaluated, but it’s not clear who the tool is for. Who is your user? I brought this up in my trip report from last year. This community MUST partner with other researchers in HCI and Social Science in order to harden the scientific rigor. Additionally, there are cool ideas where it would be interesting to see if there is commercial viability.


SHACL: I attended the Validating RDF data tutorial by Jose Emilio Labra Gayo. I came in trying to find an answer to the following question: is SHACL ready for industry prime time? The answer is complicated, but unfortunately I have to say: not yet. First of all, even though SHACL is the W3C recommendation, there is another proposal called ShEx from Jose Emilio’s group. He acknowledges his bias, but if you look at ShEx and SHACL side by side, you can argue for one or the other objectively. For example, ShEx supports recursive constraints but SHACL doesn’t (there was a research paper on this topic, Semantics and Validation of Recursive SHACL, … but it’s research!). Nevertheless, the current SHACL specification is stable and technically ready to be used in prime time. The problem is the lack of commercial tools for enterprise data. Jose Emilio is keeping a list of SHACL/ShEx implementations, but all except TopQuadrant’s are (academic) prototypes. It seems Stardog is planning to officially support it in their 6.0 release. At this stage, I was expecting to see a standalone SHACL validator that can take as input RDF data or a SPARQL endpoint and run the validations. With all due respect, these kinds of situations are embarrassing for this community and industry: apparently a standard is needed, a recommendation is made, but at the end there is no industry implementation and uptake (one or two is not enough). We live in a “build it and they will come” world and this does not make us look good. </rant>. On a positive note, I think we are very close to the following: create a SHACL to SPARQL translator that starts out by supporting a simple profile of SHACL (cardinality constraints). This way anybody could use it on any RDF graph database. Somebody should build this, and we should support it as a community, not just academics but also industry users.
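
To make that translator suggestion concrete, here is a sketch of the translation for a single sh:minCount constraint: the generated SPARQL query returns the focus nodes that violate the constraint. The shape values are invented.

```python
# Sketch: translate one SHACL sh:minCount constraint into a SPARQL query
# that returns the violating focus nodes. Shape values are invented.
def min_count_to_sparql(target_class: str, path: str, min_count: int) -> str:
    # Focus nodes with fewer than min_count values for the path are violations;
    # the OPTIONAL ensures nodes with zero values are counted as 0, not dropped.
    return f"""
SELECT ?focus (COUNT(?v) AS ?cnt)
WHERE {{
  ?focus a <{target_class}> .
  OPTIONAL {{ ?focus <{path}> ?v }}
}}
GROUP BY ?focus
HAVING (COUNT(?v) < {min_count})
""".strip()

print(min_count_to_sparql("http://example.com/Person",
                          "http://example.com/email", 1))
```

The full SHACL-Core-to-SPARQL story is exactly what the SHACL2SPARQL work mentioned in my ISWC 2019 report tackles; the point here is only that a useful cardinality-only profile is a small amount of code away.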

Hat tip to Jose Emilio for the nice SHACL/ShEx Playground, and to Eric, Iovka and Dimitris for making their book, Validating RDF, available for free (HTML version).


SOLID: I missed out on the Decentralizing the Semantic Web workshop. I heard it was packed, and I guess it did help that Tim Berners-Lee was there presenting on Solid. Later on, I had the chance to talk to TimBL about Solid and his new startup Inrupt. The way I understood Solid and what Inrupt is doing is through the following analogy: they have designed a brand new phone and the app store infrastructure around it (i.e. Solid). However, people already have phones (web apps that store your data), so they need to convince others to use their phone. Who would they convince and how? Ideally, they want to convince everybody on earth… literally, but they can start out with people who are concerned about data ownership and privacy. My skepticism is that the majority of the people in the world don’t care about it. Jennifer Golbeck’s keynote touched on this topic and stated that young people don’t care about privacy, but the older you get, the more you start caring. Solid is definitely solving a problem, but I question the size of the market (i.e. who cares about this problem). Good luck Inrupt team!

Enterprise Knowledge Graphs: One of the highlights of ISWC was the Enterprise Knowledge Graph panel. This was actually a great panel (I commonly find panels very boring). The participants were from Microsoft, Facebook, Ebay, Google and IBM. I had two main takeaways.
1) For all of these large companies, the biggest challenge is identity resolution. Decades of Record Linkage/Entity Resolution/etc. research and we are still far away from solving this problem… at scale. Context is the main issue.
2) The most important takeaway from the entire conference was: NONE OF THESE COMPANIES USE RDF/OWL/SPARQL… AND IT DOESN’T MATTER! I was actually very happy to hear them say this in front of the entire semantic web academic community. At the end, the ideas of linking data, using triples, having tight/loose schemas, reasoning, all at scale have come out of the semantic web research community and started to permeate into the industry. It’s fine if they are not using the exact W3C Semantic Web Standards. The important thing is that the ideas are being passed on to the real world. It’s time to listen to the real world and see what problems they have and bring it back for research. This is part of the scientific method!
Notes from each panelist:

Another possible answer to Yolanda Gil’s question is the recently launched dataCommons.org.
The final question to the panel: what are the challenges that the scientific community should be working on? Their answers:


Not everybody is a Google: The challenges stated by the Enterprise Knowledge Graph panelists are for the Googles of the world. Not everybody is a Google. For a while now, I have felt that a large research focus is on tackling problems for the Googles of the world. But what about the other end of the spectrum? My company Capsenta is building knowledge graphs for very large companies, and I can tell you that building a beautiful, clean knowledge graph from even a single structured data source, let alone a dozen, is not easy. I believe that the semantic web, and even the database community, have forgotten about this problem and dismissed it as a day-to-day engineering challenge. The talk “Integrating Semantic Web in the Real World: A Journey between Two Cities” that I have been giving this year details all the open engineering, scientific and social challenges we are encountering. One of those problems is defining mappings from source to target schemas. Even though the Ontology Matching workshop and the Ontology Alignment Evaluation Initiative have been going on for over a decade… the research results and systems do not address the real-world problems that we see at Capsenta in our day-to-day work. We need to research the real-world socio-technical phenomena of data integration. One example is dealing with complex mappings. I was very excited to see the work of Wright State University and their work “A Complex Alignment Benchmark: GeoLink Dataset”, which was nominated for best resource paper. This is off to a good start, but there is still a lot of work to be done. Definitely a couple of PhDs can come out of this.


Natasha Noy’s keynote:  I really enjoyed her keynote, which I summarized: 

She also provided some insight on Google Dataset search:


Vanessa Evers’ keynote was incredibly refreshing, because it successfully brought to the attention of the semantic web community the problems encountered in creating socially intelligent robots. Guess what’s missing? Semantics and reasoning!


Industry: I was happily surprised to see a lot of industry folks this year. The session I chaired had about 100 people.

Throughout the week I saw and met with startups like Diffbot and Kobai; folks from finance: FINRA, Moody’s, Federal Reserve, Intuit, Bloomberg, Thomson Reuters/Refinitiv, Credit Suisse; graph database companies: Amazon Neptune, Allegrograph, Marklogic, Ontotext’s GraphDB, Stardog; healthcare: Montefiore Health Systems, Babylon Health, Numedii; the big companies: Google, Microsoft, IBM, Facebook, Ebay; and many others such as Pinterest, Springer, Elsevier, Expert Systems, Electronic Arts. Great to see so much industry attending ISWC! All the Industry papers are available online.

Best Papers: The best papers highlighted the theme of the conference: Knowledge Graphs and Real World relevance. The best paper went to an approach to provide explanations of facts in a Knowledge Graph.

The best student research paper was a theoretical paper on canonicalisation of monotone SPARQL queries, which has a clear real world usage: improve caching for SPARQL endpoints.

The best resource paper addressed the problem of creating a gold standard dataset for data linking, a crucial task to create Knowledge Graphs at scale. They present an open source software framework to build Games with a Purpose in order to help create a gold standard of data by motivating users through fun incentives.

The best in use paper went to the paper that describes the usage of semantic technology underpinning Wikidata, the Wikipedia Knowledge Graph.

Finally, the best poster went to VoCaLS: Describing Streams on the Web and the Best demo award went to WebVOWL Editor.


DL: Seems like DL this year meant Deep Learning and not Description Logic. I don’t think there was any paper on Description Logic, a big switch from past years.


Students and Mentoring:  I enjoyed hanging out with PhD students and offering advice at the career panel during the Doctoral Consortium and at the mentoring lunch.

During the lunch on Wednesday we talked about science being a social process and it was very nice that this also came up on Thursday during Natasha’s keynote


Striving for Gender Equality: I am extremely proud of the semantic web research community because they are an example of always striving for gender equality. This year they made a powerful statement: the conference was organized entirely by women (plus Denny and Rafael) and there were 3 amazing women keynote speakers. Additionally, the local organizers did a tremendous job!

Furthermore, Ada Lovelace Day, which is held every year on the second Tuesday of October, occurred during ISWC. So what did the organizers do? They held the Ada Lovelace celebration where we had a fantastic panel discussing efforts on striving for gender equality in the sciences (check out sciencestories.io!)

The event ended with a Wikipedia Edit-a-thon where we created and edited Wikipedia pages of female scientists. In particular, we created Wikipedia pages for female scientists in our community: Natasha Noy, Yolanda Gil, Lora Aroyo. It was a true honor to have the opportunity to create the English Wikipedia page of Asunción Gómez Pérez, who has been incredibly influential in my life.

More trip reports: Check out Helena Deus’ and Paul Groth’s ISWC trip reports (which I haven’t read, so they wouldn’t bias mine).

What an awesome research community: I am very lucky to consider the Semantic Web community my research home. It’s a work hard, play hard community.

We were at a very beautiful venue:

We like to sing

We like to have great dinners and dance:

We even throw jam sessions and parties:

And just like last year, I recorded the Jam session:

ISWC Jam Session

Posted by Juan Sequeda on Thursday, October 11, 2018

See you next year in New Zealand

… and then in 2020 … Athens, Greece!

ISWC 2017 Trip Report

The 16th International Semantic Web Conference took place from October 21-25 in Vienna, Austria. These are my random thoughts.

First of all, I’m honored to be part of the Organizing Committee as a chair of the In-Use Track, together with Philippe Cudré-Mauroux. Jeff Heflin was the General Chair and a fantastic leader. The conference was impeccable thanks to the AMAZING local organization. Axel Polleres and Elmar Kiesling did an incredible job. I truly enjoyed every step of the process to help organize ISWC 2017. I am really looking forward to ISWC 2018 in Monterey, CA and ISWC 2019 in Auckland, New Zealand!

I was part of a pre-ISWC meeting, a follow-up with the group that attended the Dagstuhl workshop on Federated Semantic Data Management. We continued defining a list of prioritized research topics.

Frank van Harmelen gave a fantastic keynote at the Semantic Science workshop about the end of the scientific paper.

Btw, if you have the chance to see Frank give a talk… it’s a must! He is one of the best speakers in academia that I have ever seen. I wish I could present like him!

I attended most of the VOILA!2017 Workshop. The highlight of the event was the demos (around 20 of them).

* Next version of VOWL is addressing a lot of needs.
* Check out ViziQuer. It looks cool but I’m skeptical about how usable it is.
* Great to see interest in UIs for generating R2RML mappings, but they haven’t been tested yet with real-world scenarios. Long ways to go here.
* I need to check out the Linked Data Reactor
* Treevis.net, interesting resource. Need to check it out.
* The extensible Semantic Web Browser
* user, user, user: everybody mentions users but usually the “user” is not defined. Who exactly is your user?

The welcome ceremony was in the Vienna Rathaus. Beautiful place. We were so lucky.


During the welcome ceremony, we had a reunion of 5 alumni from the 2008 Summer School on Ontological Engineering and Semantic Web: Laura Drăgan, Tara Raafat, Maria Maleshkova, Anna Lisa Gentile and myself, together with John Domingue and Enrico Motta, who were the organizers. We have come a long way!

Great discussion about the history of Project Halo, funded by Vulcan, with Michael Witbrock, Oscar Corcho and Steffen Staab. Learned a lot of historical details.

Congrats to Mayank Kejriwal for winning the 2017 SWSA Distinguished Dissertation Award! Mayank and I are academic brothers: we both did our PhD at the University of Texas at Austin under the supervision of Prof Daniel Miranker.

Congrats to DBpedia for winning the SWSA Ten-Year Award. Definitely well deserved!

Industry and In-Use: If I’m not mistaken, approximately 25% of attendees of ISWC were from industry and government (more specifically, not from academia). All the industry talks were on Monday. Great to see the room full all of the time. We are definitely seeing more use of semantic technologies. However, my observation is that this is mainly government and research/innovation folks at very large companies. It is not yet replacing the status quo. Additionally, there were a lot of complaints about the lack of maturity of tools, especially open source tools. I’m not surprised.

Ontology engineering seems to be popular (or it never stopped?). Deborah McGuinness‘ keynote showed real world projects in health care where ontologies play a central and vital role. Takeaway message: it takes a village.

It seems to me that we have had the following evolution in the past decade: first focus on hard core theoretical ontologies (DL and the like), second focus has been more on the data side (Linked Data), third focus (now) is about “little semantics goes a long way”. Jim Hendler has always been right (see my comments below on Jamie Taylor’s keynote).

Is this the year of the Knowledge graph? Are Knowledge Graphs becoming Mainstream? Thomson Reuters announced (by coincidence?) their Knowledge Graph while ISWC was going on. There was no formal announcement during the conference.

The interesting part is that Thomson Reuters built their own RDF graph database (triplestore). Why? See this tweet:

I presented a poster on the Ontology and Mapping Engineering Methodology that we have been using at Capsenta in order to apply semantic web technologies to address data integration and business intelligence pain points. THANK YOU THANK YOU THANK YOU for all the feedback that I received during the poster session and hallway conversations. This is the reason why you go to a conference! Special shoutout to Enrico Franconi and Frank van Harmelen. Conversations with you were extremely insightful.

Jamie Taylor, the former Minister of Information at Freebase (the person who has had one of the coolest titles), and who now manages the Schema Team for Google’s Knowledge Graph, gave the third keynote, which, btw, was exactly what you’d expect from a keynote. Thanks Jamie for such an awesome talk!

His message was very clear: we need actionable and executable Ontologies/Knowledge Graphs. What does this actually mean? The example he gave was the following: in the Google KG, they have assertions about Polio and the Vaccine for Polio, but nowhere is it asserted that the Vaccine for Polio prevents Polio. This goes into adding common sense knowledge (think about Cyc). I think it would be fair to say that the lessons learned reported by Jamie were a bit “duh”/“told you so” to this community. My observation is that the giant Google, at the end, is doing what the Semantic Web community has been working on for over a decade. This is good! It was very nice to see the evolution of the Knowledge Graph at Google and insightful to see the practical role that semantics plays. Pascal Hitzler quickly wrote up his takeaways from Jamie’s keynote.

Congrats to Olaf Hartig, Ian Letter and Jorge Perez for winning the Best Paper Award.

This paper presents foundational work towards understanding what Linked Data Fragments (LDF) are and the relationships between different types of LDF. From a more general point of view, this work helps to formalize the relationships in a client-server architecture. Hence its applicability is not just within the Semantic Web. This is a beautiful example of how theory and practice can be bridged. Additionally, the presentation was simply brilliant. Jorge Perez has the capability of taking the most complicated concepts and presenting them in a way that is understandable and enjoyable to the audience. I can’t wait to see this presentation on video lectures. When it is published, it is a must-see on how to present a paper at a conference. I wish I could present like Jorge!

Daniel Garijo presented his WIDOCO tool. If you work with ontologies, you really need to use this tool, which basically takes care of generating the documentation of the ontology for you. He also received the Best Paper Award for the Resource track. Well deserved!

You can find all the papers of the conference for download on the ISWC 2017 website. No paywall!

The Job Fair was a great addition. Looking forward to seeing its evolution in the upcoming ISWC.

I really enjoyed being part of the mentoring session. It’s great to hear from students about what worries them and to provide some advice. We discussed the paper publishing process, academia vs industry, US vs Europe, dealing with loneliness, and many more topics. Please reach out if you have any questions!

Great to have more Austin presence at ISWC with data.world

who also sponsored the … JAM SESSION! All I can say is:

and without further ado, here is a 1+ hour video of the Jam Session: a large group of Semantic Web Computer Science PhDs jamming after just 3 hours of practice. I think this is the definition of epic! Enjoy!

 

See you next October in Monterey, California for ISWC 2018!

Why doesn’t the Database and Semantic Web research community interact more?

I was in Chicago to meet with colleagues from the Graph Query Language task force at the Linked Data Benchmark Council (LDBC) so we could have an impromptu face-to-face meeting (great progress towards our graph query language proposal!). They were in Chicago attending one of the main academic database conferences: SIGMOD/PODS. I was able to take a quick look at papers, demos and tutorials.

I left with the following question: Why doesn’t the Database and Semantic Web research community interact more? The cross pollination, in my opinion, is minimal. It should be much bigger. A couple of examples:

If you go to two conferences of your field in a year, consider swapping one conference to attend another conference in a different field. For example, for the Semantic Web community, if you attend ISWC and ESWC, consider swapping one of those to attend SIGMOD or VLDB. Same for the database community.  VLDB 2017 will be in Munich from August 28th to September 1, 2017.

I made a list of papers from SIGMOD/PODS (research papers, demos and tutorials) that I believe are relevant to the Semantic Web community. The SIGMOD and PODS papers are available online.

PODS Papers

SIGMOD Papers

SIGMOD Demos

SIGMOD Tutorials

P.S. For the travel and points geeks. Last minute travel to Chicago was really expensive. Over $500 USD.  I was able to use 25000 miles and pay just $10 USD. And I even got upgraded to first class!

Smart Data and Graphorum Conference Trip Report

I attended the Smart Data-Graphorum Conference (January 30 – February 1) in the Bay Area (actually Redwood City). This conference series was originally called the Semantic Technology (SemTech) Conference, and I have been presenting at it since 2010.

This year, the conference had a cozy feeling with ~250 attendees. I gave two talks:

  • Graph Query Languages: Similar to my Graph Day Texas talk, I gave an update from the Graph Query Language task force at the LDBC. The latest discussions were incorporated in this talk. We have been discussing the idea of having paths as a datatype, with their own table (a table for Nodes, Edges and Paths). Additionally, there are two notions of projection: relational vs graph. The slides provide some examples. This is still ongoing work.

  • Virtualizing Relational Databases as Graphs: a multi-model approach: In this talk I discussed how relational databases can be virtualized as RDF Graphs by using the W3C RDB2RDF standards: Direct Mapping and R2RML. I argue that graphs are cool, and ask: are relational databases cool too? If you are deciding to move from a relational database to a graph database, you should understand the tipping point. I believe virtualization is a viable option to keep your data in a relational database while continuing to take advantage of graph features. However, that may not always be the case. (A toy sketch of the query rewriting behind virtualization follows below.)
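
As promised above, here is a toy sketch of the core rewriting step behind virtualization (all table and column names invented): a single SPARQL triple pattern, phrased against the Direct Mapping of a table, is answered by a SQL query over that table. Real systems like Ultrawrap of course handle joins, filters and optimizations on top of this.

```python
# Toy sketch: rewrite one SPARQL triple pattern, phrased against the
# Direct Mapping of a table, into SQL over that table. All names invented;
# real systems handle joins, filters, and query optimization on top.
def triple_pattern_to_sql(predicate_iri: str, pk: str = "id") -> str:
    # Direct Mapping predicates look like <base>/<table>#<column>
    table, column = predicate_iri.rsplit("/", 1)[1].split("#")
    return f'SELECT "{pk}", "{column}" FROM "{table}"'

# The pattern  ?s <http://example.com/emp#dept> ?o  becomes:
print(triple_pattern_to_sql("http://example.com/emp#dept"))
# SELECT "id", "dept" FROM "emp"
```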

 

Additional highlights of the conference

  • I was glad to see a lot of friendly faces. I feel very lucky that I can always have a chat with Deborah McGuinness and Michael Uschold, two legends in ontologies. It’s always great to see Souri Das from Oracle (and all the Oracle folks from the semantic technology group) and discuss how the W3C RDB2RDF standards are doing. We both agree that we did a good job with that standard, and we gave ourselves a pat on the back 🙂 Also great to see Peter Haase, Dean Allemang, Atanas Kiryakov, Bart van Leeuwen, Jans Aasman, Dave McComb and many more.
  • Michael Uschold and I discussed the pragmatics of part-of and has-label semantics. For some situations you want to be generic. For example, it’s easier for a user to just use “has label” for any thing, instead of having to know the exact type of “has label” for a specific thing. Now I understand many of the modeling decisions made in gist. I argue that from a database point of view, query performance is better if you have more specific properties, unless you have some sort of semantic query optimizations.
  • Cambridge Semantics gave a presentation on their in-memory analytics graph database. They presented results using the LUBM benchmark where they claim to have blown Oracle away. Important to note that they used 4x the hardware. Atanas Kiryakov, Ontotext’s CEO was in the audience and rightfully asked why they didn’t use a more up to date benchmark given that LUBM is from 2007. It seems that everybody has been using LUBM (since 2007) so in order to compare to others, they continue to use LUBM. Hopefully they will start using the LDBC benchmarks!
  • I have been aware that Marklogic markets themselves as a document and graph database. I now understand how they represent things under the hood. Each entity, with its corresponding attributes and values, is represented in a document (key-values). The relationships between the entities are represented as RDF triples. This makes a lot of sense to me and I can imagine how this can improve query performance to a certain degree. (A tiny sketch of this layout follows the list.)
  • Brian Sletten gave a great talk on JSON-LD. I wish all web developers could see this presentation in order to understand the value of Linked Data. Even though Brian was not able to give his talk on the new W3C upcoming standard SHACL, the Shapes Constraint Language, his slides left a lasting impression. This is the best definition I have ever seen for the Open World Assumption!

  • It was great to see Emil Eifrem, Neo Technology’s CEO, again. We discussed the history of RDF and the Semantic Web (I didn’t know he was a very early user of Jena!). We seem to be in agreement that RDF is a great technology for data integration. For anything else graph-related, he argues that you should use Neo4j. Not surprising 😛 I was also glad to see that Neo4j is starting to work on formalizing the semantics of Cypher, including making it a closed query language.
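
Here is the tiny sketch of that document-plus-triples layout, as I understood it (all data invented): attribute lookups hit the entity’s document, and graph traversals hit the triples.

```python
# Tiny sketch of the document-plus-triples layout described above, as I
# understood it. Each entity is a key-value document; relationships between
# entities are kept separately as triples. All data is invented.
documents = {
    "/orders/1001": {"status": "shipped", "amount": 250},
    "/customers/7": {"name": "Acme", "city": "Austin"},
}

triples = [
    ("/orders/1001", "placedBy", "/customers/7"),
]

# Attribute lookups hit the document; graph traversals hit the triples.
order = documents["/orders/1001"]
buyer = next(o for s, p, o in triples if s == "/orders/1001" and p == "placedBy")
print(order["status"], "->", documents[buyer]["name"])  # shipped -> Acme
```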

This was a great couple of days and hopefully next year we will have more people!

A Data Weekend in Austin

On the weekend of January 14-15, I attended Data Day Texas, Graph Day Texas and Data Day Health in Austin and gave three talks.

Do I need a Graph Database: This talk came out of a Q&A during a happy hour after a talk I gave at a meetup in Seattle. We were discussing when to use a graph database, and what type of graph you should use: RDF or Property Graph.

 

Graph Query Languages: This talk gave an update on the work we have been doing in the Graph Query Language (GQL) task force at the Linked Data Benchmark Council (LDBC). The purpose of the GQL task force is to study query languages specifically for the Property Graph data model, because there is a need for a standard syntax and semantics for such a query language. One of the main points I was arguing in this talk is the need for a closed language: graphs in, graphs out. One can argue that a reason for the success of relational databases is that the query language is closed (tables in, tables out). With this principle, queries can be composed (i.e. views!). This talk was well received and generated a lot of interesting discussion, especially when Emil Eifrem, Neo Technology’s CEO, is in the room. An interesting part of the discussion was whether we are too early for standardization. Emil stated that we need standardization now because their clients are asking for it. I stated that graph databases today are in the mid-1980s of relational databases, so the time is about right to start the discussion. Andrew Donoho said I was too optimistic; he thinks we are in the late 70s and we are too early. I will be giving this talk next week at the Smart Data-Graphorum conference, with some updated material. Special thanks to Marcelo Arenas, Renzo Angles and especially Hannes Voigt for helping me organize these slides.

Semantic Search Applied to Healthcare: In this talk, I introduced how we are identifying patients who are in need of Left Ventricular Assist Devices (LVADs) using Ultrawrap, the semantic data virtualization technology developed at Capsenta. This talk presented a use case with the Ohio State University Wexner Medical Center. Patients are being missed through traditional chart-pull methods. Our approach has resulted in a ~20% increase in detection over the previously known population at OSU, which is a mature institution. This talk will also be given at the Smart Data conference.

Main highlights of the conference:

  • Emil Eifrem, CEO of Neo Technology, gave the keynote. It was nice to learn the use cases where Neo4j is being used: real-time recommendation, fraud detection, network and IT operations, master data management, graph-based search, and identity & access management. It was not clear why graphs specifically were used, because these are use cases that have been around for a long time and have been addressed using traditional technologies. Emil ended by talking about a “connected enterprise”, meaning integrating data across silos using graphs. If you take a look at my Do I need a graph database talk, you will see that I argue for using RDF for data integration, not Property Graphs.
  • Luca Garulli, the founder and CEO of OrientDB, gave a talk focusing on the need for a multi-model database like OrientDB. In his talk, he argued for many features which Neo4j apparently doesn’t support. Not long after, there was a good back-and-forth twitter discussion between Emil and Luca, with Emil correcting Luca. Seems like this talk may need to be updated. An interesting takeaway for me: how do you benchmark a multi-model database?
  • There were many talks about “I’m in relational, how do I get to property graphs”, all of them at an introductory level. Given that we have studied the relational-to-RDF problem very well, this should be a problem that can be addressed quickly and efficiently.
  • Standardization was a big topic, one of the reasons my Graph Query Language talk was well received. Neo4j is pushing for OpenCypher to become the standard, while in fact one could argue that Gremlin is already the de facto standard. Before this weekend, I wasn’t aware of anybody implementing OpenCypher. Apparently there are now 10 OpenCypher implementations, including Bitnine, Oracle and SAP HANA.
  • Bitnine: they are implementing a Property Graph DB on top of Postgres and using OpenCypher as the query language. They are NOT translating OpenCypher to SQL; instead, they are doing the translation to relational algebra internally. I enjoyed the brief discussion with Kisung Kim, Bitnine’s CTO. Apparently they have already benchmarked with LDBC and did very well. Looking forward to seeing public results. Bitnine is open source.
  • Take a look at sql2gremlin.com
  • grakn.ai looks interesting. Need to take a closer look.
  • Cray extended the LUBM benchmark and added a social network for the students.
  • Property Graphs are what come to mind when people think about graph databases. However, an interesting observation is that the senior folks in the room prefer RDF over Property Graphs. We all agreed that RDF databases are more mature than Property Graph databases.
  • “Those who do not learn history are doomed to repeat it.” It is crucial to understand what has been done in the past in order to not re-invent the wheel. I feel lucky that early on in grad school, my advisor pushed me to read pre-PDF papers. It was great to meet this weekend with folks like Darrel Woelk and Misty Nodine, who used to be part of MCC. A lot of the technologies we are seeing today have roots back to MCC. For example, we discussed how similar graph databases are to object-oriented databases. On twitter, Emil seemed to disagree with me. Nevertheless, we had an interesting twitter discussion.
  • Check out JanusGraph, a graph database which, if I understood correctly, is a fork of Titan. Titan hasn’t been updated in over a year because the folks behind it are now at DataStax.

Thanks to Lynn Bender and co. for organizing such an awesome event! Can’t wait for it to happen in Austin next year. Recordings of the talks will start to show up on the Global Data Geek youtube channel.