Bringing Graph Databases and Network Visualization Together Dagstuhl Seminar Trip Report

I had the honor to organize my first Dagstuhl Seminar on Bringing Graph Databases and Network Visualization Together. I’ve been lucky to have the opportunity to attend multiple times, but this was very important because I was an organizer. 

Let me start out by saying that I’ve always been skeptical about the value of graph and network visualization. That skepticism is what motivated me to organize a seminar to bring these two communities together. I was very lucky to have met Hsiang-Yun Wu and Da Yan at a previous Dagstuhl Seminar on Graph Databases and then be introduced to Karsten Klein in order to organize this seminar. 

My main takeaways:

– Not surprising, there is a sizable gap between the Graph Database and Network Visualization communities.
– There are low hanging fruits to bridge the gap, both from an academic and practical point of view
– Biggest open problem: cool vs usefulness 

Network Visualization layouts focus on the graph structure

Graph layouts focus on the structure of the graph. The layouts are evaluated based on aesthetics such as minimizing cross edges, length of edges, etc. The AHA moment I had is when I learned that layout algorithms consider the input graph to be undirected and unlabeled. Thus, graph layouts focus on the structure (i.e. syntax) of the graph and not the meaning (i.e. semantics). This may explain why many knowledge graphs look unusable when they are visualized with default layouts such as forced directed layouts because knowledge graphs have … knowledge, represented as directions on the edges and labels on the nodes! This realization was a huge AHA moment for everyone. 

Visualizing “large” graphs

Fabrizio Montecchiani gave an overview of visualizing large graphs. Network visualization research focuses on scalable layout algorithms for “large” graphs. I put large in quotes on purpose because large is hard to define. 1 million nodes? 10 million nodes? For me those are small graphs. For others, they are large. Current research on scalable graph layout focuses on implementations based on GPUs, Parallel and distributed algorithms, and big data frameworks. Some examples: 

– Forced directed layout on an Intel Xeon CPU X5650 @ 2.67 GHz and nVidia GF100 [Quadro 5000] 10M vertices and 20M edges took about 36 seconds per iteration (but the number of iterations might be very large!) See Yunis et al. Scalable Force Directed Graph Layout Algorithms Using Fast Multipole Methods. ISPDC 2012
– Multilevel algorithm based on maxent-stress optimization. Parallel implementation based on OpenMP (shared memory). Workstation: Octa-Core Intel Xeon E5-4640 processors (32 cores, 64 threads) @ 2.4 GHz. 1M vertices and 3M edges took about 27 seconds. See Meyerhenke et al. Drawing Large Graphs by Multilevel Maxent-Stress Optimization. IEEE Trans. Vis. Comput. Graph. (2018)
– Implementation based on Apache Giraph https://multigila.graphdrawing.cloud/

These are surveys to consider:
– Hu and Shi. Visualizing large graphs. WIREs Comput Stat, 7: 115-136 (2015)
– von Lazndesberger et al. Visual Analysis of Large Graphs: State-of-the-Art and Future Research Challenges. Comput. Graph. Forum 30(6): 1719-1749 (2011)

Another AHA moment we had is realizing that the network visualization community expects graphs to mainly be representing (large) data instead of (small) metadata or schemas. Different sizes, different use cases, different users, different problems. 

Who are the users?

Tatiana von Landesberger gave a presentation on Qualitative Evaluation. My personal realization is that we, the graph database community, are missing out on the possibility of learning so much more because we do not do qualitative evaluations. We focus on building systems and on quantitative evaluations while ignoring the end users. The opportunity is that we are exposed to many types of users. The big AHA moment is that the network visualization considers that there is only one user: the subject matter expert who is the final user making a decision about an analysis (i.e Doctor, Journalist, Scientist). From a database perspective, there are many types of users: data engineer, data analyst, database developer, ontologist, taxonomist, knowledge scientist, data stewards, analytics engineer, and of course the final “business user”. Explaining the entire flow of data of how it gets integrated eventually into a (graph) database and how it involves a variety of personas was an AHA moment for network visualization folks.   

So many layouts!

It was great that the CTO of Yworks, Sebastian Müller, attended the seminar. Yworks is a spin-off of the University of Tuebingen founded in 2001. They attend most of the graph drawing and network visualization conferences, learn about the latest graph layouts and implement them. This is why they have a library of 100s of layouts. I have to say, this is very overwhelming. How do I know which layout to use? I sat down with Sebastian and told him about a specific problem, task, user and he immediately suggested a couple of layouts to consider. Thank you Sebastian… but this isn’t scalable!

Cool vs Useful

A common phrase is “wow, this is cool.. but how useful is this?” Seems like the community is just starting to scratch the surface on this topic. Stephen Kobourov shared the thought that we should be looking at this as phases of usefulness. A graph visualization that has a cool WOW factor should drive engagement, which should then lead to serendipitous discovery and ultimately lead to accomplishing a specific task. For example, a graph visualization can provide context which drives trust. Catia Pesquita described how a subject matter expert confirming the results of an ontology matching system could benefit by visualizing parts of the ontology in order to provide context. Visualizing that two concepts match is not useful by itself. However, visualizing the surrounding concepts can provide important context to confirm if that match is trustworthy or not. 

Visualization and Querying

Walter Didimo, Beppe Liotta and Fabrizio Montecchiani presented work on Visual Graph Query and Analysis for Tax Evasion Discovery in conjunction with the Italian Revenue Agency. I really like how this research group is working on how to apply network visualization research. Other aspects that were discussed were on visualizing the results, specifically how/why provenance of a result and also paths that may be returned in a result.

Opportunities

The big open problem is defining what is useful. This is definitely a socio-technical phenomena and there is a lot of work to be done. 

A way to start tackling usefulness is by refocusing on the different types of users and the variety of tasks they may have. For example, we started to outline all the roles and the possible interactions between each of those roles. Each interaction has a specific task which may or may not be supported by a visualization. This is how we were trying to break down the problem to smaller pieces. 

By understanding the different roles, interactions and tasks, these could then be associated with existing graph layouts and metaphors. It would be great to have this taxonomy of graph layouts, connected to use cases, roles, tasks, etc. 

Final Words

I am extremely lucky to have the opportunity  to spend time with some of the brightest minds in Graph Databases and Network Visualizations. There were 7 people from Graph Databases and 15 people from Network Visualizations, all of us in person except for Da Yan. On behalf of the organizers, thank you to each and everyone of you for coming in person for this seminar during these hard times: Michael Behrisch, Walter Didimo, Nadezhda Doncheva, Henry Ehlers, George Fletcher, Carsten Görg, Katja Hose, Pavel Klinov, Stephen Kobourov, Oliver Kohlbacher, Beppe Liotta, Fabrizio Montecchiani, Sebastian Müller, Catia Pesquita, Falk Schreiber, Hannes Voigt, https://visva.cs.uni-koeln.de/, Markus Wallinger 

This is a strong reminder that science is a social process and in-person meetings are extremely valuable. 

It truly is amazing how a diverse group of people can get together and very quickly start working together. At one point, all of us were collaboratively working on a Google doc defining the outline of a vision paper we plan to publish together. We also had a lot of bonding time and even during one evening, we each gave short talks about random, non-work topics while enjoying beer and wine: Airplane accidents, Music is NOT a universal language, The other pandemic: Bananas, Dead Language in Japan, History of money and debt, Oldest bank in the world, Portugal Nuns and Eggs, Everesting and Underwater rugby.

Dagstuhl also was phenomenal on how they managed Covid. We all had to present negative covid tests before arriving and we got tested on Monday, Wednesday and Friday. Luckily nobody tested positive.

We have several next steps

– Start a slack community in order to create a community between graph databases and network visualization (To Be Announced soon!)
– Write a vision paper, similar to the CACM The Future is Big Graphs paper
– Organize workshops and panels in database and network visualization conferences

Our goal is to foster new research opportunities and we plan to meet again in Dagstuhl in a couple of years, with a larger group, to review the progress that will hopefully be made.