Thank you Capsenta! Hello data.world!

Together with my PhD advisor, Prof. Daniel Miranker at the University of Texas at Austin, I founded Capsenta in 2014 for the following reasons:

1) We truly believe that there is a commercial opportunity for companies wanting to connect their relational databases with semantic web technologies.
2) We are both passionate about commercializing research via startups.

Over the years we have been using Ultrawrap, and recently Gra.fo, to address data integration and business intelligence problems using semantic web technologies. Business users do not understand their complex data sources. IT struggles to understand the thousands of tables, millions of attributes and how the data all works together. We deliver a beautiful view of these myriad, complex relational data sources by designing an ontology (i.e., knowledge graph schema), mapping it to the complex data sources via a pay-as-you-go methodology and then using the mappings to integrate the data in a virtual (NoETL) or materialized (ETL) way. Our ultimate goal is to take complex data and turn it into beautiful data.

We are now experiencing more interest and uptake from the industry. Knowledge Graphs are the new cool kid on the block. Graph databases are hot. The Semantic Web community keeps providing evidence that semantic technology works in the real world (just see all the papers in the in-use and industry tracks at ISWC and ESWC, the recent Knowledge Graph Conference, etc.). The industry is really starting to care!

There is one company that really, really cares: data.world.

And I am extremely excited to share that

Capsenta has been acquired by data.world!

Who is data.world?

data.world is a data platform where anybody can add their data, integrate it, share it, query it and much more. Data on the data.world platform becomes part of a web of linked data. The coolest thing is that it runs 100% on semantic web technology. Literally! They have made use of many research results from the semantic web community. For example, every single dataset is stored as RDF HDT. They make use of Apache Jena. You can query all the data in SPARQL, even federate queries across different datasets. When you import relational or csv data, they use the RDB2RDF and CSV2RDF direct mappings. They have even created their own SQL to SPARQL translator, thus enabling tabular data to be queried in SQL in addition to SPARQL. All changes are tracked and the provenance is represented in PROV-O and queryable. Heck, they even support SHACL! data.world is a true semantic web platform.

data.world started out in 2016 by creating a community of open data, which has been called a kind of “GitHub for data”. Now, data.world is the world’s largest collaborative data community and that community has come together to upload and curate hundreds of thousands of data sets.

data.world is also a Public Benefit Corporation with the following ambitious mission:

– Build the most meaningful, collaborative and abundant data resource in the world in order to maximize data’s societal problem-solving utility.
– Advocate publicly for improving the adoption, usability, and proliferation of open data and linked data. (YES, you read that correctly! Their mission is to improve the adoption of linked data!!!!)
– Serve as an accessible historical repository of the world’s data.

It’s now time to start the next phase of taking data.world to the enterprise. This is where Capsenta comes in.

Why am I excited?

There are two main reasons why I am excited:

Perfect Technology Match: We both breathe and eat semantic web. Ultrawrap is a component that will help data.world create a hybrid data platform. We have enterprise customers who want to keep their data in place and not move it to the cloud. This is where Ultrawrap NoETL plays a crucial role. Furthermore, we both acknowledge that we need to make semantic web technology easy to use. data.world’s consumer-grade UI is a valuable differentiator. At Capsenta we created Gra.fo because there wasn’t an easily-usable ontology/knowledge graph schema editor for business users.

Perfect Mission/Vision Match: We are both heading towards the same goal. The way data is managed within enterprises is ugly and complicated. We have to address this problem from a holistic point of view. At Capsenta, our goal is to change the way the world models, governs and integrates data by generating beautiful data that business users can consume to start solving their business problems. We want to democratize data or, as data.world puts it, humanize the data. It’s clear to us that data integration is not just about the technology but also about the people. We need to empower the different stakeholders to be part of the conversation. That is why data.world is all about collaboration. Capsenta’s Gra.fo allows users to share their documents and have conversations via comments.

Oh, and we are both in Austin! How cool is that!

How did we get here?

When I transferred to UT Austin to finish my undergrad in Computer Science in 2006, by serendipity, I met Prof. Daniel Miranker. He was also intrigued by the Semantic Web. Our research started with a very basic question: what is the relationship between relational databases and semantic web? It was clear to us that if the semantic web were to be successful, it must incorporate relational databases because that is where the majority of data is located. After I finished my undergrad, I wanted to continue this same line of research and keep working with Dan. One of the main reasons I wanted to do a PhD was because of the potential to start a company from our research. If semantic web technologies were to take off, then we would be seeing a lot of companies wanting to integrate their relational databases with the semantic web… and we would have the solution! Capsenta was founded to commercialize my PhD research.

With this fantastic technical basis in hand, Wayne Heideman joined the journey as CEO to guide the commercialization of these ideas, the technology and its productization. Since then we have demonstrated that our technology works to integrate data within very large enterprises in industries such as healthcare, e-commerce, oil and gas, and pharma, and we have enjoyed commercial success with millions of dollars of customer revenue.

Personally, I have learned A LOT about how to work with data in large enterprise settings (our smallest customer is a billion dollar revenue company), from both the technical and social aspects. It is very satisfying to see our research being used in the real world to solve challenging data integration problems… and that we get paid to do it.

In order to scale Capsenta’s business, we needed more fuel. Given the alignment that we have with data.world, it makes complete sense to join forces.

What’s next?

The entire Capsenta team has joined data.world! I am now Principal Scientist at data.world. I continue to wear my scientific hat and collaborate with many research partners, attending and presenting at conferences, participating in program committees and editorial boards, supervising students and more. I also wear a business hat where I support engineering, technical sales and work with customers to understand their problems and tie them back to R&D.

Capsenta and data.world had already been working together for over a year as partners and Ultrawrap NoETL was already integrated as the virtualization mechanism for data.world before the acquisition. It will be very fun to further integrate Capsenta’s technology within data.world. We also plan to continue to support all of our customers and continue development and support for Ultrawrap and Gra.fo.

Parting Thoughts

With my scientific hat

I am very proud to be part of a startup coming out of research done at the Department of Computer Science at the University of Texas at Austin. I’m looking forward to seeing more startups coming out of UTCS.

There is so much fun research to be done! It’s going to be fun organizing our research plans for the short, medium and long term. Stay tuned!

With my business hat

This is a huge win for companies who are looking to deploy an Enterprise Knowledge Graph. If you are learning and starting small, we can help you. If you are advanced and you know exactly what you want, we can help you too. Together, we now have the best platform in the world to create knowledge graphs!

Personally,

Thanks to the entire Capsenta team, past and present. We are starting this new chapter thanks to all of you.

Thanks to the investors for trusting us in this endeavor.

Thanks to Dan Miranker for believing in me.

Thanks to Wayne Heideman for teaching me so much about business and technology.

Thanks to my family for supporting me every step of the way.

It’s clear that both Capsenta and data.world are heading in the same direction. We are honored and humbled to be invited to be part of the data.world journey and are excited at what it holds for us all.

Thank you Capsenta!

Hello data.world!

We are one team now.

Gra.fo, six months later! What have we been up to?

I can’t believe that it has already been six months since we first announced Gra.fo. Time flies when you are having fun! I am really excited to share with everybody some of the major features we have been working on: New Exports, Graph Schema Documentation, Multi-select and Import Mapping.

New Exports

It was clear to us from the beginning that Gra.fo was in the position to support both RDF Graph and Property Graph communities. We started out by exporting the schemas as OWL ontologies in Turtle and RDF/XML syntaxes. However, we lacked support for Property Graph schemas.

Throughout the past few months, we have been thrilled to see the interest of the Property Graph community in schemas and Gra.fo. (I’m honored to be chairing the Property Graph Schema Working Group within the context of the GQL standardization effort.)

That is why we are excited to announce three new property graph schema export formats:

There is a clear need for a general purpose graph schema modeling tool. We are lucky to have this opportunity where Gra.fo can be a bridge between both graph communities.

Graph Schema Documentation

Exporting the graph schema to a PNG or SVG image sure is pretty, but it may not be sufficient. The image does not show attributes or detailed descriptions.

An important need is to provide documentation about the graph schema in a way that can be easily consumed by humans. This type of documentation can serve as requirement documentation, project deliverable, etc.

Now you are able to view the documentation of the graph schema in a separate page. Go to File > Graph Documentation.

The documentation has its own URL of the form https://app.gra.fo/documentation/a1b2c3. You can now easily share that link with others who also have permission to view the document.

Need a PDF? Simply print and save as PDF.

Multi-select

What if you need to move multiple concepts at the same time? Before, you would need to move each one independently. That was very annoying.

Not any more! Now you can select multiple concepts at the same time and move them all at once. And it even works in real-time when you have multiple users on the document.

Simply select multiple concepts by holding shift on your keyboard and clicking on each concept that you want to move. Alternatively, you can hold shift, click on the canvas and drag to draw a bounding box around the concepts.

In addition to moving multiple concepts at once, you can also change the colors and delete them.

Import Mapping

Designing a graph schema is just the first part. You have to do something with it. Our customers’ most common use case is data integration. They need to map complex source relational databases to the graph schema, which models the business users’ view of the world.

One way of representing these mappings is the W3C’s R2RML: RDB to RDF Mapping Language. This standard was ratified back in 2012, together with the Direct Mapping standard (I am one of the editors).

R2RML is a declarative language that defines how RDF triples are generated from SQL tables or queries. For example, the following R2RML snippet defines that every row of the OMS_ORDER table will be an instance of the class ec:Order and that the subject of each triple is generated from a template that uses the value of the attribute OrderId.

@prefix rr:    <http://www.w3.org/ns/r2rml#> .
@prefix map:   <http://capsenta.com/mappings#> .
@prefix ec:    <http://gra.fo/e-commerce/schema/> .

map:Order  a rr:TriplesMap ;
    rr:logicalTable  [ rr:tableName "OMS_ORDER" ] ;
    rr:subjectMap    [ rr:class ec:Order ;
                       rr:template "http://www.e-commerce.com/data/order/{OrderId}"
                     ] .
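
To make the effect of that subject map concrete, here is a minimal Python sketch of what an R2RML processor would produce for it. This is only an illustration, not how Ultrawrap or any particular processor is implemented: the sample rows, the helper names (expand_template, triples_for_rows) and the use of urllib's quote as an approximation of R2RML's IRI-safe encoding are all assumptions made for this example.

```python
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EC_ORDER = "http://gra.fo/e-commerce/schema/Order"  # ec:Order, prefix expanded
TEMPLATE = "http://www.e-commerce.com/data/order/{OrderId}"


def expand_template(template, row):
    """Replace each {column} placeholder with the row's value for that column.

    quote(..., safe="") only approximates the IRI-safe encoding that the
    R2RML specification requires for template values.
    """
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: quote(str(row[m.group(1)]), safe=""),
        template,
    )


def triples_for_rows(rows):
    """Yield one (subject, rdf:type, ec:Order) triple per row of the table."""
    for row in rows:
        yield (expand_template(TEMPLATE, row), RDF_TYPE, EC_ORDER)


# Hypothetical rows standing in for the OMS_ORDER table.
sample_rows = [{"OrderId": 1001}, {"OrderId": 1002}]
triples = list(triples_for_rows(sample_rows))

for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

Each row yields exactly one triple stating that the order identified by its OrderId is an ec:Order. Attributes and relationships would need additional predicate-object maps, which the snippet above does not define.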

These mappings can be created using editors, or by hand if you are an RDF geek 🙂. Capsenta offers Ultrawrap Mapper, our mapping management system.

I’m really excited about this initial feature: import an existing R2RML mapping to a graph schema document. Go to Mapping > Manage.

Once a mapping has been imported, you will see an icon on the left panel if a mapping exists for a Concept, Attribute or Relationship.

If you click on the icon, you will see the mapping details. In this example, we are showing the previous R2RML snippet.

In real-world, enterprise relational databases, the mappings will consist of complex SQL queries defined in an R2RML mapping, as the following example shows:

Once you have the mappings, you have to do something with them. You can use the mappings to physically convert the relational data into graphs (ETL) or to virtualize the relational databases as if they were a graph database (NoETL). Capsenta also offers Ultrawrap Data Integrator, where you can use mappings in ETL or NoETL mode, or even a hybrid of the two.
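
The difference between the two modes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how Ultrawrap works: the in-memory SQLite table, the sample data and the function names (materialize, is_order_virtual) are all made up for this example, and a real NoETL system translates full SPARQL queries into SQL rather than answering a single hard-coded question.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EC_ORDER = "http://gra.fo/e-commerce/schema/Order"
PREFIX = "http://www.e-commerce.com/data/order/"

# Hypothetical source database standing in for the OMS_ORDER table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OMS_ORDER (OrderId INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO OMS_ORDER VALUES (?)", [(1001,), (1002,)])


def materialize(conn):
    """ETL style: read every row once and keep the resulting triples."""
    rows = conn.execute("SELECT OrderId FROM OMS_ORDER ORDER BY OrderId")
    return [(f"{PREFIX}{order_id}", RDF_TYPE, EC_ORDER) for (order_id,) in rows]


def is_order_virtual(conn, subject):
    """NoETL style: answer 'is <subject> an ec:Order?' by querying the source."""
    if not subject.startswith(PREFIX):
        return False
    order_id = subject[len(PREFIX):]
    if not order_id.isdigit():
        return False
    row = conn.execute(
        "SELECT 1 FROM OMS_ORDER WHERE OrderId = ?", (int(order_id),)
    ).fetchone()
    return row is not None


# Materialize first, then change the source data.
materialized = materialize(conn)
conn.execute("INSERT INTO OMS_ORDER VALUES (1003)")

new_order = f"{PREFIX}1003"
print(is_order_virtual(conn, new_order))             # virtual view sees the new row
print(any(s == new_order for s, _, _ in materialized))  # the earlier copy does not
```

The trade-off the sketch shows is freshness versus independence: the materialized copy must be rebuilt when the source changes, while the virtual view is always current but puts every question to the source database.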

So what’s next?

There are a lot more exciting features coming soon.

  • Gra.fo/Mapper: Importing a mapping is just the beginning. We want you to be able to create your mappings all from within Gra.fo.
  • API: We want to empower users to create their own apps that interact with Gra.fo. Everything that you can do through the frontend, you will also be able to accomplish through an API.
  • Gra.fo Documentation: We have a phenomenal UI/UX team who strive to make Gra.fo very intuitive. Nevertheless, we acknowledge the necessity of having documentation.

We truly appreciate all the feedback that we have been getting from our users. Please keep it coming!