Friday, June 20, 2008

Memo To The Semantic Web: Drop “Semantic” And Become The “Graph Web”

For some time now my team and I have been working on a new web service called Kloudshare. This is not a product announcement, but I mention it because it provides context for what I have on my mind.

Kloudshare is a graph database platform implemented as a web service. The concept of a graph database is that you can create data objects of any type and connect them to other objects at will. Within this structure you can easily query the graph for items of a particular type or items that are connected to items of a particular type that have a particular characteristic, etc.
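
To make that less abstract, here is a minimal sketch in Python. It is not Kloudshare's API; the `Graph` class, its method names, and the sample data are all made up for illustration. It shows only the three ideas above: objects of any type, connections made at will, and queries by type or by connection.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph: typed nodes with properties, plus labeled edges.

    Hypothetical illustration only; not any real product's API.
    """

    def __init__(self):
        self.nodes = {}                # node id -> {"type": ..., plus properties}
        self.edges = defaultdict(set)  # node id -> {(label, other node id), ...}

    def add(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, **props}

    def connect(self, a, label, b):
        # Store the edge in both directions so queries can traverse either way.
        self.edges[a].add((label, b))
        self.edges[b].add((label, a))

    def of_type(self, node_type):
        """Ids of all nodes of the given type."""
        return sorted(n for n, data in self.nodes.items() if data["type"] == node_type)

    def connected_to(self, node_type, **where):
        """Ids of nodes linked to a node of `node_type` whose properties match `where`."""
        hits = set()
        for node_id, data in self.nodes.items():
            if data["type"] == node_type and all(data.get(k) == v for k, v in where.items()):
                hits.update(other for _, other in self.edges[node_id])
        return sorted(hits)


g = Graph()
g.add("alice", "person", city="Boston")
g.add("bob", "person", city="Austin")
g.add("rex", "dog")
g.add("tom", "cat")
g.connect("alice", "owns", "rex")
g.connect("bob", "owns", "tom")

people = g.of_type("person")                             # items of a particular type
boston_links = g.connected_to("person", city="Boston")   # items connected to a person in Boston
```

Here `people` is `["alice", "bob"]` and `boston_links` is `["rex"]`. Note that nothing had to be declared up front: adding a new type of object or a new kind of connection is just another call to `add` or `connect`.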

If you have never thought about graphs or taken a computer science class this may all seem a bit abstract. But trust me when I say it turns out that thinking in terms of graphs is a much easier way of thinking about data than what most of us do today. Specifically, representing information in a graph is *far* easier than doing so in a relational database.

When we started developing Kloudshare, we had no idea that it was at all related to the work that the semantic web community was doing. But over time, one of the things I came to understand was that the semantic web is based on the idea of the web becoming one big graph database. And so it was appealing that there was a group of people who saw the world, in part, in a similar way to how we do. Unfortunately, the differences in thought beyond that baseline are quite substantial.

The Semantic Web concept is broken into two parts. The first part is the graph database concept, which they refer to using three separate expressions: triple stores, SPARQL, and RDF. A triple store is really the graph database. RDF is the format for the data when it comes out of a triple store, and SPARQL is the language for querying the database. These parts map conceptually very neatly to our work on Kloudshare.
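
A rough sketch of how those three pieces fit together, in plain Python rather than a real triple store. The data, the `match` and `owners_of` helpers, and the predicate names are assumptions made up for this example; a real system would hold RDF and answer SPARQL queries, which this only imitates in spirit.

```python
# Each fact is a (subject, predicate, object) triple, the basic unit of RDF.
triples = {
    ("tom", "is_a", "cat"),
    ("rex", "is_a", "dog"),
    ("bob", "owns", "tom"),
    ("alice", "owns", "rex"),
}


def match(pattern, store):
    """Return the triples in `store` that fit `pattern`; None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


def owners_of(kind, store):
    """Join two patterns: who owns something that is_a `kind`?"""
    things = {s for s, _, _ in match((None, "is_a", kind), store)}
    return sorted(s for s, _, o in match((None, "owns", None), store) if o in things)


cat_owners = owners_of("cat", triples)
```

The set plays the role of the triple store, the tuples stand in for RDF statements, and `owners_of` does by hand what a query language like SPARQL does declaratively: it matches several triple patterns and joins them on a shared variable. Here `cat_owners` is `["bob"]`.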

The layer above this data store is called OWL, the Web Ontology Language. This ontology layer is why the semantic web is called “semantic.” The concept of the ontology layer is to define what the objects in the graph database mean, so that when I say an object is a “cat”, we can all know what cat means, how cats relate to other objects around them, and so on. In my view, this is where the semantic web, as a broad-based concept, begins to fall apart.
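
For a feel of what an ontology layer adds, here is a deliberately tiny sketch in Python. It is not OWL, which can express far more than a class hierarchy; the class names, the two dictionaries, and the helper functions are hypothetical, and the sketch assumes the hierarchy has no cycles.

```python
# Hypothetical mini-ontology: each class maps to its direct superclass.
SUBCLASS_OF = {"cat": "mammal", "dog": "mammal", "mammal": "animal"}

# Instance data: which class each individual was declared to belong to.
INSTANCE_OF = {"tom": "cat", "rex": "dog"}


def ancestors(cls):
    """All superclasses of `cls`, nearest first, by walking SUBCLASS_OF upward."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(individual, cls):
    """True if `individual` belongs to `cls` directly or through inheritance."""
    declared = INSTANCE_OF[individual]
    return cls == declared or cls in ancestors(declared)


tom_is_animal = is_a("tom", "animal")
```

Nobody ever stated that tom is an animal; the shared definition of "cat" lets software work that out. That is the appeal, and also the catch: it only works if everyone agrees on what is in `SUBCLASS_OF`.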

Ontologies are great for defining small, controlled, well-defined universes. But the web is not that, and never will be. I am not saying that ontologies have no use, but I am saying that they are useful to a relatively small group, or in a few narrow situations. For example, defining what a person is on the web, or an event or appointment, would be very useful. But trying to create a broad layer of “meaning” for data on the web and presenting that as a primary goal of the semantic web just muddies the waters. This is exacerbated by the fact that OWL is a bitch to learn and understand. Moreover, the semantic part of all this, despite fervent denials, smacks of AI, an abject failure, and makes people roll their eyes instead of roll up their sleeves.

And so, in order for the ideas of the Semantic Web to succeed as a mainstream platform, the concepts and efforts must be re-focused.

First, all the stuff that seems too difficult and not immediately beneficial must be stripped away from the core concept and message. This means, for the basic message, ontologies must go. We must stop talking about them because they just confuse people. And they are not needed to get the most important benefits.

Second, we must simplify our terminology. The expression “graph database” is a single solid term. RDF, triple stores, and SPARQL are diffuse terms as well as being more abstract and harder to understand. When you need three terms to describe what this is all about, people’s eyes glaze over.

And finally, we must focus on the one benefit that developers will immediately relate to, which is ease of development. It turns out that thinking in terms of graphs is far easier than thinking in terms of tables and relations. The semantic web community is so busy trying to sell a utopian vision of data access that they are burying the lead, which, for a typical developer, is in large part about ease of development and time to market. Of course, you can’t sell that message when you have to talk about OWL, which is the exact opposite of easy. The whole ontology concept has taken the Semantic Web, or whatever we really should be calling it if we drop semantics, and made it both hard to understand and hard to implement. It has also made the real benefits much harder to see.

As I see it, a much better name for the Semantic Web would be the “Graph Web.” People have already taken to the term “social graph”, and on some level I suspect people can already picture the concept, extended beyond just connecting people. The Semantic Web unfortunately needs a fresh start. Changing the name would cause people to take a much-needed fresh look.

21 comments:

  1. Hank,

    I am quite interested in learning more about these concepts. Can you please point me to some websites or books where I can learn more?

    Thanks,
    Amol.

  2. That is a good question. I am leaving on a trip and don't have any good sources at hand this second. Part of the problem is I don't think I know of any really good intro sources. The really techie stuff you can find by googling "semantic web" and "w3c". But these are tech specs not "semantic web for dummies."

  3. I find Topic Maps to be a much more accessible form of "semantic web" concepts than the W3C stuff.

    Topic Maps are an ISO standard. Ontopia is a leader in the space and has written some good intros on Topic Maps.

  4. Hank,

    Sorry we didn't get to chat during Linked Data Planet :-(

    I think the "Linked Data Web" clearly describes the burgeoning Web of Linked Data Sources where the Names incorporate HTTP.

    http://www.openlinksw.com/blog/~kidehen does have a post and link to my keynote about this matter.

    Kingsley

  5. Your argument against ontologies is not convincing. Just because many people find them confusing doesn't mean they're not useful. Most people don't understand how a processor works but they know how to use it. All we need is a community of developers who understand the basic concepts.

    I like the idea of graph databases and I agree that it is a generalization of the concept of relational database.

    Overall, I don't agree with you that we have to drop semantic web. I think defining the relationships between different memes is something that we have to start doing at some point.

  6. I was at Linked Data Planet where Kingsley gave that keynote, and I think neither the conference nor the keynote made a commanding case for "Linked Data Web". Most particularly, I think Hank is exactly right that this is fundamentally a database-tech matter (not a web-tech matter, although the database tech is particularly useful for web stuff), and that the web emphasis in the Linked Data agenda is thus misplaced.

    For some similarly-minded rants, see Never Mind the Semantic Web, Semantic Bran and Death to False (Metal).

  7. Couldn't agree more. RDF definitively solves a deep and important problem, but it's a very low-level problem.

    RDF triples are like triangles in graphics - once you can draw triangles you can draw everything, or like packets in TCP/IP. The application layer is a whole other animal.

    So there may well be a ton of value in Ontologies, or NLP, but RDF doesn't stand or fall by that.

    RDF is globally distributed data publication - just as the web was globally distributed document publication. That's plenty for what I need just now. YMMV

  8. Hi Hank,

    I just read your blog article, which I believe is the result of our discussion with Jim Hendler and David Siegel after the LinkedData Planet session in the restaurant. You have good company in the city as Clay Shirky [5] would share your point of view wholeheartedly, I would say. I agree and would like to emphasize that the graph database has its place in the Semantic Web for reasons mentioned in your blog, but that does not mean that we have to drop ontologies from the picture altogether. Ontologies actually are part of and will help you to enrich your graph, in the graph database or in the triple store.

    read my entire reply here:

    http://semweb.meetup.com/25/messages/3211506/

  9. Without a semantic layer, I can build my own RDF graph and use it for my applications and that's all good, but when I want to integrate my graph with the rest of the net, don't things fall apart?

    And isn't integrating with the rest of the net the whole point? Semantics is hard not because of OWL but because it requires that a group of people agree on the meaning of things -- and that's always difficult.

    Maybe the fully distributed vision of the "semantic web" is too ambitious and something like Freebase or just more usage of RDF graphs is the first step, but eventually we'll have to deal with the semantics problem.

  10. I agree with this post in so many ways that it's hard to explain them all. I personally got into the Semantic Web from a purely practical perspective and I am still surprised that so little has been said about it.

    I guess it's because Graph Oriented Programming offers so many benefits to so many people that most people don't understand the ways that it can be useful to them.

    Many Lispers get it because Lisp is pretty much a giant graph whose triples are mostly in the form of 'A then B' (a.k.a. a list) so Graph Oriented Programming comes fairly naturally to them, which is probably why the makers of AllegroLisp also offer one of the best graph databases (AllegroGraph).

    All the Java programmers that are becoming enthusiastic about Dependency Injection and Spring are in essence just starting to learn how to build their programs from a well-defined graph. In their case XML's verbosity and tree-structure, the arbitrary separation between XML & code and the difficulty of extending Spring severely limit its flexibility and usability.

    The people who are starting to create DSLs for their systems are already pretty much doing Graph Oriented Programming, except that they use a parser to convert text into graphs and grammars to indicate what the system will understand, instead of simply letting people directly create graphs and having a well-defined set of constraints (like OWL) that verifies whether the system will understand the given graph.

    In the meanwhile most of the advocates of the Semantic Web are focussing on developing complex all-encompassing ontologies and their even more complex implications which, even though it is a great way of stretching the limits of what we can conceptualize, completely ignores the tremendous power and benefits Graph Oriented Programming can offer to the millions of regular programmers out there.

    Personally I think having a few thousand programmers actively using the underlying concepts of the Semantic Web in their everyday work would be a pretty darn powerful driving force behind the Semantic Web.

  11. Have you heard of Freebase? It seems similar to what you're working on.

    http://www.freebase.com

  12. Anonymous,
    Freebase is a structured wikipedia. That is not what we are doing. We would be more comparable to simpleDB or bigTable, but with a graph model instead of a flat database model.

  13. The "Semantic" part of the Semantic Web goes far deeper than the ontology layer. Consider RDF Semantics which among other things defines what an RDF Graph is and how an RDF Graph should be interpreted.

    RDFS was an early attempt at an RDF schema (ontology) that was eventually superseded by DAML, then DAML+OIL, which was then renamed OWL.

    The main purpose of the "Semantic" component is to enhance automated reasoning based on terms represented in an RDF Graph. The OWL-DL and OWL-Lite sub-languages of OWL can be mapped onto Description Logics, which are decidable fragments of First Order Logic (FOL). This is useful, since researchers can use the FOL tools they've been using for decades in a new setting.

    Aside from inferring new knowledge, ontological reasoning can tell you if an RDF graph is consistent with a given ontology, and can answer simple questions, like whether a human is a mammal, or where in a wine hierarchy Merlot sits.

    Just goes to show that there's more to the Semantic Web than just RDF APIs that can manipulate RDF/XML or triple (and quad) stores. Semantics is important and should not be discarded because you do not have use for it yet. Yes, it is difficult (see the Free University of Bozen-Bolzano's amazing list of DL literature), but semantics can seriously enrich content, and just a little semantics can go a long way...

  14. "But trying to create a broad layer of “meaning” for data on the web and presenting that as a primary goal of the semantic web just muddies the waters."

    Would you mind elaborating on this idea? What is a practical example of a broad layer of meaning for data on the web?

    I see the semantic web as a means to make data machine readable; a path to accessibility. Using common markup languages (like HTML and/or microformats) allows any device to read and make use of data instead of having that data be exclusively readable by people who can access and use view ports. One example is screen readers (devices commonly used by the legally blind to translate web pages into braille), and a more commercially desirable one is mobile phones. Allowing mobile phones to traverse data and pick out parts that the user might find particularly useful (like an event address, date, and time) can be easily achieved with standard machine readable markup languages.

    What I'm getting at is that the semantic web has practical uses. I feel like an ontology is the only path to machine readable data and therefore also the path to the broadest possible accessibility. It is really important to me that electronic data can reach the largest possible audience.

    I am only just now looking into the possibilities of graph databases and am very intrigued and still learning. If I don't find any major drawbacks in relation to the project I'm working on then I'll be using a graph database as the backbone. For this reason I'm highly interested in an elaboration of how your idea of the graph web can better describe data than an ontology. I feel like I'm just not understanding a large part of what you're trying to say about the graph web paving a better road to accessibility than the current semantic web movement.

    I've been reading about neo4j and it seems legit. What do you think about it? Is there anything more you can discuss about Kloudshare?

    best, kai

  15. To me, the semantic web makes a lot more sense when one uses rule languages instead of description logic (note though, that there is overlap between the two [1]). Then it is about adding computed data to declarative data and possibly about enforcing constraints. I am still looking for an OWL replacement that is similarly frame-based, but more about schema and constraints instead of inference and reasoning. So far, no luck, but [2] seems like a good starting point.

    [1] "Description Logic Programs", Grosof et al.
    [2] "OWL DL vs. OWL Flight: Conceptual Modeling and Reasoning for the Semantic Web", Jos de Bruijn et al.

  16. Very true! Semantic Web in the US is a red flag, especially for business. If you say a Graph Web, you will certainly get money, and underneath you can use "semantic web technologies." TimBL still has to admit that he made a mistake by calling it the Semantic Web.

  17. Hank! So far so good. But you haven't told us the next step you want to take. We've all been doing semantics since we were children. We all encode our ideas in language of one kind or another. Some dance, some draw pictures, some write, some build schemas, some write programs. The term semantic, in this sense, refers to encoding knowledge, meanings, behaviors, etc. separately from texts, pictures, moving imagery, databases, algorithms -- in short artifacts -- in a form that BOTH humans and computers can interpret. Another key part of the new paradigm is architectures of learning. Here is a suggestion. Go to www.project10X.com and download the free Semantic Wave 2008 Executive Summary. It's about 30 pages. Worth reading.

    (Jaun, by the way, is behind the curve. Semantic technologies are already coming into business.)

  18. Hi,

    I have to say I agree wholeheartedly that we should concentrate far more on immediate simple solutions that we can provide with all these graphs being available _now_, and leave complex ontologies in the fridge for a few more years.

    Most people dealing with semantic web stuff don't understand that complex solutions are hard to scale.

    That's exactly the approach we are taking at Zemanta: use semantics in the background, but provide a useful and simple interface to the user. And the API (in private testing) takes the same concept: very lightweight in semantic terms, but directly useful for any web developer trying to enrich and annotate content.

    Andraz Tori, Zemanta

  19. Hmm, you don't need to go as far as OWL to start having semantics, RDF+RDFS can be seen (as well as a graph) as being a little chunk of first-order logic and/or a bunch of set theory. But I do agree with your underlying point - in the Web environment the graph aspect is the most useful (and interesting). The logic is a nifty bonus, but not necessarily suitable for every situation. Massive scalability comes, as in the document Web, from having multiple interlinked graph databases (which may just be documents). Linked data is the Web done right, like the man (timbl) said.

  20. The word 'net' as in 'semantic networks' means essentially the same thing as 'graph'.

    For choice of word, Net is 2 characters shorter and also aligns with 'networks' or 'internet'.

    'Graph', on the other hand, conflates with 'graphics'.

  21. For those looking for a good book/resource on all of these technologies, I highly recommend "Semantic Web for the Working Ontologist"...

    http://www.amazon.com/Semantic-Web-Working-Ontologist-Effective/dp/0123735564
