Thursday, February 28, 2008

I Want Data Visibility More Than Data Portability

Data portability has become a huge meme in the internet universe in the last six months. I am very supportive of the ideas behind data portability, but I am not sure that actual "portability" is really what I most want as a user.

Portability typically implies import/export. I can move my data from here to there. Certainly there is value to this, but it seems to me what I really want is a unified "data location agnostic" view of my data. For example, I'd love to be able to do a search in my data universe and find everything with the words "waterfront project" across all my data silos like Facebook, Google Apps, etc.

Similarly, I'd like to be able to relate a document I was working on in Google Documents to Mary Smith who exists in my universe as a Facebook profile. When I am looking at Mary Smith's profile I'd like to see that Google document listed somewhere since it is related to her. But I don't want to move the document to Facebook to make that happen. 

The point is, what I am really interested in is not porting data but connecting and unifying it. And I'd like some kind of dashboard from which I could really see and explore as much of my universe as possible.

And so it seems to me a good word for what I am talking about is data visibility as opposed to data portability.

This visibility concept really encompasses two elements:
  1. I want to be able to do a full text search of my data universe.
  2. I want to be able to connect items in my universe regardless of their type or location. For any given item in my universe I would then be able to see all the other items in my universe that relate to it.

To geek out for a sec, to achieve all of this what is probably necessary is for as many applications as possible to provide search APIs and unique identifiers for each data item. To make it universal we would also need some kind of "field availability" API that would let us see what types of fields can be searched.
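
To make that a little more concrete, here is a minimal Python sketch of the three pieces: a per-application search API, a unique identifier for every item, and a "field availability" call. Every name in it (ItemId, Silo, InMemorySilo, DataUniverse, searchable_fields) is hypothetical and made up for illustration; no real service exposes this interface, and the in-memory silo only stands in for what would be a call to an application's own API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ItemId:
    """Hypothetical universal identifier: the silo name plus that silo's own id."""
    silo: str
    local_id: str


@dataclass(frozen=True)
class Item:
    id: ItemId
    kind: str  # e.g. "document", "profile", "note"
    text: str  # the text a full text search would look at


class Silo(Protocol):
    """What each application would need to expose (an assumption, not a real API)."""
    name: str

    def searchable_fields(self) -> list[str]: ...

    def search(self, query: str) -> list[Item]: ...


class InMemorySilo:
    """Stand-in for a real application; a real one would call that service."""

    def __init__(self, name: str, items: list[Item]) -> None:
        self.name = name
        self._items = list(items)

    def searchable_fields(self) -> list[str]:
        # The "field availability" call: which fields can be searched here.
        return ["text"]

    def search(self, query: str) -> list[Item]:
        needle = query.lower()
        return [item for item in self._items if needle in item.text.lower()]


class DataUniverse:
    """One view over many silos: search everywhere, link anything to anything."""

    def __init__(self, silos: list[Silo]) -> None:
        self._silos = list(silos)
        self._links: dict[ItemId, set[ItemId]] = {}

    def fields(self) -> dict[str, list[str]]:
        return {silo.name: silo.searchable_fields() for silo in self._silos}

    def search(self, query: str) -> list[Item]:
        # Ask every silo in turn; the data itself never moves.
        results: list[Item] = []
        for silo in self._silos:
            results.extend(silo.search(query))
        return results

    def link(self, a: ItemId, b: ItemId) -> None:
        # Links live outside the silos and are stored in both directions.
        self._links.setdefault(a, set()).add(b)
        self._links.setdefault(b, set()).add(a)

    def related(self, item_id: ItemId) -> set[ItemId]:
        return set(self._links.get(item_id, set()))


doc = Item(ItemId("docs", "d1"), "document", "Waterfront project budget")
mary = Item(ItemId("social", "p9"), "profile", "Mary Smith")
note = Item(ItemId("social", "n3"), "note", "Lunch about the waterfront project")

universe = DataUniverse([
    InMemorySilo("docs", [doc]),
    InMemorySilo("social", [mary, note]),
])
hits = universe.search("waterfront project")  # matches in both silos
universe.link(doc.id, mary.id)                # the document stays where it is
```

The design choice worth noticing is that the links are kept by the dashboard and refer to items only by identifier, so relating the document to Mary Smith does not require moving it into the other application.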

But regardless of the tech specifics, this is a simple concept. It would facilitate a whole new way of thinking about and organizing our personal data universe. I think it would be incredibly helpful. Do you agree?

11 comments:

  1. I think what you are looking for is called Linked Data, or hyperdata. That allows one to bypass all the silos. I recently argued in "Proof: Data Portability requires Linked Data" that you cannot have one without the other. Linked Data also has the advantage of being RESTful.

    Linked Data does not require the data to be visible to everyone btw. There are simple ways, which still need to be formalised and explored, to give access to different parts of the data to different people.

    Henry

  2. Thanks Henry,

    Linked data certainly seems at least in part on point. As my readers know, I take a somewhat dim view of the semantic web technologies and their broad applicability by regular programmers. Linked data would also not address the search issue, but that is OK; using multiple APIs to get the job done is totally fine.

  3. I guess I'm just unimaginative, but I just don't see the pain point.

    I suppose it would be cool to be able to search for my friend "Sara Williams" and get a Dogpile-style set of results for pictures I have of her in Flickr; email messages; and Facebook messages (which now appear in email anyway). But after that, my imagination grows dim.

    But the pain point for most people is the persistent unease they feel in being absolutely powerless to back up their online email accounts, which contain their memories, business contacts, record of charitable donations, etc.

  4. "But after that, my imagination grows dim."

    That's why I am here to help you out :)

    There are two scenarios. You mentioned one. For example, you wrote about something, but you don't remember whether it was in a comment somewhere, on one of the 50 web sites you are a member of, in your email, or in a comment attached to a contact entry. The point is you should be able to find stuff not just on the whole web, but in the universe that is your data.

    The second issue I am afraid does require a bit of imagination. Some ideas really require a certain class of users to *use* stuff before they can imagine it. Much like the PC would not have initially done well in market research studies. But all that said I will try to give you a more concrete example.

    Imagine being able to look at your emails, and for each email, see, under the email or next to it, the documents you have written related to the subject of the email, the business cards of the people that are relevant, the todos that you need to carry out related to it, etc.

    The idea is that each information object in your universe (emails, contacts, documents, etc.) is related to other stuff. But typically, because that information is spread across different types of information and different actual data silos, the items are not connected. Being able to look at anything, and see everything that relates to that thing, is incredibly powerful. Imagine being a salesperson, and looking at an email that a customer sent and being able to see that the other person CC'd on the email is his boss, and that it is his birthday, and being able to see that you were supposed to follow up with a product spec, and that you had written a document 6 months ago that you sent them that is now out of date.

    Being able to see all of the items related to a given item is I think helpful in almost any context.

  5. Seeing as my job involves putting "Business Intelligence" apps (i.e., reporting & analysis) into the cloud, I can completely see the value of this.

    Anything that makes data from applications visible to other apps gets my vote. BI without data is a tough sell.

  6. Dunno. Maybe it's just me, but I *do* want data portability. I love using services on the web, but I won't use one that doesn't let me export data that's important to me. To me it's kind of a fundamental right, as it gives me the ability to switch services at will.

    If a site doesn't, then I'm perfectly willing and able to find some open source package that lets me host the service on my own server, but I'm obviously in the minority of the general population in my ability to do that.

    Anyway, I can't speak for anyone else, but portability *is* important to me.

    That said, I'm curious how much the average Net user knows or cares about it. If I had to guess, I'd probably say: very little.

  7. In response to Darose:

    I think Hank Williams has the right idea: importing and exporting data requires a common data format. But you need only look to the document format wars to see how much overhead and inevitable revision exists in trying to create that common data format. In order for data portability to be usable, you will inevitably need to convert between SOME two common data formats to reconcile fundamental differences.

    So, why shoot so high so soon? Why reach for the holy grail of data portability when instead you could get the instant gratification of data visibility? With data visibility, users and user communities are empowered to infer their own ad hoc common data formats that suit their own needs, and invent their own means for reconciling the differences between that data.

    I've only been reading Hank Williams' blog for thirty minutes, and I already think this issue is central to his focus. Hank, what is the "new data and web development platform" you're working on?

  8. Donny,

    Thanks for the comment. The only thing I would disagree on is the idea that data visibility is shooting lower than data portability. To me they solve different problems and I'd really like to have both. The thing about data visibility is that I can connect data in different places and that is something I would want even if I had full data portability.

    As for what I am working on, I am not quite ready to talk about it, though if you read "The Death of the Relational Database", you might be able to read between the lines and get a sense of some of our technological perspective.

  9. Hi Hank,

    I was recently employed revamping a "business logic" application that a small translation company used to coordinate their efforts. Here was something I started writing in the hopes of being able to extend all their existing "rigid" models with arbitrary graph connections which would allow users to represent new ad hoc properties of models including relationships between them.

    http://codebad.com/tagpower.mysql

    A project that is very similar in premise to what I wanted to do with their business logic system is a project by the Dojo Software Foundation called OpenRecord.

    Also, your article "The Death of the Relational Database" is excellent, and I'm glad to finally meet someone out there who's talking about the same things I am.

    In that article, you also said "semantic web is too great a leap," which is more or less what I was trying to say when I suggested that data portability is too lofty of an immediate goal, and we should shoot for something lower in the interim. Of course, I also agree that data visibility fulfills other fundamentally different goals. My point was only that data portability is more complicated, and the best way to achieve it is to first achieve data visibility.

  10. Hank, good post. I see things in a very similar way. So you may want to look at foldier.com - that I am starting with some friends - as it implements data visibility pretty much the way you describe it.

  11. I think I know a newish search engine that does what you're asking for: check out Endeca. I saw a demo by these guys and was kind of amazed. Basically, they match hard-core multidimensional analysis with hard-core search to create, well, the thing you asked for. :)
