Monday, March 24, 2008

Most Of The Data We Save Will Never Be Useful Again

Emails. Contacts. Documents. Bookmarks. Checks. Receipts. Notes… stuff.

You save everything, or at least you try. And when you do it, you feel like you’re really doing something good. But deep down there is another voice that speaks a truth that you would prefer not to acknowledge. Every time you implicitly or explicitly save something you know that may be the end of that bit of information forever. Innately you understand that most information loses its value the minute you save it. This is because, while you can save the data, it’s almost impossible to save the context.

By context I mean how it relates to the rest of your universe. For example, if I save a Word document, there is no way to indicate that it relates to a person or an email, or a Delicious tag, or anything else. A big part of the problem is that all of our data is in separate silos. Applications – be they web apps, or desktop apps – tend only to know about the data that they create. It’s what I call “silo-ization.”

One solution to this problem might be some kind of intelligent application that can read all the stuff you create and figure out that certain things are related. And certainly I imagine that at some point we will have applications like that. But on a more basic level, we need a kind of universal framework to allow for manual or automated connection of heterogeneous objects. In other words, the “smarts” to help us connect things are of no use if there is no acceptable *way* to connect them. You’ve got to be able to connect before you can think about what to connect.

Allowing for the creation of context across data silos means that every piece of information that is important to me can be placed in the context of every other piece of information that is important to me. So if I look at a document, I’d like to be able to see who sent it to me or whom I sent it to. I’d like to be able to see that I tagged that document with the same tags as a bunch of web pages that are somehow related. I’d like to be able to see that the person who sent it to me has accepted an invitation to a meeting that I will also be attending.
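
To make the idea concrete, here is a minimal sketch in Python of what such a connecting layer might look like. Everything in it is an assumption for illustration: the `Ref` and `ContextStore` names, the silo labels, and the sample identifiers are hypothetical, and it does not describe any existing product. It only shows that once every object can be addressed as "this item in that silo," linking a document to a person or a bookmark becomes a simple bookkeeping problem.

```python
from collections import defaultdict
from typing import NamedTuple


class Ref(NamedTuple):
    """Points at one object inside one silo (hypothetical naming scheme)."""
    silo: str  # e.g. "files", "contacts", "bookmarks"
    key: str   # whatever identifier that silo already uses


class ContextStore:
    """Keeps labeled, two-way links between objects in different silos."""

    def __init__(self):
        self._links = defaultdict(set)

    def connect(self, a: Ref, b: Ref, label: str) -> None:
        # Store the link in both directions so either object can find the other.
        self._links[a].add((label, b))
        self._links[b].add((label, a))

    def context(self, ref: Ref) -> list:
        # Everything directly linked to `ref`, sorted by label for stable output.
        return sorted(self._links.get(ref, set()))


# Hypothetical sample data: a document, the person who sent it, a related page.
store = ContextStore()
doc = Ref("files", "proposal.doc")
sender = Ref("contacts", "alice")
page = Ref("bookmarks", "http://example.com/article")

store.connect(doc, sender, "sent-by")
store.connect(doc, page, "same-tag")

for label, other in store.context(doc):
    print(label, other.silo, other.key)
```

The links here are entered by hand, but nothing stops an automated tool from calling `connect` instead; the store does not care who decides that two things are related.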

The point is that almost every piece of information we collect has a trail to other bits of information. Right now we can’t see those trails. And so, our data spaces, be they our Gmail archives, or our hard drives, or our Delicious tags, etc., are really more like old attics with years of junk covered in thick dust. Modern software technology can and should do better.

4 comments:

JohnNullstream said...

I hear you. I too have been on a quest for this recently. I have been using Evernote (www.evernote.com) for a couple of years now as a single silo for most of my stuff. It is very strong at data mining to get back to your notes, clips, etc. It still lacks the kind of 'context' you are talking about. It can link to a file, a site, etc., but it is up to you to manually add whatever context you think is relevant. Some people are also trying to use a mapping program like PersonalBrain (www.thebrain.com) to achieve similar relationship links.
I think we are a long way from the kind of solution that will truly solve this problem, however.

Kevin said...

Welcome to getting old in IT. Every generation has had this realization, and every time the new generation of innovators completely (and IMO, rightly) ignores the legacy archive in favor of developing new tools to generate even more data.

It's fun watching greybeards complain about not being able to read their thesis done on punchcard, unisys, burroughs, wordstar, whatever.

Yes, computers and tech keep getting more robust and you'd think maybe THIS TIME we could figure out how to make it all work together. And there are entire enterprise class solutions aimed at trying to keep a business's legacy data useable. But as the young innovators keep pushing, we just increase the complexity by orders of magnitude, making it more and more unlikely that we will ever wrangle our legacy of data together in any meaningful way.

Matthias said...

I agree for the most part, but I think you have to distinguish between highly networked data and data that is purely for your private (as in "not visible to others") use.

Since you already brought up del.icio.us, I think this is one example where it is not pointless at all to save a bookmark *even* when you yourself are never going to retrieve it again. Users of social networks and aggregator sites can see what other ("similar") people have saved and can find content which may be of higher value for them than a hit on Google. So although the act of saving away a video, a (public) note or a bookmark on the Web may not turn out to be useful for yourself, it may still be useful to other people. This is what drives "Web 2.0" after all.

By the way, there is an ongoing research project called the Nepomuk Semantic Desktop, funded by the European Union, which aims at providing a solution to the context problem by figuring out what concepts are "active" during certain tasks and relating them to other concepts you have used before (from your Personal Information Model). I don't see it being used by the casual user anytime soon, but it's an interesting approach.

Tim said...

Another good article. I'm pretty disorganized, so thanks - I feel better now.
