Tuesday, July 1, 2008

Adobe Announces It Is Making Flash More Searchable, May Be Overpromising

Today Adobe announced that they have been working with Yahoo and Google to make Flash more searchable, and have provided them with technology to help them do so. Google also provided more details here. But based on what has been said so far, I fear the announcement may be a bit of an overpromise.

Unfortunately, there is no technical discussion or whitepaper to go along with this announcement, and with such a broad and important technical claim, more is needed than a one-page press release or even Google's Q & A.

Nevertheless, the problem Adobe is trying to address is an important one, and so the issue is worth some discussion.

The problem Adobe is attempting to solve is really broader than “Flash can’t be indexed” as many people today think of it.

As many developers are moving more of their application logic from the server to the web browser, indexing becomes harder. Browsers used to just grab web pages and display them. Now our browsers are becoming full-fledged application environments. They are almost mini-operating systems. And in that sense, many web sites are really more like in-browser applications than they are what we used to think of as web pages.

Flash is a leader as a technology for deploying such in-browser applications, also known as Rich Internet Applications (RIAs), but many such applications are also written in JavaScript. And so it is important to keep in mind that the problem of indexing RIAs is a problem for RIAs written in JavaScript as well as Flash, though this fact is rarely discussed.

The core of the issue is really that web pages can be indexed and applications can’t. Or at least it doesn’t make sense, in the most basic sense, to index applications. And this issue is really more conceptual than technical.

For example, if you indexed a copy of Microsoft Word on your computer, what would that mean? Would it just mean that you could find all of the text from the menu bar and the dialog boxes in Word? And would that be useful? Probably not.

Today’s Flash applications, particularly those written in Adobe’s Flex application framework, are most often views into data that is loaded from a database or server somewhere when the user runs the application. So the interesting data is never actually in the Flash application’s swf (pronounced swiff) file. In other words, indexing the swf does no good.

What happens in a typical Flash application is that the user enters some search text or clicks on some button that says, for example, “show me red shoes.” The app then calls the database across the web, grabs the list of red shoes, and displays them. So in order for the search engine to be able to find the fact that my application shows red shoes, it needs to provide some input into the application to make it display that information. It needs to pretend to be a user and trick the app into going into the “red shoe” state. Most importantly, if the application changes state, that state needs to be captured as a change in the URL so that it can be referenced later.
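To make the idea of capturing state in the URL concrete, here is a minimal sketch in Python. It is not how Flash, Flex, or any search engine actually does this; the base URL, the state keys, and the function names are all made up for illustration. It only shows that a state such as “red shoes” can be written into the part of the URL after the `#` and read back out again.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical address of the page hosting the application (an assumption).
BASE_URL = "http://example.com/store.html"


def state_to_url(state: dict) -> str:
    """Serialize application state into the URL fragment (the part after '#')."""
    # Sort the keys so that the same state always produces the same URL.
    return BASE_URL + "#" + urlencode(sorted(state.items()))


def url_to_state(url: str) -> dict:
    """Recover the application state from a URL built by state_to_url."""
    return dict(parse_qsl(urlsplit(url).fragment))


# The "red shoe" state, expressed as hypothetical key/value pairs.
red_shoes = {"view": "results", "category": "shoes", "color": "red"}
url = state_to_url(red_shoes)
restored = url_to_state(url)
```

The important property is the round trip: if every interesting state has a URL, and every such URL restores the state, then there is something for a search result to point at. Whether a given crawler would actually follow such URLs is a separate question that the announcement does not answer.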

Note that this is exactly what search engine indexers do with standard web pages. The difference is that it is much easier to do it with web pages. Web links are easy to follow, so it is easy to change the state of an HTML web site, and it is much easier to see what all of the available states are. All you do is scan the page and look for all the links. In any real application, whether written in Flash, JavaScript, or even C++, states are much harder to discover and generate, particularly when they may involve an expectation of clicking on some button or typing something.
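The “scan the page and look for all the links” step is simple enough to sketch with nothing but Python's standard library. This is only an illustration of the idea, not a description of any real crawler; the class name, the function name, and the sample markup are my own assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def discover_states(html, base_url):
    """Return every state reachable from this page by following a link."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page: two links and one button wired to script.
page = (
    '<a href="/shoes/red">Red shoes</a> '
    '<a href="blue.html">Blue shoes</a> '
    '<button onclick="search()">Search</button>'
)
found = discover_states(page, "http://example.com/shoes/")
```

The two links are found immediately. The button is invisible to this approach: nothing in the markup says what state clicking it would produce, which is the position an indexer is in with most application interfaces.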

What is interesting is that in the Adobe announcement they seem to be implying that they have solved this issue. But my problem is I don’t really see how. I am confident they can do some useful things leveraging Flex applications’ ability to support deep linking. Without getting too technical, there are things that Flex application developers can do to make their states more accessible via the URL bar, which would certainly be helpful here. But I do not see how, without much more explicit developer tools and actual work on the part of developers, most existing RIAs, Flash or otherwise, can be made broadly searchable.

So, in short, my concern is that Adobe is announcing that they can now index Flash applications by being able to manipulate state, and I strongly doubt that what they are announcing works as broadly or as compellingly as the press release implies. My suspicion is that it is only really useful for a small subset of the real Flash applications out in the wild, since even if the search engine finds the text it will, in most cases, not be able to get the application into the appropriate state to show that text.
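A toy example may help show why finding the text is not the same as being able to get back to it. The sketch below builds a very naive word index in Python; the URLs and the extracted text are invented, and real search engine indexes are of course far more sophisticated. The assumption being illustrated is that text extracted from a Flash application all gets attributed to the one URL of the application.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


# Hypothetical HTML site: every state has its own URL.
html_site = {
    "http://example.com/shoes/red": "red shoes",
    "http://example.com/shoes/blue": "blue shoes",
}

# Hypothetical Flash app: all extracted text hangs off a single swf URL.
flash_app = {
    "http://example.com/store.swf": "red shoes blue shoes",
}

html_index = build_index(html_site)
flash_index = build_index(flash_app)
```

In the HTML case, a search for “red” leads to the page that shows red shoes. In the Flash case, “red” and “blue” both lead to the same starting URL, and the user still has to find their own way to the state that contained the text.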

And so, until Adobe, Google, or others issue developer best practices guidelines for working with search engines, I suspect this announced technology/initiative will be more limited in its benefits than some might expect.

5 comments:

Demono said...

Couldn't Adobe just incorporate browser-readable tags into the Flash movies, unless that exists already? Of course it would rely on the Flash author or someone else to correctly tag it, but I can't imagine it being any harder to do than meta tags in HTML.

Sasha Chedygov said...

@Demono: Something like that already exists, and it doesn't work very well. It doesn't give you any flexibility at all, so using it for RIAs wouldn't work well.

@Hank: But why would you need to index RIAs? There are a couple types that I can think of that need it, but most don't have any reason to be indexed by a search engine. Let's use word processors as an example. What would you be indexing? People's documents? You can just have a plain-HTML version of the document if you really need it. Instant messaging? Image editing? What's there to index?

Stefan Richter said...

Hank, this new indexer does dig deep, it even fetches remote data via URLRequests and more. In fact, some think it fetches too much. You should check this out:
http://aralbalkan.com/1404

Hank Williams said...

Thanks Stefan,

Though in looking at the link and the comments, the situation appears to be similar to what I would suspect, which is that it is just extracting data without any context. It really doesn't sound very useful based on what Aral and in particular Peter Elst are saying. It's *far* less useful than crawling HTML pages. Finding text without context or without really understanding how to recreate the state (which is probably impossible to do from a Google search page) is in many cases pretty useless.

Hank Williams said...
This post has been removed by the author.
