Brian's Waste of Time

Fri, 23 Jan 2004

A Plan for Suet

We've been hearing for a while now how subscription-based, XML-formatted data feeds are going to revolutionize network communication. Blogs have already demonstrated that this is possible; they are a spectacular proof of concept for the ideas and have already revolutionized online communication for many people.

So, here is an idea for expanding the current common use of data feeds. The system doesn't exist except in my imagination (and probably other people's, it is an obvious concept) at the moment, but I have been mulling it over since a BOF at ApacheCon where I learned why I should care about Atom (thank you Sam, Mark, et al.). It probably has a lot of overlap with the semantic web stuff, Chandler, etc. -- so smarter people than me are working on the same problem. This would be a baby step (like interceptors compared to full aspects), but maybe a useful one.

The first component is to provide feeds from boring data sources -- file shares, imap folders, CMS systems, gump, etc. Think of an entry as an atomic descriptor for a piece of data -- a nice envelope with a pointer back to the real item. A feed, then, is just a source of atomic entries. Create an agent that can walk samba shares and extract entries from files (even if it is just running strings on 'em), providing rudimentary categorization information based on simple rules (files in Acme/presentations go to the Acme category). This is simple stuff, not rules-engine or textual-analysis stuff (though that could be done too -- it is just a lot more complex). The agent provides a feed of what it finds and updates it when a file is added, etc. Next create an agent that can walk shared imap folders. Same story. The key is that as new things appear, entries are created for them.
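Something like this sketch is what I have in mind for the file-share agent -- the mount point, the rule table, and the entry fields are all made up for illustration, not a real design:

```python
# A minimal sketch of a file-share agent: walk a mounted share, apply simple
# path-based category rules, and yield one entry per file -- an envelope with
# a pointer back to the real item. Paths and rules here are hypothetical.
import os
import time

# toy rule table: path fragment -> category
CATEGORY_RULES = {
    "Acme/presentations": "Acme",
    "Acme/contracts": "Acme",
}

def categorize(path):
    for fragment, category in CATEGORY_RULES.items():
        if fragment in path:
            return category
    return "uncategorized"

def walk_share(root):
    """Yield an entry dict for every file found under the mounted share."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            yield {
                "title": name,
                "link": "file://" + full,            # pointer back to the real item
                "category": categorize(full),
                "updated": time.ctime(os.path.getmtime(full)),
            }

if __name__ == "__main__":
    for entry in walk_share("/mnt/shares/docs"):     # hypothetical mount point
        print(entry)
```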

This produces a ton of data, but it is a ton of normalized, pre-categorized data, which is important. It is interpretable (we don't need RDF for this, really, we don't -- it might be useful for specifying categories, but we are not looking for globally usable relationships here). Humans could, in theory, subscribe to these feeds, but they'd be pretty much useless to them -- too much information.
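To make "normalized" concrete, here is roughly what one serialized envelope might look like; the element names only loosely follow the Atom drafts and are illustrative, not a proposed schema:

```python
# A rough sketch of serializing one entry envelope as Atom-ish XML.
from xml.etree import ElementTree as ET

def to_atom_entry(entry):
    e = ET.Element("entry")
    ET.SubElement(e, "title").text = entry["title"]
    ET.SubElement(e, "link", href=entry["link"])
    ET.SubElement(e, "category", term=entry["category"])
    ET.SubElement(e, "updated").text = entry["updated"]
    return ET.tostring(e, encoding="unicode")

print(to_atom_entry({
    "title": "roadmap.ppt",
    "link": "file:///mnt/shares/docs/Acme/presentations/roadmap.ppt",
    "category": "Acme",
    "updated": "Fri Jan 23 10:00:00 2004",
}))
```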

Part two is an aggregation server that subscribes to these feeds (or runs the agents as plugins, though I prefer the looser coupling, as polling of agents is not a real-time thing -- an every-thirty-minute batch process, but still a batch process). The aggregation server provides feeds as well, but these are category-specific feeds built from the aggregation of all the data it gathers. More importantly, to blatantly steal an idea from Feedster, it offers the ability to subscribe to searches with specific filters: hard requirements (must be 100% identifiable as Acme related), soft limits (see Google), etc.
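The core of that server could be as dumb as the sketch below -- the class, the polling hook, and the saved-search predicate are invented names just to show the shape of it:

```python
# A minimal sketch of the aggregation server's core: poll agent feeds as a
# batch job, bucket entries by category, and expose saved searches as feeds
# of their own. Names are hypothetical.
from collections import defaultdict

class Aggregator:
    def __init__(self, agent_feeds):
        self.agent_feeds = agent_feeds        # callables that yield entry dicts
        self.by_category = defaultdict(list)
        self.searches = {}                    # search name -> predicate over an entry

    def poll(self):
        """Run on a schedule, e.g. every thirty minutes."""
        for feed in self.agent_feeds:
            for entry in feed():
                self.by_category[entry["category"]].append(entry)

    def category_feed(self, category):
        return list(self.by_category[category])

    def subscribe_search(self, name, predicate):
        self.searches[name] = predicate

    def search_feed(self, name):
        predicate = self.searches[name]
        return [e for entries in self.by_category.values()
                for e in entries if predicate(e)]

# usage: a hard-requirement search that only passes clearly Acme-related entries
agg = Aggregator(agent_feeds=[])
agg.subscribe_search("acme-only", lambda e: e["category"] == "Acme")
```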

Clients in turn subscribe to the categorized, sorted, etc. feeds from the aggregation server. Because a feed is a known and planned aggregation, a better UI can be designed for it. Most clients now are designed for reading short articles, not categorized, tagged, aggregate data culled from classical document and data repositories.

The clients are key, as always, as they determine if this is an intellectual exercise or a useful tool. One vision is to have different styles that know how to deal with different sources -- Acme-related feeds culled from shared imap folders may be downloaded entirely and injected into a local mail folder. Teasers from MS Word docs on a filesystem may be just listed in a side panel in Outlook/Notes/Mozilla Mail/etc. Headlines from anything may be provided in such a list as well -- with the initial click going to the locally cached feed information (the mail folder, file summary, etc.); from there, navigation back to the original source is available.
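One way a client might get that per-source behaviour is a plain dispatch on where the entry came from; the handlers below are stand-ins for "inject into a local mail folder" and "show a teaser in a side panel", nothing more:

```python
# A rough sketch of per-source handling in a client (handler bodies are stubs).
def handle_imap_entry(entry):
    # would fetch the message and inject it into a local mail folder
    print("file into local folder:", entry["title"])

def handle_file_entry(entry):
    # would show a teaser / headline in a side panel, linking to the cached copy
    print("side panel teaser:", entry["title"], "->", entry["link"])

HANDLERS = {
    "imap": handle_imap_entry,
    "file": handle_file_entry,
}

def dispatch(entry):
    scheme = entry["link"].split(":", 1)[0]    # imap://... vs file://...
    HANDLERS.get(scheme, handle_file_entry)(entry)
```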
