In need of a link log
Happy Fourth Birthday to RSS (via Dave, via Dave)
Looking for a picture that would explain why I thought Annabella Lwin should know her audience (or at least one member of it), but they’re oddly hard to find. I’d forgotten that she was only 16 at the time, though. Damn I’m getting old.
Simon’s noticing the dark side of browser tabs. Even worse than just having 57 tabs open in six browser windows is doing that on a less stable operating system, and watching them vanish in a crash. Any more, I stick to just one window, where I rarely open more than twenty tabs before they get crowded enough to make me actually deal with some of them.
I really need to get Micah Alpern’s microblogosphere search tool (searches Google sequentially for the sites in your RSS aggregator’s OPML subscription file) unzipped and installed. There’s an awful lot of interesting stuff going on now around the idea of finding things that people are saying about things that interest you, whether by searching RSS at Feedster or by aggregating TrackBacks on a topic at Phil Pearson’s Topic Exchange. I just wish I could see the happy medium between search’s “publishers can just do whatever they please, and searchers have to come up with the words they used” and centralized aggregation’s “publishers must all use the same word, so that searchers can find everything in a single place.” I’m not even sure whether what I’m looking for is better search, along the lines of latent semantic analysis, or better aggregation, with someone or something figuring out that my “MT hacks” category is pretty much the same as someone else’s “Mov(e)able Type tweaks” category. I know I want it, that I’d rather read a couple thousand posts that interest me culled from ten thousand weblogs than skim ten thousand posts from a few hundred weblogs while doing my own culling, but I’m not even sure which direction to look.
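For the curious, the "search each subscribed site in turn" idea can be sketched in a few lines of Python. This is only my guess at the general shape, not Micah Alpern's actual code: the function names, the sample OPML, and the choice of the `htmlUrl`/`xmlUrl` attributes are all assumptions, and it only builds `site:`-restricted query strings rather than sending anything to Google.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def sites_from_opml(opml_text):
    """Return the distinct host names of the weblogs listed in an OPML file."""
    root = ET.fromstring(opml_text)
    hosts = []
    for outline in root.iter("outline"):
        # Assumption: the aggregator stores the weblog's home page in htmlUrl
        # and the feed address in xmlUrl; fall back to the feed if needed.
        url = outline.get("htmlUrl") or outline.get("xmlUrl")
        if not url:
            continue
        host = urlparse(url).netloc
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def site_queries(opml_text, terms):
    """Build one site-restricted query string per subscribed weblog."""
    return [f"{terms} site:{host}" for host in sites_from_opml(opml_text)]


# Hypothetical subscription list, made up for illustration.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="Example One" htmlUrl="http://one.example.com/" xmlUrl="http://one.example.com/index.rdf"/>
    <outline text="Example Two" xmlUrl="http://two.example.org/rss.xml"/>
  </body>
</opml>"""

for query in site_queries(SAMPLE_OPML, '"MT hacks"'):
    print(query)
```

Each printed line is a query you could hand to a search engine one at a time, which is the "sequentially" part. It also shows the weakness I'm complaining about: the searcher still has to guess the exact words each publisher used.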
What I’d really like to do is index the whole web (or, failing that, the whole blogosphere) with Autonomy. This “find out what’s relevant to you and to what you’re looking at” problem is, essentially, the same one that major businesses are facing under the title of Knowledge Management. Now, there’s a great, great deal of corporate bull about that, and there are also firms charging stupid money for something that you could put together with ht://dig, which big firms buy because it costs money and therefore is good. However, Autonomy is evil witchcraft. It’s really, really quite amazing. It can sort all sorts of disparate sources into categories, it can relate documents to other documents, it can index everything. I mean, it also costs daft money, but the tech behind it is presumably one that people could duplicate to some extent. I sound like a paid shill for them here, I admit, but I was blown away by just how cool it was when I went to see it for work. Some more thought required on how we could do this, especially since the Autonomy approach is still “one big server indexes everything”, and we want a distributed “everyone does one little thing and it all works” approach…
Hrm. Bloody hard to read through their incredibly buzzword-rich site to get to the meat, but it certainly does sound like they are building my dream machine. Several bits, like their spectrograph view of clusters getting hot, cooling off, and forking over time, look instantly useful.
I just can’t imagine how we could possibly avoid the “one big server (cluster)”, though, since what I want are things I don’t know are related (you say “cluster modeling” when I’m saying “word bursts”), which are being published by people I’ve never heard of. Yes, I want my personal knowledge server to gulp down the feeds from the ten thousand weblogs I know about and know I’m interested in, sorting and clustering them so I can see “what we’re talking about now” and also “what we’ve said about this”, but I also want an “everything said precisely about this by everyone, with clusters of very closely related topics”. Maybe it’s possible for my server to share its concept map with enough other servers to somehow combine them into a distributed version that would take me right to someone’s post of interest even though in link terms they’re an untraversable distance away. I dunno. I hope some really smart people are working on it for me, because it’s way too big a problem for me to be able to keep it in my head for long.
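To make "relate documents to other documents" a little more concrete, here is about the simplest thing that could count as a start: comparing posts by the words they share, using cosine similarity over raw word counts. This is a toy sketch and certainly not how Autonomy does it; the function names and the sample posts are invented for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count its words."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_related(posts, index):
    """Return the position of the post most similar to posts[index].

    Assumes the list holds at least two posts.
    """
    target = word_counts(posts[index])
    scored = [
        (cosine(target, word_counts(post)), i)
        for i, post in enumerate(posts)
        if i != index
    ]
    return max(scored)[1]


# Hypothetical posts, made up for illustration.
posts = [
    "Movable Type hacks for better archive templates",
    "Tweaks to Movable Type archive templates",
    "Photos from the weekend hiking trip",
]

print(most_related(posts, 0))
```

Plain word overlap like this would still score "MT hacks" against "Mov(e)able Type tweaks" at zero, since the two share no words at all. That gap is the part that needs something smarter, along the lines of latent semantic analysis, and it is the part that seems to want the big server.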