No unread items

In the past, when I got back from my annual offline vacation, my biggest problem was plowing through email and spam. This time, that wasn’t too bad, but the unread item count in Bloglines was a killer: 5250 unread items. A few hundred, certainly less than a thousand, were just things like stale weather reports, or Wired articles that someone else would have linked if they were worth reading, but still… Most of a week of solid reading, to get to that blessed “No unread items.” Herewith, a nice fat linkdump of some (though certainly not all) of the things that caught my attention:

Markover: limpid.nl
Anne on the sensible step of taking a site with no use for XHTML back to HTML. What struck me, though, was the assertion that using numeric character references to encode an email link worked perfectly for avoiding spam harvest. After I told Luke the other day that they don’t work, I put one in my main page with a throwaway address (got to work on that pontificate-experiment order thing), and got spammed in 9.5 hours. Other than an unknown difference in how often each page gets crawled by harvesters, the only difference I can see is that his is encoded with hex character references, and mine with decimal. Curious.
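For reference, a minimal Python sketch of the two encodings being compared. The function names and the address are made up for illustration; nothing here comes from either site's actual markup. It mainly shows that any harvester willing to decode character references gets the same address back from either form.

```python
import html


def to_decimal_ncrs(text: str) -> str:
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def to_hex_ncrs(text: str) -> str:
    """Encode every character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


# Hypothetical throwaway address, not one used in the experiment above.
link = "mailto:throwaway@example.com"

decimal_form = to_decimal_ncrs(link)
hex_form = to_hex_ncrs(link)

# A harvester that decodes references recovers the address from either form.
assert html.unescape(decimal_form) == link
assert html.unescape(hex_form) == link
```

If hex really does fare better, that would suggest harvesters matching on the `&#` plus digits pattern rather than actually decoding, but that is only a guess.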
Comment Spam, Again
“In line commenting is an essential element of the emerging transparency which makes online communication interesting, and possibly revolutionary. So I’ll deal with the spam protections, though I’d rather just see them shot.” Well said!
Feeds Without Dates and Being Too Clever for Your Own Good
Well, sometimes the workarounds for feeds with missing info work, sometimes they don’t. I probably wouldn’t have ever thought to adjust the faked date of an item without a real date to fit with another item that links to it, but if you told me you were going to try it, I also wouldn’t have guessed the pitfall.
Atom and Cool URIs: dogma, idealism, expediency
Every time I read about Atom Co-Chair Tim Bray saying essentially that if your permalinks aren’t absolutely certainly fixed for all time and thus suitable for Atom ids, you should make it so, as though that was all it takes, I worry for Atom’s future. A tiny shred of understanding about how things are for mere mortals would do wonders.
RSS Scaling Problems: How Can We Help?
Some good and some other ideas. Maybe with our next feed format, we’ll really realize that it’s all about entries, and come up with a workable way for a client to say “I know about entry {id} and previous as of {datetime}, got anything newer or any changes for me?”
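A rough sketch of what the server side of that question might look like, in Python. The `Entry` class and `changes_since` function are hypothetical; no existing feed format or server works this way as far as I know.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    entry_id: str       # hypothetical permanent id for the entry
    updated: datetime   # last time the entry was created or changed


def changes_since(entries, as_of):
    """Return entries created or changed after `as_of`, oldest first."""
    newer = [entry for entry in entries if entry.updated > as_of]
    return sorted(newer, key=lambda entry: entry.updated)


entries = [
    Entry("tag:example.com,2004:1", datetime(2004, 7, 30, tzinfo=timezone.utc)),
    Entry("tag:example.com,2004:2", datetime(2004, 8, 2, tzinfo=timezone.utc)),
]
# The client last checked on 1 August, so only the second entry comes back.
fresh = changes_since(entries, datetime(2004, 8, 1, tzinfo=timezone.utc))
```

The point is only that the unit of exchange is the entry, not the whole feed document.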
Sleeping better and blogging more
Sounds like Russ is quite happy with comment moderation. Of course, the fact that one of the comments on that entry looks to me to actually be comment spam is a bit worrisome, but still…
GMail Notifier Extension 0.3.2
Handy Firefox extension for those with GMail who aren’t using it to subscribe to the atom-syntax list (as Bill de hÓra put it in Freak Atom Occurence, “There’s been no mail from the atom-syntax list in the last 90 minutes or so. How odd.”). atom-syntax means never having to wonder whether you’ve got unread mail.
There Be Dragons Here
“Phil’s ‘well-engineered stuff under basic presentation’” – damn, that’s the nicest thing anyone’s ever said about my total lack of design ability or sense!
The trouble with comments
“Every time I have to wade through a pile of comment spam pointing to sites that sell degradation and the sexualization of misery I feel a little more depressed. At some point in the past few months, I passed out of the relativistic bubble I’d sealed myself into as that sort of stuff passed through my inbox and over my pages and into a state of anger and sadness.” Me, I rarely think about the actual what of what they’re spamming, but, “Maybe it’s about having Ben in the house now and the involuntary process I go through, as have other parents I’ve talked to, when I’m exposed to something I might have previously blown off.”
WordPress Gotcha
Mostly just a note to myself, to look into why using HTML entities in entries would be making Dorothea’s XML feeds not-well-formed. Surely they would be escaped, wouldn’t they?
Keeping Technorati up to date with Apache log analysis
As usual with Ben Hammersley’s stuff, this is clearly either brilliant or insane. Anyone with a good idea of who is linking to them knows that Technorati misses lots of things, so Ben has a handy Perl script to help them out: it runs through your Apache log, looking for referers that Technorati doesn’t know about, and when it finds one it pings Technorati on their behalf. Hmm.
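The log-scanning half of that idea can be sketched in a few lines of Python. This is not Ben's script: it assumes Apache's combined log format, the function name is made up, and a plain set of already-known URLs stands in for asking Technorati anything (no ping is sent).

```python
import re
from urllib.parse import urlsplit

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"\s*$')


def new_referers(lines, own_host, known):
    """Return external referers from the log that are not in `known`."""
    seen = set(known)
    found = []
    for line in lines:
        match = LOG_TAIL.search(line)
        if not match:
            continue  # not a combined-format line
        referer = match.group(1)
        if referer in ("", "-"):
            continue  # no referer sent
        host = urlsplit(referer).hostname
        if host is None or host == own_host:
            continue  # unparseable, or a link from our own site
        if referer not in seen:
            seen.add(referer)
            found.append(referer)
    return found


sample = [
    '10.0.0.1 - - [01/Aug/2004:12:00:00 -0700] "GET /blog/ HTTP/1.1" 200 5120 "http://other.example/post" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Aug/2004:12:01:00 -0700] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Aug/2004:12:02:00 -0700] "GET /blog/ HTTP/1.1" 200 5120 "http://my.example/archive" "Mozilla/5.0"',
]
candidates = new_referers(sample, own_host="my.example", known=set())
```

Whatever you do with the candidates afterward is where the brilliant-or-insane part comes in.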
Attacking spam methods is useless
Jay’s staying on message, saying that there’s no way to combat comment spam except by targeting the URLs they want to have linked in your comments. Unfortunately, targeting URLs is targeting methods, too. URL-space isn’t quite infinite, but it’s too bloody big to block all the possible bad parts for each individual. If you block texasholdem777.net, and a thousand URLs that redirect to it, then the spammer just needs to find someone who doesn’t block it, spam them, and then spam their blog in your comments, to transfer some of your PR on to them. The only thing that isn’t targeting methods is when Google spots comment spam, and penalizes the spammer, and does that so often that all possible comment spammers know that the risk is far greater than the possible reward. Oddly enough, looking at backlinks with link:spammer.com and at toolbar PageRank for random spammed domains, it looks like they already are working on it. Until that happy day, we’re all targeting methods, some good (forcing preview, blocking URLs, selective moderation), some not (CAPTCHAs are evil, referrers don’t work, IP banning’s naive).
PHP in contrast to Perl
For me, the most telling thing is the quote at the bottom: “Comparing PHP to Perl is like comparing pears to newspapers.” But, I would note that PHP, with its 3079 functions that you have to look up every time, makes it very easy to look them up (php.net/mysql_real_escape_string) and has very good clear documentation, whereas Perl, with its 260 functions, jolly well better be easy to memorize, because if you forget how to do something you’ll be googling for a comprehensible answer for hours.
Wishlist: the million monkeys at a million typewriters plugin
Matt wants an MT plugin that will let a select circle of friends correct typos in his posts, without having to email him saying “Matt, old stick, ‘you are’ is still you’re, not your” every time. I can’t quite picture it as a plugin, since you’re really talking about adding users with a new class of permissions, probably to edit posts (by choice, with a revision history that can be rolled back), and edit comments and pings or send them to moderation, so it’s going to take pretty serious hacking for anyone but 6A to do it, but having a half-dozen friends using a dozen eyes to make all typos and spam comments shallow would be very nice.
A more technical note on Blogger’s implementation of WYSIWYG editing in the browser
Despite having no real interest in using a WYSIWYG editor, I’m always interested in seeing how people do them.
XML on the Web has Failed
Well, like so many things, it’s failed miserably to fulfill its wild promises, but it still sometimes sort of works, and every so often we notice some broken parts, and fix a few of them. Next time? Next time something will promise even more (certainly including “the tools will save us”), and maybe as a result of delivering on the same small percentage of its promises, will deliver more.
XHTML Frequently Answered Questions
I keep hoping that the answer to “why XHTML?” will have a real reason, for people who aren’t doing MathML or SVG. Someday, maybe. Or not.
User Authentication on the World Wide Web
The basics of cookies and alternate methods of authenticating users, from Tom Pike.
Not dead yet
Jim says his Whole Wheat Radio blog isn’t dead, just resting while he’s distracted by things like the Wheat Hole House Concert building and the whole novelty of having more than a few minutes of sunlight. “Maybe I’ll make blog posts more frequently during the dark winter months when things slow down.” More than anything else, that’s what I like about RSS. It will cost me absolutely nothing, and Bloglines and Jim nearly nothing, for me to patiently wait until he feels like saying something through his blog, instead of somewhere else, again. Checking a bookmark, or remembering to visit the web page some other way? I’d probably forget all about it long before winter sets in.
LibDB
In-ter-esting. Morbus is working on a Drupal module to catalog your movies, books, comics, whathaveyou, according to FRBR (in PDF, but it’s still one of the best ideas to come out of library cataloging in years), using RDF, and aiming to be usable even if you not only don’t know what either one is, you very strongly don’t want to ever know; all you want is to know that yes, you do own The River Why, and further that you own a copy of the 1983 Sierra Club hardcover and a copy of the 1984 Bantam Windstone paperback (spine broken twice, top shelf in the guest bathroom).
Wishlist notifier
Found via the presence of Ed Summers in the LibDB wiki, Wishlist is a Perl script you call from cron, which checks your Amazon wishlist for things that are either heavily discounted (50% off list, by default) or cheap (less than $5, by default), and emails you when it finds them. Sweet and simple: the code is very nearly shorter than the documentation.
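The decision rule, as described, is small enough to sketch in Python. The function and parameter names are invented here, and this says nothing about how the real Perl script fetches prices or sends mail; only the two default thresholds come from the description above.

```python
def worth_emailing(list_price, price, min_discount=0.5, cheap_below=5.00):
    """True if an item is heavily discounted or simply cheap.

    Defaults follow the description: 50% off list, or under $5.
    """
    cheap = price < cheap_below
    # Guard against a missing (zero) list price before dividing.
    discounted = list_price > 0 and (list_price - price) / list_price >= min_discount
    return cheap or discounted
```

For example, a $20 book marked down to $10 qualifies, a $20 book at $12 does not, and anything at $4.99 qualifies regardless of its list price.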
Script-killer comments
Oddly enough, one of the last questions I answered in the MT forums before I went offline was about exactly this: if you insist on putting your Javascript in HTML comments, which blows up in XHTML and was intended to keep it from being displayed in, what?, IE 2.0?, then you have to be sure you have a newline after the opening comment tag, or you don’t have any Javascript at all.
Table rows…revealed!
Scott Andrew on how to hide and reveal table rows with Javascript and CSS. I wouldn’t have thought to set display to the empty string to get back to the default behavior, no matter what various browsers think that default behavior is. Nice!
XML-Feed-0.02
Interesting. What’s Ben up to, that he needed to write a way to access the results of parsing either RSS (through XML::RSS) or Atom (through XML::Atom) without worrying about which format it was?
RSS Scaling Issues
Mark Fletcher is interested in suggestions on how Bloglines can help syndicated feeds scale. I know what I want, but I’m not sure I’ll ever get it: if Atom ends up in a form where a feed can include items from multiple other feeds, with the feed-level data intact, then Bloglines could provide both their browser-based frontend, and also an Atom feed of all your unread items that you could access from any desktop reader that understands Atom. A little fiddling with preferences and an extension element to say whether an item is read or not, and you could read through the browser on the road or at work, then download the items for local storage or search or whatever later, without having to see them again.
JRoller and SharpReader
The pain of doing things right when something goes wrong: SharpReader properly supports Last-Modified headers, so when JRoller returned them with dates in 2028, that left SharpReader poised to wait 24 years before getting updates again. Same sort of thing goes for other HTTP guff: properly support 410 Gone by immediately removing the subscription, and any time someone drops a Redirect gone / in the wrong .htaccess, you mistakenly unsubscribe. HTTP is hard.
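One defensive option for the Last-Modified case, sketched in Python: refuse to remember a validator that claims to come from the future. This is an assumption about a reasonable workaround, not a description of what SharpReader does; the function name and the five-minute clock-skew allowance are made up.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def usable_last_modified(header, now, slack=timedelta(minutes=5)):
    """Return the header if it is safe to echo back in If-Modified-Since.

    Returns None for missing, unparseable, or future-dated values, so the
    next request is an unconditional GET instead of years of 304s.
    """
    if not header:
        return None
    try:
        stamp = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None  # not a valid HTTP date
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    if stamp > now + slack:
        return None  # server clock (or server) is lying
    return header


now = datetime(2004, 8, 1, 23, 0, tzinfo=timezone.utc)
sane = usable_last_modified("Sun, 01 Aug 2004 22:09:01 GMT", now)
bogus = usable_last_modified("Tue, 01 Aug 2028 00:00:00 GMT", now)
```

Here `sane` keeps the header and `bogus` is `None`. The 410 Gone case has no equally cheap fix, which is rather the point.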
DOM scripting book
Stuart Langridge on modern Javascript, the DOM, and unobtrusive DHTML. I am so looking forward to this.
Global worming
Google VP of Operations Urs Hoelzle on the perhaps more widely reported than felt DoS of Google by MyDoom.O: “A very small percentage of our users and networks–most notably, a few media outlets that write about us–were heavily infected with MyDoom, so our systems temporarily blocked their queries.” Heh.

11 Comments

Comment by Tim #
2004-08-01 22:09:01

Welcome back. I hope the fishing was good. You have just dumped a welcome load of homework on my plate. I’ve missed it. :-)

 
Comment by demonsurfer #
2004-08-01 22:57:24

“..it looks like they [google] already are working on it.”
Gawd I hope so!

 
Comment by Mark Paschal #
2004-08-02 10:22:57

“it’s going to take pretty serious hacking for anyone but 6A to do it”

Do you know a specific reason why that is? I would come at it from the plugin architecture anyway (save revisions in a post_save callback, roll back with a plugin action). So, yeah, it’s not a quick hack, but it wouldn’t be a quick hack for 6A either. :)

Comment by Phil Ringnalda #
2004-08-02 11:21:30

My concern was permissions, without having ever looked too closely at it. The perms I think I’d want would be edit all non-draft posts, but not delete them or change the status, edit or moderate all comments and pings (what’s the status of ping-moderation? ack?), rebuild, nothing more. Trying to create an author like that, I see you can have “Edit All Posts” without “Post”, but the result for 3.0D is main menu links to create/edit/edit comments, but no perms to do any of them (though the per-blog menu allows access to the 5 most recent entries, and then the Previous link allows stepping back through the entries). So now I have to upgrade to 3.01 to see whether that’s fixed or in need of a bug report.

But adding “Post” to get working perms gives the monkeys access to drafts, which I sometimes use for flames that will either be quenched before publishing, or just never published, and to Delete Entry, which could also be caught by a callback, but not in a pretty way: unless/until callbacks can affect the app, I wouldn’t guess you could change “delete” to “change to my special sort of draft so I don’t lose the entry ID if I didn’t want it deleted.”

So, on second thought, it’s completely doable outside 6A, it just requires the sort of “plugin” that I’m not used to thinking about yet: a separate app that does all of its own UI, along with MT plugins to affect what happens when you call MT methods.

Comment by Phil Ringnalda #
2004-08-03 08:48:04

Eh, same behavior, but now that I think about it, I’m remembering a discussion, Mantis or forum, where giving “Edit All Posts” without “Post” was declared stupid and you deserve whatever you get. I don’t agree, but I’ll skip the bug report.

 
 
 
Comment by Dorothea Salo #
2004-08-03 07:04:15

Problem isn’t escaping. Problem is that HTML named entities are not defined in an RSS DTD, because there is no RSS DTD. Ergo, said entities invalidate and un-well-form the doc.
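A quick way to see this, sketched in Python with the standard library's XML parser (the `is_well_formed` helper is just an illustrative name): a named HTML entity with no DTD to define it is a fatal error, while a numeric reference or an escaped ampersand is fine.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the parser accepts the document, False on any parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


named = is_well_formed("<title>caf&eacute;</title>")        # undefined entity
numeric = is_well_formed("<title>caf&#233;</title>")        # numeric reference
escaped = is_well_formed("<title>caf&amp;eacute;</title>")  # escaped ampersand
```

Only the first of the three fails.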

Comment by Phil Ringnalda #
2004-08-03 08:27:42

Ah, yes, one of the big hazards of not escaping: if you try to be as helpful to the world as possible, do your Atom as inline XML, then you have to type entries with NCRs, or convert entities to the appropriate codepoint for your encoding before you publish, or something (something probably not including just declaring a minimal DTD that includes the entities from HTML, which does work, mostly, but with possibly unexpected results on the parsing end of things).

But what surprises me is that WP isn’t just escaping any ampersand it sees heading into anywhere in any feed. That’s why I’m curious about (when your paws get better and every keystroke isn’t another yelp, if you remember then) which piece of data (title, body, something I’m not thinking of) going into which element in which feed(s)?

 
 
Comment by Dorothea Salo #
2004-08-03 12:22:03

I’m not sure which part threw the error; there were entities in the post title and in the post body. AFAIK the people who pointed out the error were looking at my Atom feed.

Comment by Phil Ringnalda #
2004-08-03 20:49:20

Okay, I’m with you now: title in Atom and RSS, summary in Atom, still there (I’m not actually smart enough to look first). Along with another problem for atom:summary, where PHP’s throwing in an Undefined variable: excerpt in /..../wp-includes/template-functions-post.php on line 89 notice. (Whoa, running PHP with E_ALL error level on a production server? You are a bold one, aren’t you? :))

The lazy get-it-done approach would just be to escape all three things, throw a CDATA around them in wp-atom.php and wp-rss2.php, and add type="text/html" mode="escaped" attributes in the Atom. But where’s the fun in that?

The right thing to do is to do a filter that decodes entities, at least within the limited set of charsets that PHP admits to knowing (which at least does include UTF-8), but I’ve been out of the WP code for six months now, so it’ll take me a little while to figure out how it all works again.
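The shape of that filter, sketched in Python rather than PHP and not taken from the WP code: decode named entities to real characters, then re-escape only what XML itself requires. The function name is hypothetical, and it assumes plain-text fields such as a title, served in an encoding like UTF-8 that can represent the decoded characters.

```python
import html
from xml.sax.saxutils import escape


def entities_to_xml_text(fragment):
    """Turn HTML-entity-laden text into text that is safe inside XML.

    Named entities become literal characters; &, < and > are then
    escaped again so the result can sit in an element's content.
    """
    return escape(html.unescape(fragment))


title = entities_to_xml_text("caf&eacute; &amp; cr&egrave;me")
```

Here `title` ends up with literal accented characters and a single `&amp;`, which needs no DTD to parse.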

Meantime, the Undefined variable notice: after line 79 (if ($cut) {) insert a new line, $excerpt = ''; and you’ll be set for that. (Or turn down the errorlevel: even though it’s the right thing to do, I’d be amazed if that’s the only Notice popping up.)

 
 
Comment by Steph #
2004-08-09 16:16:11

I just came back from a few days in the mountains. My aggregator is still young, so I only had one hundred and something unread items. Took me the better part of the morning to go through, though. I didn’t do a linkdump post, I just delicious’ed them.

 
Comment by Dan #
2004-08-17 13:00:49

Not to start any sort of war, but a pointer to something you might not have known of:

“…whereas Perl, with its 260 functions, jolly well better be easy to memorize, because if you forget how to do something you’ll be googling for a comprehensible answer for hours.”

try either “perldoc -f somefunction” (if you’re on some unix), or perldoc.com. Full docs, including FAQ questions, examples, and tutorials.

 