Cover coincidence

Saw these two covers (of Blur and Boards of Canada albums) side-by-side at Waterloo Records.

X2

Saw X2, the X-Men sequel (I should probably say “the first of many X-Men sequels”). I was reasonably entertained by it. Drew liked it better than I did.

X2 was much better than the first X-Men movie in terms of story, action, and characters. The story’s landscape of light and dark is interesting: the X-Men are the good-guy mutants, Magneto and his gang are the bad-guy mutants, and then there are the regular humans. But the line between good-guy mutant and bad-guy mutant is blurry: Magneto has a complicated relationship with Professor X, and genuinely doesn’t want to hurt him. Wolverine, a good guy, has no qualms about eviscerating anyone who threatens him. All the mutants were more sympathetic than many of the mundane humans, who either feared the mutants or sought to enslave them. The action and eye-candy were fast-moving and epic in scale–real big-screen material. The first X-Men movie portrayed the characters as embarrassingly incompetent in a fight. Not this one. And while many of the characters were wooden in both movies (notably Cyclops, who makes Al Gore seem as wacky as Al Yankovic), it was nice seeing Mystique’s character get fleshed out a little. Casting Alan Cumming as Nightcrawler was perfect, and what can you say about Ian McKellen? He’s great. Classes up the joint, too.

I felt the ending was extremely contrived and unsatisfying. Drew thinks it’s a setup for the next sequel.

Wisdom teeth out

In a ten-minute procedure this morning, I was relieved of my three wisdom teeth and about $1300. The procedure was not painful, but it was unpleasant: I was very anxious through the whole thing, and apparently was ashen by the end of it, as the doctor was concerned about me and wouldn’t let me get up until my color returned. I did this under a local anesthetic, which was supposed to last for about four hours. Right now, about three hours have passed and it’s wearing off (I’ve already taken a happy-pill, but it’s not doing much good yet). One of the extraction sites is still bleeding and that whole side of the mouth hurts, even though it’s also peculiarly numb. Also peculiar is how perfectly the numbness bifurcates my mouth. My bite feels very strange–I wonder if my teeth are re-aligning themselves or if this is an artifact of the swelling and the fact that I had gauze in my mouth for hours.

Spam report

Over the past eight days, I have received 397 pieces of spam. 328 were flagged by SpamAssassin and dropped in my spam-box before I ever saw them; one of these was arguably not spam (it was bulk, commercial e-mail that I didn’t particularly want, but I have bought stuff from the sender before, so they had obtained my e-mail address legitimately). Only about ten messages had subject lines that might fool me into thinking they weren’t spam.

I don’t have exact numbers, but spam accounted for well over half the total e-mail I received in this period–possibly over three-quarters.

Social networks

There’s been a lot of interest lately in social software. A related phenomenon is the way the Internet can make social networks explicit.

I like playing around with this. I recently created a FOAF file (see my badge-zone). And there’s a brilliant “FOAF explorer” (where you can see I really need to flesh mine out).

One problem with FOAF is that it’s nerdy, and while I think it’s a good approach, not everyone will bother putting FOAF files on their websites (oh wait–not everyone even has a website). Friendster answers that–it approximates FOAF’s functionality, but lets the user sign in and point to friends rather than post a file with arcane formatting. It would be nifty if Friendster could read FOAF files, and conversely, if Friendster had an interface for feeding information into FOAF files.

None of this is particularly new. SixDegrees did roughly the same thing as Friendster back in 1997, I think. But the Internet is now big enough that network effects make the idea more viable. It’s also interesting trolling through Friendster–so far, the only friends I’ve found in there are part of my fire-freak circle of friends, so all the same faces keep popping up. It would be interesting to find someone from a different circle there and be the point of intersection between circles.

Later: Seems that Ben Hammersley had the same idea.

Spider

Saw Spider last night. Interesting movie. It’s by David Cronenberg, and I’ll pretty much see anything from him on spec. Some parents had brought their kids (perhaps expecting Spider-Man–children should never be brought to Cronenberg movies).

The movie, like its protagonist, moves very, very slowly. A madman sent to a halfway house in his hometown gradually recollects (and partly re-invents) his childhood, and the events that caused his madness, or were precipitated by it and exacerbated it–the movie is not clear which. The storytelling is very affectless–I don’t quite feel as if I got inside the character’s head–but very atmospheric. Ralph Fiennes did an excellent job in what I’m guessing must have been a very difficult portrayal of the title role.

Natural keywords and categories

Adam Kalsey has done some fine work on creating lists of related entries for Movable Type based on the contents of your blogs.

Not to undermine it, but this still doesn’t go far enough towards discovering natural relations between entries, and won’t work unless we write in a restricted style with a restricted vocabulary–that goes against the grain of blogging, which is personal and spontaneous. If I mention Donald Rumsfeld in one blog entry and the Secretary of Defense in another, clearly they’re related (although the person with that title can change, making that equation more complicated). How can this be made to work?

The first problem is separating potential keywords from “noise” words. A first-order effort would be to have a canned list of noise words, and filter those out–this would be a simple, fast process. A second-order effort would be to filter out any words that are used very frequently by the blogger–this would be much slower, and perhaps should be handled asynchronously (the results of this could be used to refine the first-order noise-word list to speed things up in general).
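Here is a rough sketch of those two filtering passes in Python. Everything in it is an assumption made for illustration: the tiny `NOISE_WORDS` list, the function names, and the rule that a word counts as "used very frequently" when it turns up in more than half of the blogger's entries.

```python
import re
from collections import Counter

# Hypothetical canned noise-word list; a real one would be far longer.
NOISE_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "it", "that", "i"}


def tokenize(text):
    """Lowercase the text and pull out alphabetic words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def first_order_filter(words, noise_words=NOISE_WORDS):
    """First-order pass: drop anything on the noise-word list."""
    return [w for w in words if w not in noise_words]


def frequent_words(corpus, threshold=0.5):
    """Second-order pass: find words that appear in more than
    `threshold` (a fraction) of the entries in the corpus."""
    entry_counts = Counter()
    for entry in corpus:
        # Count each word once per entry, however often it repeats there.
        entry_counts.update(set(tokenize(entry)))
    cutoff = threshold * len(corpus)
    return {word for word, count in entry_counts.items() if count > cutoff}


def candidate_keywords(entry, corpus, threshold=0.5):
    """Apply both passes to one entry and return what survives."""
    noise = NOISE_WORDS | frequent_words(corpus, threshold)
    return first_order_filter(tokenize(entry), noise)
```

The set returned by `frequent_words` could be computed in the background and merged into the canned list, which is the refinement suggested above.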

The two Big Bens of Blogistan (Trott and Hammersley) have worked out the ingenious “more like this from others.” This has the germ of something interesting: using an outside reference.

Something like the Open Directory already represents a pretty extensive hierarchical library of keywords. To take my prior example, the first hit for a search on “Donald Rumsfeld” at dmoz is found in the category “Regional > North America > United States > Government > Executive Branch > Departments > Defense”. That gives you some excellent keywords to take home. The most specific are at the end, and “Defense” is a very useful keyword to equate to Rumsfeld. It gets better: that category contains subcategories with very useful terms (Armed Forces, Defense Agencies, Department of Defense Field Activities, Intelligence, Joint Chiefs of Staff, Office of the Secretary of Defense, Unified Combatant Commands) as well as related categories (Science > Technology > Military Science; Regional > North America > United States > Government > Military > Installations > Pentagon). These could be used to generate a high-quality list of “alternative keywords.” (It also seems possible that if a candidate keyword generates scattered search results, it might not be a good keyword, and should be added to the noise-word list.)

So the process of finding and using alternative keywords would go something like this:

  1. Create potential keyword list
    1. Winnow out noise words
    2. Winnow out other frequently-used words
  2. Search dmoz or other directory for keywords
  3. Collect categories for search results, as well as subcategories and related categories
  4. Assemble new list of alternative keywords
  5. Search blog corpus for alternative keywords, create links when found
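Steps 2 through 5 above can be sketched roughly as follows. This is not a real dmoz client: `DIRECTORY` is a hypothetical in-memory stand-in for the directory search, filled in with part of the Rumsfeld example, and the function names are invented. Matching is plain substring search, which is cruder than anything you would actually deploy.

```python
# Hypothetical stand-in for a directory search: keyword -> first hit's record.
DIRECTORY = {
    "donald rumsfeld": {
        "category": "Regional > North America > United States > Government"
                    " > Executive Branch > Departments > Defense",
        "subcategories": [
            "Armed Forces",
            "Intelligence",
            "Joint Chiefs of Staff",
            "Office of the Secretary of Defense",
        ],
        "related": ["Science > Technology > Military Science"],
    },
}


def last_segment(path):
    """The most specific part of a category path is at the end."""
    return path.split(">")[-1].strip()


def alternative_keywords(keyword, directory=DIRECTORY):
    """Steps 2-4: look the keyword up and collect category terms."""
    record = directory.get(keyword.lower())
    if record is None:
        return []
    terms = [last_segment(record["category"])]
    terms.extend(record["subcategories"])
    terms.extend(last_segment(path) for path in record["related"])
    # Lowercase and de-duplicate while keeping the original order.
    seen, result = set(), []
    for term in terms:
        term = term.lower()
        if term not in seen:
            seen.add(term)
            result.append(term)
    return result


def related_entries(keyword, corpus, directory=DIRECTORY):
    """Step 5: return indexes of entries mentioning any alternative keyword."""
    alternatives = alternative_keywords(keyword, directory)
    return [
        index
        for index, entry in enumerate(corpus)
        if any(term in entry.lower() for term in alternatives)
    ]
```

With a corpus containing one entry that mentions "the Secretary of Defense", `related_entries("Donald Rumsfeld", corpus)` would pick that entry out even though it never names Rumsfeld.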

The process of constructing a list of alternative keywords clearly involves a fair amount of work–but that’s what we’ve got computers for. And it obviously won’t always be perfect–but that’s what we’ve got brains for.
