net stuff

Collaborative content and social awkwardness

I check in with Wikipedia every day, and try to be a good steward of the articles I’ve contributed to.

Wikipedia is a funny thing. It’s easy for two people of good intent to have very different ideas of what’s appropriate content, and to get into a fight over what belongs and what doesn’t. In situations like this, the “right” thing to do isn’t very clear-cut. In some situations though, there is a clear right and wrong. Commercial links and self-links are explicitly discouraged.

So it felt a little awkward for me today when I discovered someone I kinda-sorta know had inserted links to his own site in many articles. This is a little like cussing in church.

I reverted all his insertions.

From the department of really bad ideas

I’ve seen a sudden upsurge in a particular kind of spam over the past day or so. All of them come with a (Windows) executable attachment.

Several of the messages read as follows:

From:     Admin@cia.gov
Subject:  Your IP was logged
Date:     21 November 2005 22:07:31 CST
To:       [my e-mail address]

Dear Sir/Madam,

we have logged your IP-address on more than 30 illegal Websites.

Important:
Please answer our questions!
The list of questions are attached.

Yours faithfully,
Steven Allison

++++ Central Intelligence Agency -CIA-
++++ Office of Public Affairs
++++ Washington, D.C. 20505

++++ phone: (703) 482-0623
++++ 7:00 a.m. to 5:00 p.m., US Eastern time

Call me crazy, but it seems like a really, really bad idea to use the CIA—the same organization known to torture prisoners—as the Joe in your little joe-job phishing expedition.

later: Apparently I’m not the only one getting these.

Google automates the one-line bio

I was trying out the new Yagoohoogle and of course, had to search on my name to do a double-barreled egosurf. The first result from Google not only pinpoints me (as opposed to the other Adam Rices out there), it cobbles together a one-line synopsis of who I am and what’s going on at my site.

Freelance Japanese-English translator living in Hyde Park. Includes a weblog, recipes, trip diaries, and rants.

This sentence doesn’t appear anywhere on my site. Fragments of it do. I tried typing in the names of some friends who also have websites and distinctive names, but didn’t come up with anything equivalent for them. I wonder where this came from. I know that Google News has some kind of magical news-story synopsizer–I wonder if they’re starting to apply that technology elsewhere. It’s obviously not perfect–although I do have a few recipes posted on this site, they’re hardly as prominent as other kinds of writing. And rants? Moi?

Later: I think I found the source of that bio. Dmoz. Should have guessed. Presumably written by a human, though it isn’t clear who the category editor is. Some other Google results for fellow bloggers seem to be culled from this listing, which could do with some editing. My name is given as “Adam Rice,” David Nuñez’ as “Nuñez, David,” and many other people are listed under the title of their blog, rather than their name (and no, I am not volunteering to edit this).

A new phishing exploit

I’ve run across a new phishing exploit–new to me, anyhow. This one is especially pernicious because it actually uses a legitimate bank’s website against itself.

Take a good look at the following URL:

http://www.charterone.com/ legalcenter/do_not_solicit_confirm.asp? name=%3Ciframe+ style%3D%22top%3A120%3B+ left%3A0%3B+ position%3Aabsolute%3B%22+ FRAMEBORDER%3D%220%22+ BORDER%3D%220%22+ width%3D900+ height%3D650+ src%3D%22http%3A%2F%2Fwww.totallyfreebanking.biz%22%

I’ve broken it up and highlighted the salient portions in red. I’ll break down what is happening here. Apparently, Charter One uses (or used–see below) a frame-based interface where the contents of a frame could be specified through the URL. What the scammer has done is set up a mimic site (www.totallyfreebanking.biz) that looks like Charter One’s, and loads in a frame of Charter One’s, but isn’t a part of it, and sends the phished data back to the scammer. So even a person who is generally aware of phishing scams might look at this URL and say “Oh, it really is from my bank, it has their URL, it must be OK.”
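To see what is hiding in that query string, it helps to percent-decode it. Here is a minimal Python sketch that does so for a fragment of the "name" parameter from the URL above. I have removed the display spaces, and since the URL as quoted is cut off at the end, the fragment stops at the closing quote of the src value.

```python
from urllib.parse import unquote_plus

# Fragment of the "name" parameter from the URL above, display spaces removed.
# The quoted URL is truncated, so this stops at the end of the src value.
encoded = (
    "%3Ciframe+style%3D%22top%3A120%3B+left%3A0%3B+position%3Aabsolute%3B%22"
    "+FRAMEBORDER%3D%220%22+BORDER%3D%220%22+width%3D900+height%3D650"
    "+src%3D%22http%3A%2F%2Fwww.totallyfreebanking.biz%22"
)

# unquote_plus turns "+" into spaces and %XX escapes into characters.
decoded = unquote_plus(encoded)
print(decoded)
```

What comes out is the start of an iframe tag, absolutely positioned over the page, whose src points at the scammer's site rather than the bank's.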

I visited the page in question, and Charter One seems to have defeated this already.

I don’t want this to be a “frames are bad” rant, because I do think frames have their uses. And in fact, using URLs to specify frame contents goes a long way towards addressing the problem of frame-addressability. But anyone who can’t afford to have an outside party insert content into a frame needs something more subtle–perhaps a javascript detector in the framing page to prevent outside pages appearing in a frame.
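A javascript detector is one option. A different approach, which I am swapping in here because it is easier to show, is to check on the server before anything from the URL reaches the page. The sketch below is Python and purely illustrative: the allowlist, the host name www.example.com, and both function names are my own assumptions, not anything Charter One is known to use.

```python
from html import escape
from urllib.parse import urlparse

# Hypothetical allowlist: hosts this site is willing to show inside its frame.
ALLOWED_FRAME_HOSTS = {"www.example.com"}


def is_allowed_frame_source(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_FRAME_HOSTS


def render_user_value(value: str) -> str:
    """Escape a value taken from the URL so any markup in it is shown as text."""
    return escape(value, quote=True)


print(is_allowed_frame_source("http://www.totallyfreebanking.biz"))
print(render_user_value('<iframe src="http://www.totallyfreebanking.biz">'))
```

The first check refuses any frame source that is not on the site's own list; the second makes sure an injected tag is displayed as harmless text instead of being rendered.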

Getting with the program

Del.icio.us is a “social bookmarks manager,” or in plain English, a web page that lets you keep a list of interesting websites. What makes it interesting is that it lets you use tags to classify your links in a rough-and-ready sort of way (this kind of undisciplined tagging is now sometimes called “folksonomy”), lets you see links from other people with the same tags (or any tags) and shows you how many other people link to a given URL.

I’ve been keeping a “hit and run” blog for some time, and this fulfills the same role for me as del.icio.us would, but I had been unwilling to switch over to del.icio.us for a couple of reasons: 1. The data doesn’t live on my machine; 2. It’s not easy to control the presentation–it is possible to republish your del.icio.us links on your own page, but you’re kind of stuck in terms of presentation. There are ways to get at the data programmatically, but that involves programming, and that means work, and I’m lazy.

But I finally decided to sit down and figure it out (as a way to avoid something even harder: my current translation job). Somebody has already provided a library of PHP tools for messing with del.icio.us, and I know just enough about PHP to get myself in trouble. Here’s what I did [caution: entering geek mode]

Complete this phrase

So everyone is talking about Google’s new “suggest a phrase” feature, which is almost psychic. But it’s also fun for language buffs–it gives you a cheap and easy way to see fixed phrases in action. Here’s an obvious example.

sharks with...

Writing tools for the web

We’ve come so far, and yet, we have so much farther to go.

When I set up my first website, all my HTML coding was manual. Creating a new page meant opening up a copy of an empty web-page template I had created, writing the page, saving it, uploading it by FTP, editing my local copies of any other pages I wanted to link to it, saving them, and uploading them (and any graphics that might be involved). Using Japanese at all was fraught with peril, because there wasn’t a single browser that could handle the three prevailing encoding methods for Japanese (Shift-JIS, New JIS, EUC). There were some primitive web tools that allowed you to edit pages directly on the server via the web browser, but the browser provided a miserable interface, and I didn’t have the cash or the technical chops to install one of these.

Today, we’ve got content-management systems and blogs that handle most of these tasks. Modern browsers have tolerable editing interfaces and are vastly smarter about international characters (and in any case, we’ve got Unicode). Right now I’m typing this in a small program that runs on my computer and talks to my blog’s software to upload my posts.

But some people want more. They want an editing tool that gives them smooth control over HTML and CSS, and somehow gets out of their way to make this all transparent. Something like Microsoft Word for the web. Well, that’s a poor analogy, because Word is terribly intrusive. Like how it goes and creates bulleted lists for you when maybe you don’t want one (unless you go to considerable trouble to get Word to cut it out). Interestingly this kind of thing is considered a useful feature in blogging tools, which may say something about the state of editing tools, or the difficulty of writing HTML. I’m not sure.

Back in the bad old days of personal computing, word-processing for paper output was about as bad as writing HTML is today: you had to type tags or special codes to make text bold, italicized, or centered. With the exception of some hardcore XyWrite enthusiasts, WYSIWYG was acclaimed a great leap forward for word-processors.

So great has been WYSIWYG’s hold on our imagination that it is promoted as a worthy format for web-writing tools. It ain’t. Although my handy-dandy blogging tool will do a tolerable job of wrapping <p> tags around my paragraphs, it can’t do much more than that, so there’s clearly room for improvement. But how do you deal with hyperlinks wizzywiggily? Or the clever abbreviation above, that shows its full expansion when you hover over it (in a decent browser)? There’s no way. HTML is not presentational, it’s structural. You need a way to show the structure of the document as you write. For presentation, we’ve got CSS, which determines the presentation of content for different media: screen, paper, even speech-synthesis. In the land of print, you might have a rule that document titles are always 24-point bold and centered. In the land of HTML, all you know is that a chunk of text tagged as <h1> is at the top of the document hierarchy (and there’s room for dispute on that). That same chunk of text could have completely different forms of presentation because HTML is not tied to any form of output; it contains different forms of meta-data that will often be presented identically (but might be useful to search engines), and contains some that simply isn’t meant to be viewed by humans.

So we’ve got to deal with HTML, content, and CSS. I can imagine a writing program that lays things out like this:

Tag   Content                          CSS
----  -------------------------------  ---------------------------
h1    Here is a heading                Screen
                                         font-family: helvetica;
                                         color: #600;
                                         font-size: 18px;
                                       Print
                                         font-family: helvetica;
                                         text-align: center;
                                         font-size: 18pt;
p     Lorem ipsum dolor sit amet,      default
      consectetuer adipiscing elit.
      Praesent consequat, nibh at
      aliquam convallis, sem arcu
      sodales mauris, et sagittis
      risus wisi ut dolor.

Perhaps there would be popup menus in the Tag and CSS columns, or perhaps some intelligent prediction that writers can override (when we get intelligent prediction in the Content column, watch out).

Of course, even this is still too simplistic. In terms of HTML handling, this model is OK for block elements, but does nothing for inline elements. Dealing with nested elements could be tricky. For CSS, it gets very hairy. CSS is complicated: styles can reside in the tag, in the head of the document, and in other documents, all pulled together in different ways. Styles may apply to a tag universally or contextually. So this putative tool would need to analyze the document structure to determine which contextual CSS rules to show. And if the author serves pages dynamically, or uses dynamic includes to assemble a page from multiple fragments (as I do), it will need to take that into account.
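To give a feel for the bookkeeping such a tool would face, here is a deliberately oversimplified Python sketch of pulling the three style sources together for one element. It assumes the external stylesheet is loaded first, the head styles second, and the inline style last, so that later sources simply override earlier ones. Real CSS also weighs selector specificity, source order, and !important, none of which is modeled here; the function name and the 24px inline value are made up for illustration.

```python
def effective_style(external: dict, head: dict, inline: dict) -> dict:
    """Merge declarations so head styles override external ones and
    inline styles override both (a simplification of the real cascade)."""
    merged = {}
    for layer in (external, head, inline):
        merged.update(layer)  # later layers win on conflicting properties
    return merged


h1_style = effective_style(
    external={"font-family": "helvetica", "font-size": "18px"},
    head={"color": "#600"},
    inline={"font-size": "24px"},  # hypothetical inline override
)
print(h1_style)
```

Even in this toy version, the tool has to show the writer that the heading's font size comes from the tag itself while its color comes from somewhere else entirely.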

I don’t see an easy way out. A program that is easy to use and allows fine-grained control and generates smart, well-formed HTML and CSS will be a serious challenge for interface designers and coders.

Pseudo-consensual link-farming

This, I think, is a new one.

I have received a piece of spam (sent via an insecure host in Ukraine) that appears to be an innocuous request by one blogger to exchange links with other blogs. The problem is that the sender is nobody I’ve ever heard of, and the blogs aren’t particularly on the same subject as stuff I write about (if anyone figures out what subject I’m writing about, please let me know).

The blogs in question appear to be legit blogspot-hosted blogs–and they have on-topic content–except that the sidebars are obvious link-farms, as is the website associated with the sender’s e-mail domain.

So what we have here is an attempt to get people to act as part of an unpaid link-farm. More polite than comment-spamming, I guess.

Ten years on the web

Macworld San Francisco begins today. I am sure there will be some interesting announcements that send the Mac cognoscenti a-nattering. But for me, it’s an occasion to think back.

I attended Macworld SF in 1994, staying with a friend from my days in Japan, Robin Nakamura, who attended as well and was also a bit of a Mac geek. It was fun. The big thing was CD-based entertainment, like The Journeyman Project. The hottest Mac you could buy was a Quadra 840av, and I remember watching a demo of an amazing image-editing app called Live Picture, which looked set to beat the pants off of Photoshop at the time.

On the plane ride back, I was reading a copy of Macweek that had been handed out at the show, and got to talking with a guy in a nearby seat, Greg Hiner. Turns out he worked at UT developing electronic course material; he invited me to drop by his office to check out this new thing on the Internet called the World Wide Web. I had an Internet account at that time, and was acquainted with FTP, Gopher, and WAIS, but hadn’t heard of this Web thing.

So a few days later, I stopped by his office, and we huddled around his screen as he launched Mosaic. It immediately took us to what was the default home page at the time, on a server at CERN, in Switzerland. I noticed the “.ch” address of the server in the status bar and said excitedly, “we’re going to Switzerland!” A gray page with formatted text and some pictures loaded. This was cool. This was not anonymous, monospaced text, like you get with Gopher. He clicked on some blue text that took us to Harvard, I think, and I commented “now Boston!” This was exciting. This was big, and I knew it was going to be really, really big.

I’ve still got a few of the earliest e-mails we exchanged, in which we traded links, and I am tickled to see that (at least through redirects) some of those sites are still live (see: mkzdk, John Jacobsen Artworks).

I quickly figured out how to write HTML and put up a web page to serve as a resource for my fellow Japanese-English translators, who I knew would want to latch onto this Web thing and just needed something to help them get started (ironically, the page is too old to be included at the Internet Archive).

And here we are today. I am writing this in a program that runs on my computer, and communicates over a (relatively) high-speed connection with a program that runs on my server to create and manage web pages. Many of my friends do the same, and I’ve made new friends just because of this simple activity. The boundary between one computer and another, between my hard drive and the Internet, is, if not blurry, at least somewhat arbitrary. I’m watching Steve Jobs’ Macworld keynote in a window in the background as I type. Things have changed a lot. And I feel like we’ve barely gotten started.

Table layout for non-tables

CSS is endless fun for the geek–it can be perverted in so many amusing ways. Take table layout, for example.

Back in the old days (you know, like 1998), HTML authors used table tags to lay out web pages. Gradually, a certain sub-community of web developers came to criticize this: HTML is meant to describe the page’s structure, not appearance, and tables were being used to lay out text matter, not tabular matter. “Save tables for, you know, spreadsheets,” they said. “Look, we’ve got this lovely thing called CSS that provides all kinds of layout flexibility.”

Many old-school web developers have been uncomfortable with this. Table tagging is familiar and predictable; CSS uses a completely different model for laying out the page. Or does it?

The fact is that CSS provides a complete set of tools for styling tables. It even lets you use tabular display tricks for text matter. So you can have your nice, semantically correct HTML, and in the CSS twist it to be displayed exactly as if you had marked it up with table tags.
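Here is a rough sketch of the trick, wrapped in a few lines of Python that assemble a demo page you could save and open in a browser. The markup is a plain list, and the CSS display values table, table-row, and table-cell make it render in rows and columns. The class name "grid", the sample content, and the build_page function are all made up for this example, and browser support for these display values varies.

```python
# Semantic markup: a list, not a table. Class name and content are made up.
HTML = """<ul class="grid">
  <li><span>Name</span><span>Role</span></li>
  <li><span>Alice</span><span>Editor</span></li>
</ul>"""

# The CSS asks the browser to lay the list out as if it were a table.
CSS = """
.grid      { display: table; border-collapse: collapse; padding: 0; }
.grid li   { display: table-row; }
.grid span { display: table-cell; padding: 0.25em 1em; border: 1px solid #999; }
"""


def build_page() -> str:
    """Return a complete HTML document combining the markup and the styles."""
    return (
        "<!DOCTYPE html>\n<html><head><title>Table layout demo</title>"
        f"<style>{CSS}</style></head>\n<body>\n{HTML}\n</body></html>\n"
    )


print(build_page())
```

There is not a single table tag in the markup; all of the table-ness lives in the stylesheet.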

Communications vs Telephony

There’s an interesting case brewing right now between voice-over-IP (VoIP) services that provide something like telephony without necessarily using phone service, and state regulators that want to tax these services.

There’s a fundamental old-world/new-world divide here.

In the old world, if you wanted to communicate, you got a phone and talked with people. In the new world, if you want to communicate, you can get some form of Internet access–which could be over a plain-old phone line, a DSL line (which almost invariably comes with phone service attached), cable modem, or the wifi signal at your neighborhood coffee shop (if you want to get exotic, there are more options)–and then you use some kind of communications service (AKA the application layer)–email, ICQ, web-based forums, and now, VoIP. So where the service and the access used to be tied together and inherent in the technology, today, voice is just another service on a layer that is more or less independent, on top of the medium transporting it.

The old-world regulatory regime can’t keep up with that, so it needs to change. The proposed taxes on VoIP are already somewhat arbitrary in that they really don’t cover all VoIP applications. Anyone can download a video chat program (like iChat AV). This gives service that’s an awful lot like the services that regulators want to tax, but is completely outside their control. Regulators are only concerned with services that act like general-purpose telephony, and can interact with the public phone network. In the short term, one might argue that it’s OK to treat services that act as gateways between the traditional phone network and the Internet as telephony providers; in the long term, that won’t work, because more and more communications will move onto the Internet.

Some of those taxes are specifically for the common good–the charge for 911 service, taxes to subsidize phones for poor people and provide Internet access to libraries. Others just go into the pot. But let’s assume that they’re all necessary. How would they get divvied up under a new-world regulatory regime? By taxing the VoIP at the application layer? This is a huge can of worms that I would hate to open up, as it would mandate spyware on your computer to keep track of whether you use it for voice services. This would be even worse than the broadcast flag. Taxing the physical layer? This strikes me as closer to what we have now, and less problematic in some ways, but moreso in others. Open wifi nodes are already prevalent, and are becoming moreso. In fact, some cities are installing them in public places for public access, making it easy for people nearby to get a free ride. This is for the good, but if the node’s connection carries all the tax, it will tend to increase the number of free riders and decrease the number of nodes, which is bad.

I really don’t have the answers to this, but it’s an interesting question. One thing I am sure of is that we need to recognize the application/connection separation and allow VoIP to grow.

Tim Bray on spam

Tim Bray comes up with a plan for spam that is similar to my previous idea–paying to send e-mail–but doesn’t require any architectural changes to the Internet.

His idea can be taken a step further: once you’ve established friendly communications with someone, you could set up your mail filters to accept unpaid e-mail from that person.
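As a sketch of what that filter rule might look like, here is a little Python. It is only my guess at the shape of the thing: how a message proves it was paid for is not something I know, so the "paid" flag is a hypothetical stand-in, as are the Message class and the accept function.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    paid: bool  # hypothetical flag; how payment is proven is not specified


def accept(message: Message, friends: set) -> bool:
    """Accept paid mail from anyone, and unpaid mail only from known friends."""
    return message.paid or message.sender.lower() in friends


friends = {"alice@example.com"}
print(accept(Message("alice@example.com", paid=False), friends))
print(accept(Message("stranger@example.com", paid=False), friends))
print(accept(Message("stranger@example.com", paid=True), friends))
```

Strangers have to pay to get through once; after you have added them to the friends list, they no longer do.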

Let’s beat on Verisign

Chip has been doing a good job of beating the drum on Verisign’s offensive “typojacking” (great word) of all unassigned domain names on the Internet. Meaning, for example, if you accidentally type “corssroads.net” into your browser, you are taken to a Verisign page that tells you “perhaps you meant one of these pages.” Prima facie, that actually sounds helpful, but there are serious problems with it. The architecture of the Internet depends on the ability to check whether a domain name is valid or not. This trick stymies that ability. It’s also sleazy, because Verisign can monetize your typos: rather than pointing you to the most likely correct spelling, they can suggest you visit sponsoring sites that seem like likely hits.
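To make the validity problem concrete, here is a small Python sketch of the kind of check that software relies on: try to resolve the name, and treat a lookup failure as "no such domain." So that it runs without touching the network, I pass in two stand-in resolvers of my own invention, one that behaves honestly and one that answers for every name, which is the effect the wildcard has. The addresses come from a range reserved for documentation.

```python
import socket
from typing import Callable


def domain_resolves(name: str, resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if the name resolves, False if the lookup fails."""
    try:
        resolve(name)
    except socket.gaierror:
        return False
    return True


def honest_resolver(name: str) -> str:
    """Stand-in resolver: knows one real name and fails on everything else."""
    if name == "crossroads.net":
        return "192.0.2.10"
    raise socket.gaierror("name not known")


def wildcard_resolver(name: str) -> str:
    """Stand-in resolver: answers for every name, typos included."""
    return "192.0.2.99"


print(domain_resolves("corssroads.net", honest_resolver))
print(domain_resolves("corssroads.net", wildcard_resolver))
```

With the honest resolver the typo is reported as invalid; with the wildcard one it looks just as real as the correct spelling, so anything that depends on that failure can no longer tell the difference.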

And finally, simply by visiting Verisign’s website, you are agreeing to their terms of service. There may have been a tattered fig leaf of respectability for that stunt when you had to intentionally go to their site, but that fig leaf is completely gone now. One harassment tactic that geeks could take would be to write them, saying “I came to your site completely by accident, and I do not agree to your terms of service. Please make it so that I can no longer accidentally violate your TOS.”

In fact, now that I think about it, I wonder if we can write up a sort of “reverse-TOS”–that is, we could file a TOS (hidden in a locked filing cabinet stuck in a disused lavatory with a sign on the door saying Beware of the Leopard) reading something like “responding to any HTTP GET or POST requests originating from my computer constitutes acceptance of these terms of service,” which might include terms like free ice cream delivered daily for the next year.

Spam report

I haven’t been monitoring the amount of spam that gets nailed by spamassassin (it’s a lot), but in the past two weeks, 114 pieces of spam have slipped past it. Of those, I find it amusing that 10 are offering anti-spam software.

Yet another social network

Friendsurfer. How many of these things are there–excluding the numerous parody sites?

This one caught my eye because it shows a fire-twirler in its banner graphic. Friendster has been notably popular among my fire-freak friends–I wonder if there’s any connection.

Fighting back at spam

Paul Graham suggests that when your spam filter identifies a message as probable spam, it should automatically ping any URLs mentioned in the message–perhaps repeatedly–to drive up the spammer’s web-hosting bandwidth costs. If lots of people do it, spam suddenly gets much more expensive to send. I like this–it fights fire with fire.
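The first step of such a scheme would be pulling the URLs out of a message the filter has already flagged. Here is a minimal Python sketch of just that step; the regular expression is a rough one of my own, the function name is made up, and the actual fetching is left out on purpose.

```python
import re

# Rough pattern: http(s) followed by anything that is not whitespace or <>.
URL_PATTERN = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def urls_in_message(body: str) -> list:
    """Return the distinct http(s) URLs in a message body, in first-seen order."""
    found = []
    for match in URL_PATTERN.findall(body):
        url = match.rstrip(".,;:!?)\"'")  # drop trailing punctuation
        if url not in found:
            found.append(url)
    return found


spam = (
    "Buy now at http://spam.example/pills. "
    "Again: http://spam.example/pills and https://spam.example/x?id=1"
)
print(urls_in_message(spam))
```

Everything after this point (how often to ping, and how sure the filter needs to be first) is where the real design questions are.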

Speedy service

I posted a lazyweb request concerning the creation of foaf files a couple days ago. One of the guys who appear to be the prime movers in the world of foaf, Jim Ley, whipped together a little converter that takes tabbed data and spits out the “knows” portion of a foaf file. You’ll probably want to use the foafamatic to create a shell foaf file with your own data and maybe one friend, then export your contact data from whatever dark cave you store it in into tabular form and run it through Jim’s widget, and finally merge the two. A little clunky, yes, but a big improvement over what we had before.

Once you’re done, check out your handiwork in the foaf explorer.

Lazyweb: a better foaf-file maker

Making good FOAF files is a pain. There’s the foaf-a-matic web page, that lets you type stuff in and formats it for you, but who wants to re-type all that data? There’s a Java-based successor, but that’s overkill, and still doesn’t have any way to import data, as far as I can tell.

Thanks to Address Book Exporter, it’s easy for me to extract data from my address book in a tabular form. I suspect that most people interested in using FOAF probably already have their data typed in somewhere, and would be able to extract it in this form if needed. From there, it would be pretty easy to use GREP to mark up the file into a usable FOAF file, except for the sha1 e-mail address encoding (which is not required, but is the responsible thing to do). And not everyone would be comfortable writing a GREP pattern, for that matter.

What we need is a little web utility that ingests tab-delimited text files and spits out FOAF files. I know it can be done–it’s just out of my reach.
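For the curious, here is roughly what the core of such a utility might look like, sketched in Python. It assumes each line of the input holds a name and an e-mail address separated by a tab (that column layout is my assumption), and it produces only the “knows” entries, not a complete FOAF file. The sha1 value is computed over the address with its mailto: prefix, which is how FOAF defines mbox_sha1sum. The function names are my own.

```python
import hashlib
from xml.sax.saxutils import escape


def mbox_sha1sum(email: str) -> str:
    """Return the SHA-1 hex digest of the mailto: URI for an address."""
    return hashlib.sha1(f"mailto:{email.strip()}".encode("utf-8")).hexdigest()


def knows_fragment(tab_text: str) -> str:
    """Turn 'name<TAB>email' lines into foaf:knows entries (assumed layout)."""
    blocks = []
    for line in tab_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip blank or incomplete rows
        name, email = fields[0].strip(), fields[1].strip()
        blocks.append(
            "<foaf:knows>\n"
            "  <foaf:Person>\n"
            f"    <foaf:name>{escape(name)}</foaf:name>\n"
            f"    <foaf:mbox_sha1sum>{mbox_sha1sum(email)}</foaf:mbox_sha1sum>\n"
            "  </foaf:Person>\n"
            "</foaf:knows>"
        )
    return "\n".join(blocks)


print(knows_fragment("Jane Doe\tjane@example.com\nBob & Co\tbob@example.com\n"))
```

The output would still need to be pasted inside the Person element of a shell FOAF file, but the tedious part (the retyping and the hashing) is handled.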

Spam in my name and challenge-response

I recently discovered that some spam was being sent with my address as the return address–the bounces were coming to me.

Other than pissing me off, I wasn’t sure what those smegma-sucking spamming scumbags hoped to accomplish by doing this. Now I have an idea: it may be to undermine challenge-response spam-blocking systems.

These challenge-response systems are a klunky way of dealing with spam: if Alice sends Bob an e-mail, and she’s not on Bob’s whitelist, the system sends Alice an automated response asking her to visit a web page and prove that she’s a real human being worthy of Bob’s valuable attention. This usually involves looking at a graphic showing distorted text, and typing the text into a box.

Even if this all works according to plan (and there are plenty of reasons why it might not), it’s very annoying. But as soon as spammers start sending out e-mail purporting to come from real people, it really goes to hell:

  • If I am already whitelisted with a C/R service, the spam gets a free pass.
  • If I am not already whitelisted with a C/R service, the challenge comes to me. Maybe I’ll respond correctly, in which case the spam gets a free pass.
  • Or maybe I won’t respond, or (acting mischievously or perversely) respond incorrectly, in which case the spam is blocked, but so is any e-mail I might want to send to any person using that system in the future.
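
The three cases above can be sketched as a toy model in Python. This is only my simplification of how a challenge-response filter behaves, not the logic of any real product; the function name, the outcome strings, and the addresses are invented for illustration.

```python
from typing import Optional


def handle_incoming(claimed_sender: str, whitelist: set,
                    challenge_answer: Optional[bool]) -> str:
    """Decide the fate of a message from its (possibly forged) From address.

    challenge_answer is what the real owner of the address does with the
    challenge: True = answers correctly, False = answers wrongly,
    None = ignores it.
    """
    if claimed_sender in whitelist:
        return "delivered"            # case 1: already whitelisted
    if challenge_answer is True:
        whitelist.add(claimed_sender)
        return "delivered"            # case 2: owner vouches for the spam
    return "blocked"                  # case 3: address stays off the whitelist


me = "me@example.com"  # the address the spammer forged
print(handle_incoming(me, {me}, None))
print(handle_incoming(me, set(), True))
print(handle_incoming(me, set(), False))
```

Note that the filter only ever sees the claimed sender, so in two of the three cases the forged spam is delivered, and in the third my own address ends up on the wrong side of the filter.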