February 21, 2003

Blob blog

How do you like the new look? Thanks to the wonders of CSS, most of the work was in making the oval graphics, and that didn’t take long. Add some tweaking of CSS settings, and I was pretty much able to do the whole thing in my spare time over one late morning.

I’m guilty of one act of backsliding: the title logo is now text-as-graphic. Getting the effect I wanted using text-as-text would be terribly painstaking (it still isn’t exactly right), and would probably break in a bunch of browsers I don’t have access to. When CSS3 support is available, I’ll revert to text-as-text, promise.

Is Google too big?

Google’s recent buyout of Pyra set the whole blogosphere abuzz, but it also seems to have prodded some people to wonder whether we should worry about Google being too important, too big, too valuable, too secretive.

At Austin’s blogger meetup the other night, Prentiss asserted that private projects like Google and archive.org were too important to leave in private hands (archive.org is basically a hobby of Brewster Kahle’s). He suggested that the Library of Congress should be given funding to develop and maintain equivalent resources.

Citing privacy concerns, the BBC’s Bill Thompson suggests that Google is “a public utility that must be regulated in the public interest,” and that the British Government should establish an “Office of Search Engines” (or to use his Orwellian term, OfSearch).

Both proposals have merit, but both have weaknesses. Regulating a search engine strikes me as potentially heavy-handed. And if privacy is the issue, I’d be especially unwilling to see the U.S. Government in its current form operating a popular, all-encompassing search engine; that could easily become a back door to Poindexter’s Total Information Awareness.

So what’s the solution? I’m not sure. But I think that if Google (or, to be exact, the services it offers) is too important to leave to Google, it’s too important to leave to any one entity. Better to seed the technology widely. The open-source community might be able to come to the rescue if it could develop and disseminate smart search-engine code and license it under strict terms that permitted a nonprofit organization to inspect the books of licensees, to make sure they weren’t misusing the data they captured, and so on. Result-rigging could be caught by setting up a meta-search engine that compared results from different installations of the same engine.
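The meta-search check could work roughly like this. It is a minimal sketch, assuming that honest installations of the same engine return largely overlapping results for the same query, so an installation whose top results diverge sharply from its peers deserves a closer look. The installation names, the top-k cutoff and the 0.4 threshold are all hypothetical, and Jaccard overlap is simply one convenient similarity measure chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two result lists, ignoring order: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty result lists agree trivially
    return len(sa & sb) / len(sa | sb)


def flag_outliers(results, k=10, threshold=0.4):
    """Flag installations whose top-k results disagree with their peers.

    results: dict mapping installation name -> ranked list of URLs for one query.
    threshold: hypothetical cutoff on average overlap; real values would need tuning.
    """
    names = list(results)
    scores = {name: [] for name in names}
    # Compare every pair of installations once and credit both sides.
    for a, b in combinations(names, 2):
        sim = jaccard(results[a][:k], results[b][:k])
        scores[a].append(sim)
        scores[b].append(sim)
    flagged = []
    for name in names:
        if scores[name]:
            avg = sum(scores[name]) / len(scores[name])
            if avg < threshold:
                flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical installations answering the same query.
    results = {
        "mirror-a": ["u1", "u2", "u3", "u4"],
        "mirror-b": ["u1", "u2", "u3", "u5"],
        "mirror-c": ["u1", "u3", "u2", "u4"],
        "mirror-d": ["x1", "x2", "u1", "x3"],
    }
    print(flag_outliers(results))  # ['mirror-d']
```

Note that Jaccard overlap ignores ordering, so an installation that quietly promoted a site from tenth place to first would slip past this check; a rank-sensitive comparison would be needed to catch that kind of rigging.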

[Later] So how do you come up with a good search engine? Obviously part of the problem is having the bandwidth to crawl the Net frequently and thoroughly. Part of it no doubt comes down to efficient indexing. But perhaps the trickiest part is ranking the results. I’ve been speculating on ways to refine the matching algorithm, and a tournament approach might be the way to go.

Here’s what I mean: develop a bunch of matching algorithms. By default, site users would just see whichever is the preferred algorithm du jour. But willing users could opt into a “tournament view” where results from two different engines were presented side by side, and then say which set of results seemed more useful. With N algorithms, there would be N² − N possible pairings (counting A-versus-B and B-versus-A separately, since which side a result set appears on might sway users). With a large user base, it shouldn’t be hard to generate meaningful results. This could also be part of the feedback loop in a genetic-algorithm approach, although I don’t understand genetic algorithms well enough to develop that angle any further.
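The tournament bookkeeping might look something like the sketch below. It enumerates the N² − N ordered pairings, records each user’s preference, and ranks algorithms by the fraction of their match-ups they won. The algorithm names are hypothetical, and the plain win rate is an assumption of this sketch, not part of the proposal; it ignores how strong each opponent was, so a rating scheme such as Elo or Bradley-Terry could be swapped in for a fairer ranking.

```python
from collections import Counter
from itertools import permutations


def pairings(algorithms):
    """All ordered (left, right) match-ups: N*N - N of them for N algorithms."""
    return list(permutations(algorithms, 2))


def record_vote(wins, shown, left, right, preferred):
    """Tally one user's verdict on a side-by-side tournament view."""
    if preferred not in (left, right):
        raise ValueError("preferred must be one of the two engines shown")
    shown[left] += 1
    shown[right] += 1
    wins[preferred] += 1


def standings(wins, shown):
    """Rank algorithms by the fraction of their match-ups users preferred them.

    A naive win rate (assumption of this sketch); it ignores opponent strength.
    """
    rates = {algo: wins[algo] / shown[algo] for algo in shown}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    algos = ["alpha", "beta", "gamma"]  # hypothetical algorithm names
    print(len(pairings(algos)))  # 6
    wins, shown = Counter(), Counter()
    # Each vote: (engine shown on the left, engine on the right, user's pick).
    votes = [
        ("alpha", "beta", "alpha"),
        ("beta", "alpha", "alpha"),
        ("alpha", "gamma", "gamma"),
        ("gamma", "beta", "gamma"),
        ("beta", "gamma", "beta"),
    ]
    for left, right, preferred in votes:
        record_vote(wins, shown, left, right, preferred)
    print(standings(wins, shown))
```

In a live setup, each tournament-view visitor would be shown one pairing drawn from `pairings(...)`, so left/right placement gets balanced across the user base.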