09 Jun 2006 – Adam Rice

Robotic folksonomy

09 Jun 2006 / technology

I recently bought a new digicam, and I’ve been working on a translation job that relates to signal processing. These two facts, shaken together with some loose synapses in my brain, got me thinking along the following lines.

Digital cameras these days, in addition to taking better pictures, have better processors, and some have interesting ancillary functions. Kodak, for instance, used a general-purpose operating system in some of its cameras that can run user-supplied software. Inevitably, someone adapted this to play video games on the camera’s screen, but it has also been put to other clever uses (taking a picture every five minutes and uploading it to a connected computer, say).

We’re also starting to see digicams with wifi connections. In theory, if you’re near a hotspot, you could put your pictures online as quickly as you shoot them. We may also see cameras with Bluetooth that could get online via a cellphone connection.

But what a mess that would be to manage, a constant stream of unnamed, untagged photos. Since I started using Flickr, I’ve found that tags are often more useful than titles for photos. But who wants to try to apply tags via your camera’s interface? What a pain. That got me thinking about robotagging.

Imagine you have a digicam of the not-too-distant future that can talk to Flickr (which I’ll use as an example because I know and like it, but feel free to substitute the name of any other tag-based photo-hosting service with a public API), uploading images to it directly and getting information back. You want your photos tagged, but you don’t want to interrupt your shooting and you definitely don’t want to try to enter text using the camera’s inputs. How might this work?

Any image can be analyzed algorithmically for a number of different features: color histograms, edge detection, OCR, and so on. It’s an area I admittedly don’t know a lot about. Flickr already has a huge corpus of tagged photos. The feature values for these could be extracted and saved as metadata somewhere in the system.
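To make "feature values" a little more concrete, here is a minimal sketch of one such feature, a coarse color histogram, in plain Python. It assumes the image has already been decoded into a list of (r, g, b) tuples with values from 0 to 255; the function name `color_histogram` and the choice of four bins per channel are made up for illustration and are not anything Flickr is known to use.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized color histogram for a list of (r, g, b) pixels.

    Each channel (0-255) is split into `bins_per_channel` ranges, giving
    bins_per_channel ** 3 buckets in total. The result sums to 1.0 so that
    photos of different sizes can be compared directly.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in the range 0..n-1.
        r_bin = r * n // 256
        g_bin = g * n // 256
        b_bin = b * n // 256
        hist[(r_bin * n + g_bin) * n + b_bin] += 1.0
    total = len(pixels)
    if total == 0:
        return hist
    return [count / total for count in hist]


# A hypothetical four-pixel "photo": three yellow pixels and one blue one.
sample_pixels = [(255, 255, 0)] * 3 + [(0, 0, 255)]
sample_hist = color_histogram(sample_pixels)
```

A vector like this is the kind of thing that could be stored as metadata next to each tagged photo. Edge detection or OCR would produce other values that could be stored alongside it.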

When a new untagged photo gets uploaded, Flickr could extract its feature values and find other photos with near matches for those feature values. It would extract the most popular tags from those photos and send them back to your camera as a list. You’d select the ones that you wanted to use.

This user-selection process in itself would be an important part of the robotagging process, as it would help Flickr’s bot determine which feature values were relevant, or which were relevant to a specific tag. For example, it’s a good bet that a picture with the tag “yellow” is tagged that way based on a certain histogram, but that histogram would be less relevant to the fact that the same photo is tagged “flower.” Edge-detection would tell you nothing about color-name tags, but might be more strongly relevant to the “flower” tag. By training the system, the users would help the tagging bot make better choices in the future. This would have results similar to the ESP game.
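One simple way to picture this training is a small perceptron per tag: each tag keeps a weight for every feature, and the weights are nudged whenever a user accepts or rejects a suggestion that the current weights got wrong. The sketch below is an illustration under heavy assumptions, not a claim about how any real service does it. The class name `TagRelevance`, the two-value feature vector (an amount of yellow and an edge score), and the training data are all invented.

```python
from collections import defaultdict


class TagRelevance:
    """Per-tag perceptron that learns which features matter for each tag."""

    def __init__(self, n_features, learning_rate=0.1):
        self.lr = learning_rate
        self.weights = defaultdict(lambda: [0.0] * n_features)
        self.bias = defaultdict(float)

    def score(self, tag, features):
        """Positive score means the tag looks like a good suggestion."""
        w = self.weights[tag]
        return self.bias[tag] + sum(wi * xi for wi, xi in zip(w, features))

    def feedback(self, tag, features, accepted):
        """Update the tag's weights only when its prediction was wrong."""
        target = 1.0 if accepted else -1.0
        predicted = 1.0 if self.score(tag, features) > 0 else -1.0
        if predicted != target:
            w = self.weights[tag]
            for i, xi in enumerate(features):
                w[i] += self.lr * target * xi
            self.bias[tag] += self.lr * target


# Hypothetical features: [amount of yellow, edge score].
# Users accept "yellow" for yellow photos regardless of the edge score.
model = TagRelevance(n_features=2)
user_choices = [
    ([1.0, 0.0], True),
    ([1.0, 1.0], True),
    ([0.0, 1.0], False),
    ([0.0, 0.0], False),
]
for _ in range(20):  # replay the same feedback a few times
    for features, accepted in user_choices:
        model.feedback("yellow", features, accepted)
```

After this toy training run, the weight on the first feature for “yellow” ends up larger than the weight on the edge score, which is the kind of tag-specific relevance described above.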

Once their images were robotagged on the fly, users would probably still want to go back and add more personally meaningful tags, but as a first pass at tagging, something like this could work.

Update: Looks like Riya is already doing this.