I recently bought a new digicam, and I’ve been working on a translation job that relates to signal processing. These two facts, shaken together with some loose synapses in my brain, got me thinking along the following lines.
Digital cameras these days, in addition to taking better pictures, have better processors, and some have interesting ancillary functions. Kodak, for instance, used a general-purpose operating system in some of its cameras that can run user-supplied software. Inevitably, someone adapted this to play video games on the camera’s screen, but it was also put to cleverer uses (taking a picture every five minutes and uploading it to a connected computer, say).
We’re also starting to see digicams with wifi connections; in theory, if you’re near a hotspot, you could put your pictures online as quickly as you shoot them. We may also see cameras with Bluetooth that could get online via a cellphone connection.
But what a mess that would be to manage, a constant stream of unnamed, untagged photos. Since I started using Flickr, I’ve found that tags are often more useful than titles for photos. But who wants to try to apply tags via your camera’s interface? What a pain. That got me thinking about robotagging.
Imagine you have a digicam of the not-too-distant future that can talk to Flickr (which I’ll use as an example because I know and like it, but feel free to substitute the name of any other tag-based photo-hosting service with a public API), uploading images to it directly and getting information back. You want your photos tagged, but you don’t want to interrupt your shooting and you definitely don’t want to try to enter text using the camera’s inputs. How might this work?
Any image can be analyzed algorithmically for a number of different features: color histograms, edge detection, OCR, and so on. It’s an area I admittedly don’t know a lot about. Flickr already has a huge corpus of tagged photos. The feature values for these could be extracted and saved as meta-data somewhere in the system.
When a new untagged photo gets uploaded, Flickr could extract its feature values and find other photos with near matches for those feature values. It would extract the most popular tags from those photos and send them back to your camera as a list. You’d select the ones that you wanted to use.
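To make the idea concrete, here’s a minimal sketch of that matching step: quantize each photo into a coarse color histogram, find the nearest already-tagged photos, and return their most popular tags. The corpus format, distance metric, and all function names are my own assumptions, not anything Flickr actually exposes.

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize (r, g, b) pixels into a coarse, normalized histogram."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def distance(h1, h2):
    """L1 distance between two histograms -- smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def suggest_tags(new_hist, corpus, k=5, top_n=3):
    """corpus is a list of (histogram, tags) pairs from tagged photos.
    Return the most popular tags among the k nearest matches."""
    nearest = sorted(corpus, key=lambda item: distance(new_hist, item[0]))[:k]
    counts = Counter(tag for _, tags in nearest for tag in tags)
    return [tag for tag, _ in counts.most_common(top_n)]
```

A real system would use richer features than one histogram, but the shape of the process (extract, compare, vote) would be the same.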
This user-selection process in itself would be an important part of the robotagging process, as it would help Flickr’s bot determine which feature values were relevant, or which were relevant to a specific tag. For example, it’s a good bet that a picture with the tag “yellow” is tagged that way based on a certain histogram, but that histogram would be less relevant to the fact that the same photo is tagged “flower.” Edge-detection would tell you nothing about color-name tags, but might be more strongly relevant to the “flower” tag. By training the system, the users would help the tagging bot make better choices in the future. This would have results similar to the ESP game.
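That training loop could be as simple as a per-tag, per-feature relevance weight nudged up when a user accepts a suggestion and down when they reject it. This is a hedged sketch only; the feature names, starting weight, and learning rate are illustrative, not a description of how any real service works.

```python
from collections import defaultdict

class TagRelevance:
    def __init__(self, rate=0.1):
        self.rate = rate
        # weights[tag][feature] starts at a neutral 0.5
        self.weights = defaultdict(lambda: defaultdict(lambda: 0.5))

    def update(self, tag, feature, accepted):
        """Move the tag/feature weight toward 1 on accept, 0 on reject."""
        w = self.weights[tag][feature]
        target = 1.0 if accepted else 0.0
        self.weights[tag][feature] = w + self.rate * (target - w)

    def relevance(self, tag, feature):
        return self.weights[tag][feature]
```

After enough sessions, the bot would learn that “yellow” tracks the histogram feature while edge detection tells it nothing about color names, exactly the distinction described above.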
Once their images were robotagged on the fly, users would probably still want to go back and add more personally meaningful tags, but as a first pass at tagging, something like this could work.
Looks like Riya is already doing this.
2 thoughts on “Robotic folksonomy”
It seems like the most useful thing would be to add GPS units to cameras, and include co-ordinates as metadata with each image. While GPS co-ordinates might not be so useful in themselves, it seems like you could teach a machine what corresponds to “CafeMundi” or “EnchantedRock,” giving you some degree of organization with automated photos.
That would also be incredibly handy, especially for bundling together photos from a given session and bulk-tagging them (not to mention collaborative documentation of the physical world), but would have different results. And for that matter, geotagging and image-feature robotagging could interact: if you took a picture of a building at a certain location, the system might be able to figure out not only that it’s a building in general, but that it’s a specific building that others have tagged before. A picture of something that the robotagger can tell is some kind of plant is more likely to be a cactus if that photo was taken at Enchanted Rock.
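The geotagging half of that idea can be sketched the same way: pull popular tags from photos taken within some radius of your coordinates. The radius, corpus format, and sample coordinates here are assumptions for illustration.

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geo_suggest(lat, lon, corpus, radius_km=1.0, top_n=3):
    """corpus is a list of (lat, lon, tags) from geotagged photos.
    Return the most popular tags among photos within radius_km."""
    counts = Counter()
    for plat, plon, tags in corpus:
        if haversine_km(lat, lon, plat, plon) <= radius_km:
            counts.update(tags)
    return [tag for tag, _ in counts.most_common(top_n)]
```

Combining the two signals is where it gets interesting: the feature-based robotagger says “plant,” the geotagger says “EnchantedRock,” and together they make “cactus” a much better bet.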