December 2013

Word processors and file formats

I’ve always been interested in file formats from the perspective of long-term access to information. These have been interesting times.

To much gnashing of teeth, Apple recently rolled out an update to its iWork suite: Pages, Numbers, and Keynote, its alternatives to the MS Office trinity of Word, Excel, and PowerPoint. The update on the Mac side seems to have been driven by the web and iPad versions, not only in the features (or lack thereof) but also in the new file format, which is completely unrelated to the old one. The new version can import files from the old one, but it’s definitely an import process, and complex documents will break in the new apps.

The file format for all the new iWork apps, Pages included, is based on Google’s protocol buffers. The documentation for protocol buffers states:

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

Guess what we have here. Like I said, this has been driven by the iPad and web versions. Apple is assuming that you’re going to want to sync to iCloud, and they chose a file format optimized for that use case, rather than for, say, compatibility or human-readability. My use case is totally different. I’ve had clients demand that I not store their work in the cloud.
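
To make the “only meaningful if you have the message definition” point concrete, here is a minimal Python sketch of reading the generic protocol buffers wire format by hand. It assumes nothing about Apple’s actual files: the function names `read_varint` and `scan_fields` are made up for illustration, the sample bytes are a toy message, and only two of the wire types are handled. Without a .proto file, about all a reader can recover is field numbers, wire types, and raw values.

```python
from __future__ import annotations


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def scan_fields(data: bytes) -> list[tuple[int, int, object]]:
    """List (field_number, wire_type, raw_value) for each top-level field.

    Hypothetical helper for illustration; handles only wire types 0 and 2.
    """
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


# Toy message: field 1 holds the varint 150, field 2 holds the bytes "testing".
sample = bytes.fromhex("089601") + bytes.fromhex("120774657374696e67")
print(scan_fields(sample))
```

Field 2 could be a title, a paragraph, or a nested message; the bytes alone don’t say, which is exactly the self-description that XML offers and protocol buffers don’t.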

What’s interesting is that this bears some philosophical similarities to the Word file format, whose awfulness is the stuff of legend. Awful, but perhaps not awful for the sake of being awful. From Joel Spolsky:

The first thing to understand is that the binary file formats were designed with very different design goals than, say, HTML.

They were designed to be fast on very old computers.
…
They were designed to use libraries.
…
They were not designed with interoperability in mind.

New computers are not old, obviously, but running a full-featured word processor in a JavaScript interpreter inside your web browser is the next best thing; transferring your data over a wireless network is probably the modern equivalent of a slow hard drive in terms of speed.

There is a perfectly good public file format for documents out there, Rich Text Format or RTF. But curiously, Apple’s RTF parser doesn’t do as good a job with complex documents as its Word parser: if you create a complex document in Word and save it as both .rtf and .doc, Pages or Preview will show the .doc version with better fidelity. Which makes a bit of a joke out of having a “standard” file format. Since I care about file formats and future-proofing, I saved my work in RTF for a while, until I figured out that it wasn’t as well supported.

What about something more basic than RTF? Plain text is, well, too plain: I need to insert commentary, tables, that sort of thing. Writing HTML by hand is too much of a PITA, although it should have excellent future-proofing.

What about Markdown? I like Markdown a lot. I’m actually typing in it right now. It doesn’t take long before it becomes second nature. Having been messing around with HTML for a long time, I prefer the idea of putting the structure of my document into the text rather than the appearance.

But Markdown by itself isn’t good enough for paying work. It has been extended in various ways to allow for footnotes, commentary, tables, etc. I respect the effort to implement all the features that a well-rounded word processor might support through plain, human-readable text, but at some point it just gets to be too much trouble. Markdown has two main benefits: it is highly portable and fast to type, actually faster than messing around with formatting features in a word processor. These extensions are still highly portable, but they are slow to type, slower than invoking the equivalent functions in a typical WYSIWYG word processor.

The extensions are also more limited: the table markup doesn’t accommodate some of the insane tables that I need to deal with, and doesn’t include any mechanism for specifying column widths. Footnotes don’t let me specify whether they’re footnotes or endnotes (indeed, Markdown is really oriented toward flowed onscreen documents, where the distinction between footnotes and endnotes is meaningless, rather than paged documents).

CriticMarkup, the extension to Markdown that allows commentary, starts looking a little ungainly. There’s a bigger philosophical problem with it, though: I could imagine using Markdown internally for my own work and exporting to Word format (that’s easy enough thanks to Pandoc), but in order to use CriticMarkup, I’d need to convince my clients to get on board, and I don’t think that’s going to happen.

I can imagine a word processor that used some kind of super-markdown as a file format, let the user type in Markdown when convenient, but added WYSIWYG tools for those parts of a document that are too much trouble to type by hand. But I’m not holding my breath. Maybe I should learn LaTeX.

Bike-share systems and the poor

This morning there was a story on NPR about bike sharing, specifically how it doesn’t do a good job of serving the poor. There are basically three reasons for this:

  1. The bike stations are not located in areas most useful to poor people;
  2. You need a debit card or credit card to use the system;
  3. Bike-share programs are expensive.

The story got me thinking about all the ways it’s expensive to be poor, and they’re certainly illustrated in this example.

To get a debit card, you need a bank account. To get a bank account, you usually need to scrape together $100 for an opening balance. This is not a huge hurdle to overcome, but if you never have $100 left at the end of your pay period, it’s going to take planning, and if life throws you a curveball before you’ve got that $100 saved up, you’re back to square one.

I looked at the prices for bike-share programs. Chicago’s Divvy has two price structures: yearly memberships and day rates. $70/year or $7/day, plus usage: in both cases you get 30-minute trips for free, but if you’ve got a longer bike trip than that, you get dinged $1.50 or $2.00 per 30 minutes. Austin’s nascent bike-share system has a similar breakdown, but is slightly more expensive.

So if you’re poor, the annual plans are probably out just because of the upfront costs, even though on a per-day basis, they’re a much better deal. If anything, you’re on the daily plan (Austin also has a weekly plan), although again, this presupposes you’ve got a bank account.

What about getting your own bike? You can get a beater bike on Craigslist. There are bikes listed there right now in the $20–50 range, so if you’re poor, the break-even point for rent vs own comes quickly—within one pay period. If you could afford the daily bike rental, you could afford to buy a bike. If you’re going to use a bike for commuting to and from work, it would be a no-brainer. It would also be a no-brainer for someone with more discretionary income who wants to commute by bike.
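
The break-even arithmetic is easy to check. Here is a rough Python sketch using only the prices quoted above; the helper `break_even_days` is a made-up name, and the calculation deliberately ignores overage fees, locks, repairs, and theft, so treat the results as ballpark figures.

```python
import math

DAY_PASS = 7.00      # Divvy day rate quoted above, in dollars
ANNUAL_PASS = 70.00  # Divvy yearly membership quoted above, in dollars


def break_even_days(upfront_cost: float, daily_rate: float = DAY_PASS) -> int:
    """Rental days after which day passes have cost at least upfront_cost."""
    return math.ceil(upfront_cost / daily_rate)


# Compare the Craigslist price range and the annual plan against day passes.
for label, cost in [
    ("$20 used bike", 20.00),
    ("$50 used bike", 50.00),
    ("annual membership", ANNUAL_PASS),
]:
    print(f"{label}: matched by {break_even_days(cost)} days of day passes")
```

On these numbers a $20 bike is paid off after 3 rental days and a $50 bike after 8, so anyone riding most days gets there inside a typical two-week pay period (the pay-period length is my assumption). The annual membership pays for itself after 10 days of riding, but only if you can front the $70.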

So given that anybody with even marginal math skills could figure out that ownership beats rental for routine, day-to-day bike usage, what’s the use case for rental? It’s for when you’re out of your routine. Non-routine uses are hard to predict (it seems redundant to point that out), and that makes the best placement of bike stations problematic.

Another obvious use case is tourism, and from what I’ve seen in Chicago and San Antonio, the placement of bike stations clearly targets tourists.

I don’t think it would be a bad idea for bike-sharing systems to be more accessible to the poor, but as long as those systems are run by private companies trying to turn a profit, it’s going to be difficult to balance that equation. Organizations like the Yellow Bike Project can do more to improve bike mobility for the poor right now, by providing them with their own bikes, teaching them how to maintain bikes, and giving them access to shop space.