2018

Dealing with graphics in MS Word

Microsoft Word’s ubiquity is rivaled only by its badness. Since we’re stuck using it (and often using files that other people created in it), we need to find coping mechanisms.

One especially vexing problem in Word is the way it deals with placed graphics. This post isn’t an exhaustive tutorial on how to work with graphics in Word; it lays out one method that will work in most cases, and explains how to make it work.

Let’s say you receive a file that looks something like this, with a placed photo and some text boxes and arrows laid over it to call out features.

[Image: three adorable cats in a Microsoft Word document. Caption: A typical document]

You edit the file, do something seemingly innocuous, and you wind up with something like this:

[Image: three adorable cats in a messed-up Microsoft Word document. Caption: A messed-up document]

Obviously you can’t let the file go out into the world like this, and because you are a good person, you want to leave things better than you found them. So how do you fix this? Or if you’re required to create files like this, how do you prevent this from happening in the first place?


It’s easy to get into trouble with Word any time you try using its page-layout features. If at all possible, it’s best to treat every document as a continuous river of text, rather than isolated blocks. The problem with images is that Word gives you numerous options for treating images as isolated blocks, and exactly one option for treating them as part of that river. When you mix externally created images and graphics that are created in Word, things get complicated. And when these are overlaid on one another, things get even more complicated.

In the image shown above, there’s a photo that was created externally, and three text boxes and arrows that were created within Word. So the first thing to understand is how Word treats these differently: the photo is a picture and the arrows & text boxes are shapes. They have different formatting options available to them. However, interestingly, you can crop a picture in Word using its “picture format” tools, and that turns it into a shape (!).

Most of the trouble you run into with these hybrid images revolves around placement options. Word gives you two sets of parameters for dealing with pictures/shapes in text: positioning and text wrap.

[Image: Microsoft Word's positioning pane. Caption: Word’s positioning menu options]
[Image: Microsoft Word's text-wrap pane. Caption: Word’s text-wrap menu options]

If a visual element’s positioning is set to “in line with text,” it behaves like a typed character: it can sit on a line with other characters, it moves around with other elements, etc. I’d argue that this should be your goal for most or all visual elements you use in Word. You can set them on their own line, use other techniques to marry them to captions, center them, etc.

With all the other positioning options, the element is anchored to a spot on the page—a certain distance from a corner, for example. If you anchor the element, you use the wrapping options to tell Word how to wrap text around (or over, or under) the element. There may be legitimate reasons to do this, but Word is a rotten tool if that’s what you’re trying to do. I often see files where someone has placed an image with fixed positioning that they really just want inline with text—and then they insert a bunch of carriage returns to put the text down below it. This will break as soon as the text above gets a little longer or shorter.

Also, just for fun, if you set the wrap to “in line with text,” Word automatically does the same for the positioning, and vice-versa. This kind of makes sense, but can be confusing.

To simplify your life, treat each graphic as a standalone block, on its own line, flowing with the text.

This gets more complicated when you’re combining a picture with shapes. By default, the picture is placed “inline”. By default, a shape is positioned relative to something—positioning can be relative to the page, margin, paragraph, line, with separate options for horizontal and vertical position. Ain’t nobody got time for that.

So we’re back to inline positioning as the right way.

But with the Orientalist mysticism that you only find in cheesy action movies, when you’re dealing with a hybrid image like this, Word forces you to do things the wrong way before you can do them the right way. Here’s the trick: we need to manually position the picture and the shapes relative to each other. And Word doesn’t let you manually position elements that have inline positioning—again, it does make sense, but is confusing until you understand the principle.

First, make sure that all the visual elements have some kind of positioning that isn’t inline—it doesn’t really matter what.

Second, get all the shapes lined up correctly over the picture that acts as the backdrop. If some of the shapes are getting hidden behind the picture, select the picture and then execute Picture Format > Arrange > Send to Back.

Third, I like to group all the shape elements. This is probably unnecessary. Shift-click on all the elements in turn to select them and then execute Shape Format > Arrange > Group. The image below shows the shape elements grouped together, with a frame around them. You can still separately manipulate the elements in a group—it’s possible to move a grouped element unintentionally; if you need to move the group, you need to grab it by the group’s frame.

[Image: grouped shapes. Caption: Grouped shapes in Word]

Fourth, shift-click to select the grouped shape elements and the background picture, and group those.

Fifth, set the positioning of these grouped elements to “inline with text.” Phew! It’s faster to do than to read.

A lightweight documentation system

Background

Not long after I took on my current role in AAR LLC, I inherited the task of producing the “binders” that the organization prints up every event cycle–basically, operations manuals for the event.

There was a fair amount of overlap between these binders, and I recognized that keeping that overlapping content in sync would become a problem. I studied documentation technologies and techniques, and learned that indeed, this is considered a Bad Thing, and that “single sourcing” is the solution–this would require that the binders be refactored into their constituent chapters, which could be edited individually, and compiled into complete documents on demand.

The standard technology for this is DITA, but that involves a lot of overhead. It would be hard for me to maintain by myself, and impossible to hand off to anyone else. What I’ve come up with instead is still a bit of a work in progress. It still has a bit of a tech hurdle to overcome–it does involve using the command line–but should be a lot more approachable.

The following may seem like a lot of work. It’s predicated on the idea that it will pay off by solving a few problems:

  • You are maintaining several big documents that have overlapping content
  • You want to be able to publish those documents in more than one format (web, print, ebook)
  • You want to be able to update your materials easily, quickly, and accurately.

The following is Mac-oriented because that’s what I know.

Installation

Install Homebrew

Homebrew is a “package manager” for macOS. If you’ve never used the command-line shell before, or have never installed shell programs, this is your first step. Think of it as an App Store for shell programs. It makes installing and updating other shell apps much easier.

To install, open the Terminal app and paste in

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Important: Don’t paste in shell commands you find on the Internet unless you know what you’re doing or you really trust me. But this is exactly what the nice folks at Homebrew will tell you to do.

Install Pandoc

Pandoc is a Swiss Army knife for converting text documents from one form to another. In the Terminal app, paste in

brew install pandoc

Homebrew will chew on that for a while and finish.

Install GPP

GPP is a “generic preprocessor,” which means that it substitutes one thing for another in your files. In the Terminal app, paste in

brew install gpp

Again, Homebrew will chew on that for a while and finish.

Learning

Learn some shell commands

You’ll at least need to learn the cd and ls commands.

This looks like a pretty good introductory text.
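
As a quick taste of those two commands, here is a minimal sketch. The folder name my_project and the file notes.md are made up for illustration; substitute your own. It also uses pwd, which simply prints where you currently are.

```shell
# Work in a throwaway temporary directory so nothing real is touched.
workdir="$(mktemp -d)"

# Make a hypothetical project folder with one file in it.
mkdir "$workdir/my_project"
touch "$workdir/my_project/notes.md"

# cd ("change directory") moves you into a folder.
cd "$workdir/my_project"

# pwd prints the folder you are currently in.
pwd

# ls lists what is in the current folder.
ls

# "cd .." moves you up one level, to the parent folder.
cd ..
ls
```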

Learn Markdown

Markdown was created by John Gruber to be a lightweight markup language: a way to write human-readable text that can be converted to HTML, the language of web pages. If you don’t already know the rudiments of HTML, the key thing to remember about it is that it describes the structure of a document, not its appearance. So you don’t say “I want this line to be in 18-pt Helvetica Bold,” you say “I want this line to be a top-level heading.” How it looks can be decided later.

Since then, others have taken that idea and run with it. The author of Pandoc, John MacFarlane, built Pandoc to understand an expanded Markdown syntax that adds a bunch of useful features, such as tables, definition lists, etc. The most basic elements of Markdown are really easy to learn; it has a few less-intuitive expressions, but even those are pretty easy to master, and there are cheat-sheets all over the Internet.
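
To give a feel for the syntax, here is a small sketch that writes a sample Markdown file from the shell. The file name sample.md and everything in it are invented for illustration. The table and the definition list at the end use Pandoc’s extended syntax rather than Gruber’s original Markdown.

```shell
# Write a hypothetical sample file. Quoting 'EOF' keeps the shell
# from interpreting anything inside the heredoc.
cat > sample.md <<'EOF'
# Radio procedures

## Channel assignments

Use *plain language* on the air, and **never** transmit names.

- Check your battery before each shift
- Keep transmissions short

| Channel | Used by |
|---------|---------|
| 1       | Gate    |
| 2       | Leads   |

Net control
:   The person coordinating traffic on a channel.
EOF

# Show the result; it is readable even before conversion.
cat sample.md
```

Note that nothing in the file says what a heading should look like, only that it is a heading.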

Markdown is plain text, which means you can write it in anything that lets you produce text, but if you do your writing in MS Word (aside: please don’t), you need to make sure to save as a .txt file, not a .doc or .docx file. There are a number of editors specifically designed for Markdown, that will show a pane of rendered text side-by-side with what you’re typing; there’s even a perfectly competent online editor called Dillinger that you can use.

I’ve gotten to the point where I do all my writing in Markdown, except when required to use some other format for my work. There are a lot of interesting writing apps that cater to it, writing is faster, and files are smaller and more portable.

Organization

Refactor files and mark them up

Getting your files set up correctly is going to be more work than any other part of this. You’ll need to identify areas of overlap, break those out into standalone documents, decide on the best version of those (assuming they’re out of sync), and break up the rest of the monolithic documents into logical chunks as well. I refer to the broken-up documents as “component files.”

Give your files long, descriptive names. For redundancy, I also identify the parent documents in braces right in the filename, e.g., radio_channel_assignments_{leads}_{gate}.md. Using underscores instead of spaces makes things a little easier when working in the shell. Using md for the dot-extension lets some programs know that this is a Markdown file, but you could also use txt.

Then you’re going to mark these up in Markdown. If your files already have formatting in MS Word or equivalent, you’re going to lose all that, and you’ll need to make some editorial decisions about how you want to represent the old formatting (remember: structure, not appearance). Again, this will be a fair bit of work, but you’ll only need to do it once, and it will pay off.

Organize all these component files in a single directory. I call mine sources.

Create Variables

This is optional, but if you know that there are certain bits of information that will change regularly, especially bits that appear repeatedly throughout your documents, save yourself the trouble of finding and replacing them. Instead, go through your component files and insert placeholders. Use nomenclature that will be obvious to anyone looking at it, like THE_YEAR or FLAVOR_OF_THE_MONTH. You don’t need to use all caps, but that does make the placeholders more obvious. You cannot use spaces, so use underscores, hyphens, or camelCasing.

Now, create a document called variables.txt. Its contents should be something like this:

#define THE_YEAR 2018
#define FLAVOR_OF_THE_MONTH durian
…

And so on. Each of these lines is a command that tells GPP to replace the first term with the second wherever it appears. This lets you make all those predictable changes in one place. Save this in your sources directory.
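
Here is a rough sketch of the idea in action, assuming GPP was installed as described above. The component file name closing_notes.md, its two sentences, and the file test.txt are all invented for illustration.

```shell
# Work in a throwaway directory.
workdir="$(mktemp -d)"
cd "$workdir"

# The variables file: each line maps a placeholder to its replacement.
printf '%s\n' \
  '#define THE_YEAR 2018' \
  '#define FLAVOR_OF_THE_MONTH durian' > variables.txt

# A hypothetical component file that uses both placeholders.
printf '%s\n' \
  'These assignments are current for THE_YEAR.' \
  'This month we are serving FLAVOR_OF_THE_MONTH ice cream.' > closing_notes.md

# A tiny file that pulls in the variables first, then the component.
printf '%s\n' \
  '#include variables.txt' \
  '#include closing_notes.md' > test.txt

# Run GPP only if it is installed; the placeholders in the output
# should come out replaced.
if command -v gpp >/dev/null 2>&1; then
  gpp test.txt -o test.md
  cat test.md
else
  echo "gpp not found; install it with: brew install gpp"
fi
```

The variables file has to be read before any file that uses the placeholders, which is why it comes first.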

You can get into stylistic problems if you begin a sentence with a placeholder that gets substituted with an uncapitalized replacement. There may be a good solution, but I haven’t figured it out. You should be able to write around this in your component docs.

Create BOMs

In order to rebuild your original monolithic documents from these pieces, you’ll want to create what I call a bill of materials (BOM) for each target document. This defines what the constituent component files are, and when you run GPP, the BOM tells GPP to assemble its output file from those component files.

I like to keep each BOM in a separate directory that’s at the same level as my sources directory (this also gives me a separate directory to contain my output files), so my directory tree looks like this:

My Project
     gate
        gate-bom.txt
     leads
        leads-bom.txt
     sources
        variables.txt
        radio_channel_assignments_{leads}_{gate}.md
        …

The contents of each BOM file will look something like this:

#include ../sources/variables.txt
#include ../sources/radio_channel_assignments_{leads}_{gate}.md
…

Because the BOM file is nested in a directory adjacent to the sources directory, you need to “surface” and then “dive down” into the adjacent directory. The leading ../ is what lets you surface, and the sources/ is what lets you dive down into a different directory.
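
If you would rather set up that layout from the shell than by hand, something like the following should work. It is only a sketch: the top-level name My_Project (underscored to keep the shell happy) is an assumption, the component file is created empty, and each BOM gets just the two lines shown above, so you would still add your own component files.

```shell
# Create the project skeleton: one directory per target document,
# plus the shared sources directory.
mkdir -p My_Project/gate My_Project/leads My_Project/sources

# Start the variables file with the example definitions.
printf '%s\n' \
  '#define THE_YEAR 2018' \
  '#define FLAVOR_OF_THE_MONTH durian' > My_Project/sources/variables.txt

# Create an empty component file; the quotes make the shell treat
# the braces as ordinary characters.
touch 'My_Project/sources/radio_channel_assignments_{leads}_{gate}.md'

# Each BOM lists the variables file first, then the components,
# using ../sources/ to reach the adjacent directory.
for target in gate leads; do
  printf '%s\n' \
    '#include ../sources/variables.txt' \
    '#include ../sources/radio_channel_assignments_{leads}_{gate}.md' \
    > "My_Project/${target}/${target}-bom.txt"
done

# List everything that was created.
find My_Project -type f
```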

Compilation & conversion

So you’ve got your files refactored and marked up, you’ve got your variables set up, you’ve got your BOMs laid out. Now you want to get back what you had before. Now it’s time for the command line.

Open the Terminal app, type cd followed by a space, drag the folder containing the BOM file you want to compile into the Terminal window (this will insert the correct path), and hit “return”. Use the ls command to confirm that the only file you can see is the BOM file you want to compile.

Now it’s time to run the following command:

gpp source-bom.txt -o source.md

This says “tell GPP to read in the file source-bom.txt, run all the commands in it, and create an output file based on it called source.md”. Make whatever filename substitutions are appropriate. The output file will be in the same directory as the BOM file. This will be a big Markdown file that is assembled from all the component files in the BOM, with all the variable substitutions performed.

Now that you have a Markdown file, the world is your oyster. Some content-management systems can interpret Markdown directly. WordPress requires the Jetpack plugin, but that’s easily installed. So depending on how you’ll be using that document, you may already be partly done.

If you want to convert it to straight HTML, or to an MS Word doc, or anything else, now it’s time to run Pandoc. Again, in the Terminal app, type this in:

pandoc source.md -s -o source.html

This says “tell Pandoc to read in the file source.md and create a standalone (-s) output file called source.html”. If you leave out the -s, Pandoc produces an HTML fragment with no surrounding <html>, <head>, or <body> elements. It figures out what kind of output file you want from the dot-extension, and can also produce MS Word files and a host of others. It uses its own template files as the basis for its output files, but you can create your own template files and direct Pandoc to use those instead.
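
Since the two commands always run back to back, they can be wrapped in a small shell function. This is a sketch rather than a finished tool: the names bom_stem and build_doc are made up, and it assumes each BOM is named like leads-bom.txt, following the directory layout above.

```shell
# Strip any directory and the "-bom.txt" suffix from a BOM filename:
# leads-bom.txt -> leads
# (hypothetical helper; assumes the naming convention used above).
bom_stem() {
  name="${1##*/}"
  printf '%s\n' "${name%-bom.txt}"
}

# Compile a BOM to Markdown with GPP, then convert it with Pandoc.
# Usage: build_doc leads-bom.txt [extension]   (extension defaults to html)
build_doc() {
  bom="$1"
  ext="${2:-html}"
  stem="$(bom_stem "$bom")"

  # Bail out early if either tool is missing.
  for tool in gpp pandoc; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found; install it with: brew install $tool" >&2
      return 1
    fi
  done

  # Step 1: assemble the component files and substitute variables.
  gpp "$bom" -o "$stem.md" || return 1

  # Step 2: convert to a standalone file; Pandoc picks the output
  # format from the extension.
  pandoc "$stem.md" -s -o "$stem.$ext"
}
```

From inside the leads directory, build_doc leads-bom.txt would produce leads.md and leads.html, and build_doc leads-bom.txt docx would produce a Word file instead.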

I do my print documents in InDesign, and Pandoc can produce “icml” files that InDesign can flow into a template. Getting the styles set up in that template takes some trial and error, but again, once you’ve got it the way you like it, you don’t need to mess with it again.

Shortcomings and prospects

The one thing this approach lacks is any kind of version control. In my case, I create a directory for every year, and make a copy of the source directory and the rest inside the year directory. This doesn’t give me true version control–I rely on Time Machine for that–but it does let me keep major revisions separate. Presumably using Git for my sources would give me more granular version control.
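
A sketch of what that might look like, assuming Git is installed. The throwaway directory stands in for the sources directory, and the name, email, and commit message are placeholders.

```shell
# Demonstrate in a throwaway directory standing in for "sources".
sources="$(mktemp -d)"
cd "$sources"
printf '%s\n' '#define THE_YEAR 2018' > variables.txt

# Turn the directory into a Git repository.
git init -q

# Stage every file and record a snapshot. The -c options supply an
# identity just for this command; skip them if Git already knows
# who you are.
git add .
git -c user.name='Your Name' -c user.email='you@example.com' \
  commit -q -m 'Binder sources for 2018'

# Later, after editing, see what changed and review the history.
git status --short
git log --oneline
```

Each commit is a snapshot you can return to, so a yearly copy of the directory would no longer be the only record of what changed.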

Depending on what your output formats are going to be, placed images can be a bother. I haven’t really solved this one to my satisfaction yet. You may want to use a PDF-formatted image for the print edition and a PNG-formatted image for the web; Pandoc does let you set conditionals in your documents, but I haven’t played with that yet.

In fact, I haven’t really scratched the surface of everything that I could be doing with GPP and Pandoc, but what I’ve documented here gives me a lot of power. I’ve also recently learned of a different preprocessor called Panda, which subsumes GPP and can also turn specially formatted text into graphical diagrams using other shell programs, such as Graphviz. I’m interested in exploring that.

The walled gardens of shit

Over a century ago, King Gillette pioneered the razors and blades business model. The DMCA led to a new twist on this: companies have been trying to force you to buy their blades in particular by slapping microchips on them–even when those things don’t really have any need of a microchip–because that makes it illegal to reverse engineer.

This gave us the Keurig coffee machine, which has been successful, but has been deservedly criticized–even by its inventor–for its wastefulness. Keurig attempted to add DRM to their pods, although that backfired.

Catering to the herd mentality of the investor class (“It’ll be like Amazon, but for X!” “It’ll be like Facebook, but for X!” “It’ll be like Uber, but for X!”), this has led to…

The Juicero, a massively over-engineered $400 (marked down from $700) gadget that squeezed $8 DRM-laden bags of fruit pulp into juice. It flopped.

Then the Teforia, a $400 gadget (marked down from $1000) that makes tea from DRM-laden pods that cost $1 each or more. It flopped.

Now this thing, a spice dispenser that uses DRM-laden spice packets that cost about $5 a pop (spices obviously vary in prices, and it’s not clear how much comes in one of their packets, but I just bought 4 tbsp of cinnamon for $0.35).

These Keurig imitators represent an intersection of at least two bad trends: the Internet of Shit, where stuff that has no need of ensmartening is gratuitously connected to the Internet–a logical consequence of sticking unnecessary DRM-enabling chips on things, with those chips getting cheaper and more powerful–and the walled gardens of yore, like AOL–which companies like Facebook and Google have been attempting to reconstruct on top of the Internet ever since. So now we’ve got walled gardens of shit, filling up with their own waste products. Happily, the market seems to be rejecting these.

Big-number cheat sheet and BetterTouchTool

BetterTouchTool is one of my favorite Mac utilities. A real sleeper: originally it just let you create new trackpad gestures (or remap existing ones), and that was useful enough on its own, but it’s been beefed up with more and more interesting features. One feature I just discovered is that it can display a floating window with any HTML you want. This is a perfect way to show my Big Number Cheat Sheet, which is handy for checking your work when dealing with, well, big Japanese numbers.

To use this, open up BTT, add a new triggering event (can be triggered by a key command or text string, trackpad, whatever), and add the action Utility Actions > Show Floating Web View/HTML menu. Give it a name, set it to a width of 500, height of 750, and paste the following in directly. Be sure to enable “show window buttons” and/or “close when clicking outside” or the window won’t go away.

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <title> </title>
    <style> 
        body {
        background-color: #fff;
        font-family: helvetica;
        font-size: 14px;
        line-height: 18px;
        }
        table {
        border-collapse: collapse;
        }
        tr, td, th {
        border: none;
        }
        tr {
        border-bottom: 1px solid #ddd;
        }
        table tr td:nth-child(1), table tr th:nth-child(1) {
        width: 7em;
        padding: 0.5em;
        text-align: right;
        }
        table tr td:nth-child(2), table tr th:nth-child(2) {
        width: 12em;
        padding: 0.5em;
        text-align: left;
        }
        table tr td:nth-child(3), table tr th:nth-child(3) {
        padding: 0.5em;
        text-align: left;
        }
        tr:hover {
        color: #ddd;
        background-color: #333;
        }
    </style>
</head>
<body>
<h1>
    Big number cheatsheet 
</h1>
<table>
    <tr>
        <th> 和 </th>
        <th> English </th>
        <th> Number </th>
    </tr>
    <tr>
        <td> 一万 </td>
        <td> ten thousand </td>
        <td> 10,000 </td>
    </tr>
    <tr>
        <td> 十万 </td>
        <td> one hundred thousand </td>
        <td> 100,000 </td>
    </tr>
    <tr>
        <td> 百万 </td>
        <td> one million </td>
        <td> 1,000,000 </td>
    </tr>
    <tr>
        <td> 千万 </td>
        <td> ten million </td>
        <td> 10,000,000 </td>
    </tr>
    <tr>
        <td> 一億 </td>
        <td> one hundred million </td>
        <td> 100,000,000 </td>
    </tr>
    <tr>
        <td> 十億 </td>
        <td> one billion </td>
        <td> 1,000,000,000 </td>
    </tr>
    <tr>
        <td> 百億 </td>
        <td> ten billion </td>
        <td> 10,000,000,000 </td>
    </tr>
    <tr>
        <td> 千億 </td>
        <td> one hundred billion </td>
        <td> 100,000,000,000 </td>
    </tr>
    <tr>
        <td> 一兆 </td>
        <td> one trillion </td>
        <td> 1,000,000,000,000 </td>
    </tr>
    <tr>
        <td> 十兆 </td>
        <td> ten trillion </td>
        <td> 10,000,000,000,000 </td>
    </tr>
    <tr>
        <td> 百兆 </td>
        <td> one hundred trillion </td>
        <td> 100,000,000,000,000 </td>
    </tr>
    <tr>
        <td> 千兆 </td>
        <td> one quadrillion </td>
        <td> 1,000,000,000,000,000 </td>
    </tr>
    <tr>
        <td> 一京 </td>
        <td> ten quadrillion </td>
        <td> 10,000,000,000,000,000 </td>
    </tr>
</table>
</body>
</html>

Oregon-Washington trip 2017

Hiking the Tom McCall Preserve

Gwen and I spent a couple of weeks in Oregon and Washington at the end of 2017. Following are some random highlights:

Portland OR

  • Japanese gardens. Someone suggested we go with a guide. There was a guide starting a tour right after we got there, but we quickly discovered that we’d rather take in the gardens on our own. Going in the winter turned out to be for the best, as the gardens are incredibly popular and crowded during the warmer months. We were almost able to pretend we were alone there in spots, which is more what they’re about.
  • Bollywood Theater. Casual Indian restaurant. Really good.
  • Paxton Gate. Shop that specializes in skeletons, mounted animals, etc. We already have a bat from them.
  • Powell’s Books. Covers a city block.
  • Bread and Ink Cafe. Nothing really unusual about it, just solid hot food on a cold day, and our waiter bore an uncanny resemblance to the character Mike Ehrmantraut from Breaking Bad.
  • Sweedeedee. While staying at our Airbnb, we wound up chatting with a neighbor as he was walking his dog and we were heading out to breakfast. He recommended this place for a “real Portland experience.” Mission accomplished. They didn’t tell me the name of the pig that provided my bacon, but it was straight out of Portlandia.
  • Tin Shed. Neighborhood joint near where we were staying.
  • Peculiarium. A ridiculous wunderkammer. Good for a brief diversion and getting a photo on Krampus’ lap.
  • Noble Rot. Fancy. I had the burger, which was the most humble thing on the menu. It was damned good.
  • I think Gwen found three different gluten-free bakeries in Portland, which is not all that surprising.
  • We wanted to visit Multnomah Falls, but it was inaccessible due to a fire back in September that left the soil unstable. We drove on without much of a plan and entirely by accident wound up at Tom McCall Preserve, which had no facilities to identify it as a park, but had a good hiking trail and an amazing view of the Columbia River gorge. We saw a road-construction crew pull over, jump out, and start taking pictures while we were there, which I thought was interesting–I figured they already would have seen everything. There was also a model and a photographer doing a photoshoot there.

McMinnville OR

  • Evergreen Aviation & Space Museum. While in Portland, I overheard that the Spruce Goose was in a museum not far away, and convinced Gwen we had to go. It was out of our way, and not a cheap museum to visit, but worth it. The docents are all ex Air Force and will bend your ear for as long as you’ll let them. The Spruce Goose itself is unbelievable in the most literal way: you look at it and you can’t believe it’s real. Your mind rejects it. They let you walk into the cargo area, which is surprisingly small. The museum also has an SR-71, which is surprisingly long and seems like alien technology, airplanes (or reproductions) from the beginning of flight to present, rockets (including a complete Atlas rocket), demounted jet and piston engines and rocket motors, a Mercury capsule, and a Gemini capsule. You can get right up next to the Mercury capsule and look into it. I found it remarkably affecting–looking at it up close, I could see it was just a tin can, and I thought about the men who voluntarily climbed into that tin can on top of a missile, and the aspirations and pride of a nation that was invested in that tin can.

Astoria

I can’t say much for Astoria. The one thing that had attracted us to the town was the Museum of Whimsy, which we found out just a few days before we arrived was closing for the season (insert sad trombone sound).

We had dinner at the Buoy Beer Company, where we had fried oysters, among other things. The place is touristy, like the rest of the town, and is distinguished by some glass floor panels giving a view of sea lions.

We did visit the Astoria Column, which was interesting in itself, but more interesting for the view of the surrounding area it affords, which is amazing. People buy balsawood toy airplanes at the gift shop below and launch them from the top, which is fun but ridiculously moopy.

Port Angeles

We visited Port Angeles not so much for the town itself but for its proximity to the Olympic National Forest/Olympic National Park. We did manage to get in an 8-mile hike along the Spruce Railroad trail, which was beautiful, but that day ended with surprisingly heavy snowfall, so the next day we hunkered down and caught up on House of Cards.

Seattle

We had a micro-apartment Airbnb in the Capitol Hill neighborhood. Like, as small as the apartment I had when I was in Japan, but with much worse space utilization. The listing didn’t exactly lie, but it showed views that we think were only visible from the rooftop deck. The unit had no kitchen, although there was a communal kitchen on the ground level for the 12 or so units in the building.

We didn’t have a lot of time to take in Seattle, and part of that time was dedicated to getting together with Gwen’s cousins (which was enjoyable, but not a recommendation for the general public). One place we happened across was Ada’s Technical Books and Cafe. As I said to Gwen, it’s either a bad thing or a good thing that we don’t have a place like it in Austin. We both could have spent all day browsing there.

One of the high points was visiting the Seattle Art Museum, which was showing a massive Andrew Wyeth retrospective.

We had dinner one night at Blueacre Seafood, which was spendy but good.

I took some pictures, too