Some things I learned about alt text, and some complications


My long-dormant technical writer impulses kicked in a few days ago and I began to wonder about alt tags and how to write them well. I posted a request for documentation, then did a little reading on my own.

Before going any further, I’ll just say that the most useful, concise, actionable content came from a post by Alex Chen, a designer, and I will be using their approach as a practical day-to-day guide to iterate on.

I also got a little pushback from someone on the idea that an alt tag needs to be done any certain way and was advised to “just write them” same as if I was talking to a friend. If there were no advice anywhere, I would probably take that advice. But there is advice (and worse, there is documentation). And the things I was missing to feel comfortable “just writing” were a sense of audience expectations and some artistic resistance to getting too descriptive.

I’m grateful to people who shared insights and resources with me. Because I am not some sort of public institution or UX researcher, I leaned on what I could learn for myself rather than tossing up polls or requests for direct input from people who rely on assistive technology to experience the web.

I’d like to make clear up front that I’ve gathered enough information and formed enough of a thought to frame how I think about alt tags and my particular photography as I practice using them. I expect my style will change as I sit with how I’m doing it and, hopefully, as I get feedback.

I’m also using this post to write down some initial thoughts that I expect will evolve if feedback comes in. I also expect I will be getting some things wrong by at least some people’s lights. I hope they’ll have the patience to inform me.

So, why all the uncertainty?

A friend once said that the pictures I post to social networking are more “photographic” than most. I think that comment was about the work I put in to editing before posting. It’s very seldom, even when just grabbing a shot with a phone, that I just share the photo. I’ll probably crop, adjust the exposure and contrast a little, boost or mute the color, etc. even if it’s just a picture of a cup of coffee or something I saw on the street. It’s much more common that the photo will go through a Lightroom session, have a profile applied to it, and more.

The “and more” part was where I was getting hung up. Another friend described my photography as “moody.” So from the first friend’s assessment that my pictures were more “photographic,” I graduated in someone else’s estimation to artistically expressive.

Fair point. I am long on the record with believing that filters were the first meaningful Instagram feature. Some were sort of horrible and overdone – kitschy – but others were understated and quietly expressive. I think VSCO improved on the aesthetics, but both apps, I think, gave phone photographers a way to convey a sort of timelessness, gravitas, or pre-nostalgia. They gave everyday photos of people goofing around, lunches, and landscapes a z-axis of expressiveness. They conveyed mood.

I also think a lot about mood when I’m editing. I like using profiles that suggest the colors and contrast of film without being too “vintage.” I like shooting with toy and novelty lenses that allow some vignette or distortion. I pull shadows down harder, blow out highlights, and boost specific colors beyond “what I saw” and into “what I thought could be there.”

Some days I try to rein it in a little, especially when I spot another photographer with a more naturalistic style. Other days I lean in to it.

Yet another friend said to me, at the height of pandemic lockdown, “your pictures remind me that there is something beautiful in everyday things, and that is making this terrible time more bearable.”

I was especially happy to hear that during the lockdown. Like everyone, I was figuring out new things to do that didn’t involve being indoors and around people, and my evening photo walks were a small slice of something that let me take my mind off of what was going on.

It also made me happy because my goal, in the end, is to make something people respond to on some level below “that is very nicely composed and the subject matter is pleasing to me.” I want them to feel something, even if they can’t put their finger on what it is. In fact, the less they can put their finger on it and the more they simply have to feel it, the happier I am.

That made alt tags a little hard to figure out. I want people to take away what they take away. There’s room for people to take away “that is a very nice picture of a city street – the colors are nicely done!” There’s room for people to take away “the perspective mixes up several kinds of architecture, and it’s shot from an angle that somehow makes the city look tumbledown, as if the skyline is collapsing and buckling.” There’s room to say “there’s a picture of a couple on a tilt-a-whirl, I like the tones – black and white was a good choice!” and there’s room to say “he’d rather not be there.”

I don’t really know anything about art as a display process, meaning I have gone to galleries and museums, and I’ve read artists’ statements, but I’ve never studied what I guess you’d call the theory of art as a display process. Intuitively, and based on what I’ve experienced for myself, I’d say you should receive fewer inputs or interpretations of the work the closer you stand to the actual work. Maybe you read the artist’s statement before you start as a way to help you get a fingernail under everything you’re being shown. Or maybe you save it for after, so as you sit there thinking about everything atomically, some organizing principle can tie it all together and give it some meaning.

As you’re scrolling a small collection of images someone put on a social networking service, that’s “in the gallery and next to the work.” It feels wrong to hang a placard under the image and dissect the image’s assorted evocations. On the other hand, I’d guess our entire conception of how to do art as a display process reflects a period where the question “what about people with visual impairments?” would, at best, provoke some questions about “how serious?” and, at worst, simple dismissal.

Practical Advice

So outside those considerations, here are some useful things I found:

First off, the basics. This post from Fuel Your Photos recapitulated the most common advice everywhere:

  1. Write in full sentences including case and punctuation.
  2. Keep the text as short as possible (this guide says 15 words, I’ve also seen 150 characters).
  3. Don’t include information that is already given in the text surrounding the image.
  4. Don’t include “image of,” “photo of,” or “picture of” (a screen reader will already say this).
  5. Include keywords, locations, and studio name ONLY when relevant.
  6. Try to include additional words and context that are not represented in the page text.
  7. Make sure the alt text is unique for each photo on the page.
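To make that list a little more concrete for myself, here is a minimal sketch of a checker in Python. It is my own illustration, not something from the Fuel Your Photos guide: the function name check_alt_text, the two limits, and the wording of the warnings are all assumptions, and it only covers the rules that can be tested mechanically (1, 2, 4, and 7).

```python
# Hypothetical limits, taken from the two figures mentioned in rule 2.
MAX_WORDS = 15
MAX_CHARS = 150

# Phrases from rule 4 that a screen reader makes redundant.
REDUNDANT_PREFIXES = ("image of", "photo of", "picture of")


def check_alt_text(alt, other_alts=()):
    """Return a list of warnings for a single alt text string.

    `other_alts` holds the alt text of the other images on the same page,
    so duplicates can be flagged (rule 7).
    """
    text = alt.strip()
    if not text:
        return ["Alt text is empty."]

    warnings = []

    # Rule 1: a full sentence, with case and punctuation.
    if text[0].islower() or text[-1] not in ".!?":
        warnings.append("Write a full sentence with case and punctuation.")

    # Rule 2: keep it short (either limit counts as too long here).
    if len(text.split()) > MAX_WORDS or len(text) > MAX_CHARS:
        warnings.append("Shorten the text.")

    # Rule 4: skip "image of," "photo of," and "picture of."
    if text.lower().startswith(REDUNDANT_PREFIXES):
        warnings.append("Drop the leading 'image of' style phrase.")

    # Rule 7: each image on the page gets its own alt text.
    if text in {other.strip() for other in other_alts}:
        warnings.append("Alt text duplicates another image on the page.")

    return warnings


# Made-up examples, not taken from any of the guides.
print(check_alt_text("A couple rides a tilt-a-whirl."))
print(check_alt_text("photo of a city street"))
```

Rules 3, 5, and 6 depend on the surrounding page text and on judgment, so the sketch leaves them alone.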

There is also a common piece of guidance that says “just the facts, no interpretation.”

Going a little deeper, I found the Cooper Hewitt Smithsonian Design Museum guidelines for image descriptions. In addition to several of the common points made above, it touches on describing colors in an image (it’s okay to do so) and also gets into gender and skin tone:

On gender:

“No assumptions should be made about the gender of a person represented. Although, where gender is clearly performed and/or verifiable, it should be described. When unknown, a person should be described using ’they, them’ and ‘person’ and their physicality expressed through the description of their features, which inadvertently tend to indicate masculine or feminine characteristics. The use of masculine and feminine are problematic and should be avoided unless necessary for describing the performance of gender.”

Yet other guidance (lost the link, sorry) went on to say you should never use gender, especially when describing nudes, but did leave open the idea of clothing as a kind of gender performance.

Some of this guidance is a stopper for me and I am not done processing it. To be frank, the reliable guidance about how to discuss gender and sex has moved on from when I was a volunteer ally skills workshop facilitator, and I am not sure how to engage in any public discussion that goes beyond affirming the gender identity people choose.

On skin tone:

“When describing the skin tone of a person use non-ethnic terms such as ’light-skinned’ or ‘dark-skinned’ when clearly visible. Because of its widespread use, we recommend the emoji terms for skin tone as follows: 🏻 Light Skin Tone, 🏼 Medium-Light Skin Tone, 🏽 Medium Skin Tone, 🏾 Medium-Dark Skin Tone, 🏿 Dark Skin Tone. Also, where skin tone is obvious, one can use more specific terms such as black and white, or where known and verified, ethnic identity can be included with the visual information: Asian, African, Latinx/o/a (also see gender), etc.”

This guidance stumbles a little, conflating skin tone (which is as observable and describable as they suggest) and ethnicity (or “race,” if you prefer, and I do not).

For instance: “where skin tone is obvious, one can use more specific terms such as black and white.”

That’s a curious formulation, because “black” and “white” are not skin tones under their own “emoji-based” taxonomy. I am pretty sure they mean “Black” (or, to narrow it down as we embark on this journey through the linguistic thickets of race, “African American”) and “American white.”

So more to the point I think they mean “where you can localize the person in the American racial taxonomy.” They should have just said that or something similar.

There’s much more to read there, and the guide distinguishes between descriptive text and alt text. The alt text is invariably quite simple.

Finally, someone on Mastodon provided me with a very helpful article written by a product designer and accessibility advocate named Alex Chen, who uses an “object-action-context” approach:

“there is a storytelling aspect to writing descriptions. It doesn’t necessarily make sense to go from left to right describe everything in an image because that might lose the central message or create a disorienting feeling. For that reason, I came up with a framework that I recommend called object-action-context.

“The object is the main focus. The action describes what’s happening, usually what the object is doing. The context describes the surrounding environment.

“I recommend this format because it keeps the description objective, concise, and descriptive.

“It should be objective so that people using the description can form their own opinions about what the image means. It should be concise so that it doesn’t take too long for people to absorb all the content, especially if there are multiple images. And it should be descriptive enough that it describes all the essential aspects of the image.”

I found that very helpful, because it’s a systematic way to describe an image journalistically, which is really what I wanted going in. I have other concerns I want to play around with as I try things out and perhaps get feedback, but it’s nice to have a simple guide.
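As a way of practicing, the object-action-context framework can be sketched as a tiny Python structure. This is only my reading of the quoted description, not code from the article: the ImageDescription class, its field names, and the to_alt_text method are hypothetical, and the example image is made up.

```python
from dataclasses import dataclass


@dataclass
class ImageDescription:
    """One image, broken into the three parts of the framework."""

    obj: str      # the main focus of the image
    action: str   # what's happening, usually what the object is doing
    context: str  # the surrounding environment

    def to_alt_text(self):
        """Join the parts, in object-action-context order, into one sentence."""
        parts = [part.strip() for part in (self.obj, self.action, self.context)]
        sentence = " ".join(part for part in parts if part)
        if not sentence:
            return ""
        # Capitalize the first letter and make sure the sentence is closed.
        sentence = sentence[0].upper() + sentence[1:]
        if sentence[-1] not in ".!?":
            sentence += "."
        return sentence


# Made-up example in the spirit of the tilt-a-whirl photo mentioned earlier.
description = ImageDescription(
    obj="a couple",
    action="rides a tilt-a-whirl",
    context="at a county fair",
)
print(description.to_alt_text())
```

Keeping the three parts separate makes it easy to notice when I have slipped interpretation into one of them, which is the part I expect to struggle with.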