Voice Driven Type Design: Useful for #Captions?

Fascinating article by Woelfel, Schlippe, and Stitz. If I understood it properly, different qualities of an audio file or spoken voice could be interpreted and used to shape how the spoken words are represented typographically. In other words, volume might affect the size of the type, the font used, the leading, whether the type is set ragged, or what have you.
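
To make the idea concrete, here is a minimal sketch of what such a mapping might look like. To be clear, this is my own illustration, not the authors' model: the feature names, thresholds, and mappings below are all invented.

```python
# Hypothetical sketch: mapping audio features onto typographic parameters.
# The features, ranges, and mapping rules are my own invention for
# illustration, not the model from the Woelfel/Schlippe/Stitz paper.

from dataclasses import dataclass

@dataclass
class TypeStyle:
    size_pt: float   # type size, driven here by loudness
    weight: int      # font weight (100-900), driven here by vocal intensity
    leading: float   # line spacing, driven here by speech rate

def style_for_audio(loudness_db: float, intensity: float,
                    words_per_sec: float) -> TypeStyle:
    """Map (made-up) audio features onto typographic parameters."""
    # Louder speech -> larger type, clamped to a readable range.
    size = min(48.0, max(14.0, 14.0 + (loudness_db - 50.0) * 0.5))
    # More vocal intensity -> heavier weight.
    weight = 400 if intensity < 0.5 else 700
    # Faster speech -> tighter leading; slower -> looser.
    leading = max(1.0, 1.6 - words_per_sec * 0.1)
    return TypeStyle(size_pt=size, weight=weight, leading=leading)

# Example: a shouted line at ~70 dB, high intensity, fast delivery.
print(style_for_audio(loudness_db=70.0, intensity=0.9, words_per_sec=4.0))
```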

In terms of captioning, I thought this might be interesting as a potential algorithmic approach to caption generation. Might be a good way to show anger, calm, seduction, etc. Then again, this might also require viewers to learn an entirely new vocabulary, or develop an additional literacy, to interpret what specific choices, like size, font, or color, might mean.

Then there are a few potential problems: would each company have its own algorithm, and thus would there be no consistent standard? Would the standard be free and given away, or would there be a charge for it? Would this appear only in videos or films that could afford it?

One approach that I think might work is, of course, inspired by the epic and experimental captioning done in Night Watch and the new Sherlock Holmes. Rather than trying to apply these kinds of effects to all text, perhaps they could be reserved for things like representing NSI (non-speech information) on screen and single-word utterances, such as profanities and exclamations. Those captions would then be visually distinct from the normal presentation of utterances and sounds, and yet they could also embody certain qualities of the utterances in their type design.

Plenty to think about from this piece. Hope that we see more of this type of work--and maybe some of it will enter the realm of captions. Yes!

Optimal Caption Placement: Ouzts, Snell, Maini, Duchowski

Excited I was to find this conference paper! "Yes!" I thought. This will be fascinating. And then, after I finished reading all two pages, I felt disappointed. Is this the authors' fault? A bit. Is it my fault? A bit. You see, I sadly lack the statistical literacy (or my statistical chops are seriously gummed up) to fully make sense of the results section. So, there's that.

Their conclusion, however, reads thus:

"An eye tracking study was presented in which several different captioning styles were examined. Significant differences were found between eye movement metrics depending on the captioning style used, suggesting that captioning styles play an important role in viewing strategies. Participants underwent large amounts of saccadic crossovers and spent much less time reading the captions when captions changed position frequently. Future work is needed to fully examine the implications of these differences" (emphasis added, p. 190).

This makes quite a bit of sense, especially when you consider that they tried four approaches to presenting the captions. (Read the article, heh!) Most notably, they tried the traditional captioning position as well as placing captions above speakers when those speakers were present on screen; when the speaker was off screen, the captions appeared at the bottom. This left me wondering a couple of things.

While it may be useful for comprehension to avoid lots of extra or overlapping eye movement, might it not be possible to place captions near speakers during intense dialogue and conversation, and then shift to traditional placement (at the bottom) when off-screen conversation alternates with significant NSIs (non-speech information)? That might be an interesting approach to captioning to test out--especially in dialogue-heavy video or film.
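
Here is a rough sketch of the kind of placement rule I'm imagining. This is entirely my own hypothetical logic, not anything proposed or tested in the paper:

```python
# Hypothetical placement rule (my own, not from Ouzts et al.):
# keep captions near on-screen speakers during dialogue, but fall
# back to traditional bottom placement when off-screen speech
# alternates with non-speech information (NSI).

def caption_position(speaker_on_screen: bool,
                     is_dialogue: bool,
                     has_nsi: bool) -> str:
    """Return where a caption should be placed for the current cue."""
    if speaker_on_screen and is_dialogue and not has_nsi:
        return "near-speaker"   # above the speaker, as in the study
    return "bottom"             # traditional placement

# Example cues from an imaginary scene:
cues = [
    {"speaker_on_screen": True,  "is_dialogue": True,  "has_nsi": False},  # heated exchange
    {"speaker_on_screen": False, "is_dialogue": True,  "has_nsi": True},   # off-screen voice + door slam
]
for cue in cues:
    print(caption_position(**cue))
```

One design question a rule like this raises: how often placement is allowed to switch, since the study's finding about saccadic crossovers suggests that frequent position changes are exactly what hurts reading time.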

As for the article, I am grateful that the authors conducted and shared the research. I just wish the findings had been stated more explicitly. Then again, there might not have been enough information, or data, to support broader generalizations or suggestions for practice. I respect that. However, in the interest of testing out other approaches to captioning, it would be nice to have some research-driven data from which to launch.