A Question of Character: Finding What You Need in Your Fonts, Part 1

When I asked a few columns back about your interest in Unicode, a lot of hands shot up. But just as many of you said you wanted to know how to get at characters that have been hiding in your fonts since long before Unicode was dreamt up. Unicode may be the key to understanding and using fonts with very large character sets, but on a daily basis most of us use smaller — and often much older — fonts that present mysteries of their own.
So think of this column as “Font Exploration 101.” It will focus on excavating characters from your everyday fonts. Part 2 will delve into Unicode as well as system and application tools for exploring larger fonts, including Windows Character Map and Mac OS’s Character Palette.
The Rosetta Stone
To download a very useful document, click this link: www.adobe.com/type/pdfs/AdobeWestern2.pdf. It lists all the characters in the Adobe Western 2 character set, which Adobe now uses as a basis for all of its so-called “Standard” fonts. (I’ll explain the Adobe Western 2 character set and Standard fonts in a bit.)
More importantly, this PDF lists the keyboard commands or other techniques that you can use to access all 228 characters in PostScript Type 1 fonts, as well as in all newer fonts that contain the same characters. The degree symbol, the multiplication sign, the cent sign: they’re all here, and much, much more.
Keep this document at your fingertips.
How We Got Here
Now you have the keyboard shortcuts, but to understand why they work the way they do, you need a little font format history.
When Adobe introduced PostScript in the 1980s, it created a font format to go with it, which we now call PostScript Type 1. Tens of millions of Type 1 fonts are still in circulation.
These are the icons currently associated with PostScript Type 1 fonts. In Windows, they may get the “a-for-Adobe” icon or the generic document icon shown on the right.

In Windows, these font files have the file name extension .pfb or .pfm. On the Mac, they normally have no file name extensions because they predate OS X, which is when Apple started using extensions. The LWFN in the Mac icon stands for LaserWriter Font, after Apple’s first PostScript printer.
Except for fonts containing only symbols or all capitals, nearly all Type 1 fonts have a standard character set. That is, they all contain the same letters, numbers, punctuation marks, and symbols. This character set is still at the heart of almost every font you use, regardless of format.
In these fonts, each character is identified by a number from 0 to 255, because all computer systems identify characters by number, and 256 is the most numbers you can specify with a single byte of computer data. Consequently, these fonts are sometimes referred to as single-byte fonts.
In fact, these fonts contain only 228 printable characters (including the word space), with the other “slots” in the font occupied by program commands, such as Return.
These 228 characters, which comprise the standard Type 1 character set, are still included in nearly every text font.
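The arithmetic behind the “single-byte” label can be checked in a few lines of Python. This is only an illustrative sketch: the constant names are invented for the example, and the figure of 228 is simply the count quoted above.

```python
# One byte is 8 bits, so it can hold 2**8 distinct values: 0 through 255.
SLOTS_PER_BYTE = 2 ** 8

# Count of printable characters quoted in the article (word space included).
PRINTABLE_CHARACTERS = 228

# Slots left over for non-printing uses such as Return.
remaining_slots = SLOTS_PER_BYTE - PRINTABLE_CHARACTERS

print(SLOTS_PER_BYTE)    # 256
print(remaining_slots)   # 28

# A character number of 256 no longer fits in a single byte.
try:
    bytes([256])
    fits_in_one_byte = True
except ValueError:
    fits_in_one_byte = False

print(fits_in_one_byte)  # False
```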

Apple and Microsoft adopted competing strategies for using Type 1 fonts. For example, neither company’s operating system allowed access to all of the 228 characters — each OS used a different subset. Some characters were only accessible in Windows, others only on the Mac, and yet others were unavailable to both, as shown here:

In addition, the Mac OS created the illusion that certain Greek and math characters were a part of each font, when in fact they were being “borrowed” from the Symbol font when the appropriate keystrokes were typed. You may have noticed that these characters didn’t match the style of the typeface you were using, unless that happened to be Helvetica or Times Roman. Here are those refugees from Symbol:

On both systems, characters with ID numbers in the range from 0 through 127 were typed directly as printed on the keyboard, with or without the Shift key. To type higher-numbered characters, the Mac added the Option key to the mix (e.g., Shift-Option-d). The keystrokes required to access these “high-bit” characters are revealed in the Mac Keyboard Viewer utility. Windows’s technique was to hold down the Alt key while typing the character’s ID number using the numeric keypad.
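To see how the same ID number could mean different things on the two platforms, here is a rough Python sketch. It rests on assumptions that are not spelled out in this column: that the four-digit Alt codes on a Western Windows system (Alt+0176 and so on) follow the Windows-1252 code page, and that the classic Mac OS used the Mac Roman encoding. Python calls these codecs cp1252 and mac_roman. The function names alt_code and mac_slot are made up for illustration.

```python
def alt_code(number: int) -> str:
    """Character a Western Windows system would give for Alt+0<number>,
    assuming the Windows-1252 code page (an assumption of this sketch)."""
    return bytes([number]).decode("cp1252")


def mac_slot(number: int) -> str:
    """Character stored under the same ID number in Mac Roman."""
    return bytes([number]).decode("mac_roman")


# The three symbols mentioned earlier, typed the Windows way.
print(alt_code(162))  # ¢
print(alt_code(176))  # °
print(alt_code(215))  # ×

# On the Mac, the degree sign lives under a different number (161).
print(mac_slot(161))  # °

# Some characters have no Mac Roman number at all, so they could not be
# reached from the Mac keyboard. The multiplication sign is one example.
try:
    "×".encode("mac_roman")
    multiplication_in_mac_roman = True
except UnicodeEncodeError:
    multiplication_in_mac_roman = False

print(multiplication_in_mac_roman)  # False
```

The point is not the code itself but the mismatch it exposes: a font could hold a character that one operating system’s numbering scheme simply had no way to ask for.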
For the most part, these methods for character access continue to work — or not work, in the case of the characters just cited — even though Windows and the Mac OS have evolved away from their old ways of dealing with fonts. Most significantly, both now rely on Unicode, a new standardized (yay!) numbering system, for identifying characters within fonts and within documents. For example, when using Macs running OS X, Unicode-savvy programs will not render the “borrowed” characters correctly when you’re using Type 1 fonts. This is probably not a big deal for most of you. If you do need these characters, you can either get them directly from the Symbol font itself or buy OpenType versions of your old Type 1 fonts.
Unicode and The Rise of the Big Fonts
In the late 1980s, Microsoft and Apple (principally) developed a font format to compete with Adobe’s, and they called it TrueType. It was PostScript-compatible, but it used a two-byte character-numbering system, so it could contain on the order of 65,000 characters. Cool. TrueType fonts were the first popular double-byte or two-byte fonts.
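The “on the order of 65,000” figure follows directly from doubling the number of bytes used for each character ID. A minimal sketch in Python, with variable names chosen only for this example:

```python
# IDs available with one byte versus two bytes.
ONE_BYTE_IDS = 2 ** 8
TWO_BYTE_IDS = 2 ** 16

print(ONE_BYTE_IDS)   # 256
print(TWO_BYTE_IDS)   # 65536

# The highest two-byte ID, written out as the two bytes that store it.
highest_id = TWO_BYTE_IDS - 1
as_bytes = highest_id.to_bytes(2, byteorder="big")

print(highest_id)      # 65535
print(as_bytes.hex())  # ffff
```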
Apple ceded control of the format to Microsoft, and in time they and Adobe created a hybrid format — OpenType — that encompassed both the TrueType and PostScript Type 1 specs. OpenType fonts have many charms, including using a single file format that will work on both Apple and Windows platforms. Cooler still.
Here’s what TrueType and OpenType icons look like:

Because OpenType is a variant of the TrueType spec, you often see OpenType font icons assigned to font files whose names end in .ttf (which stands for TrueType Font).
But with more capability came more complexity. In TrueType and OpenType fonts, for example, there’s no longer any standard character set — what’s in a font varies from vendor to vendor and font to font.
Despite their potential to hold thousands of characters, most OpenType fonts contain very few more than the old Type 1s. (Adding lots of characters is expensive.) The “Adobe Western 2” character set that Adobe uses for its “Standard” OpenType fonts, for example, has only 17 more characters than the original Type 1 fonts, which Adobe no longer sells. Adobe’s so-called “Pro” fonts may have many more, but their contents are unpredictable. You can still count on these fonts to include the “old 228,” but beyond that, all bets are off.
Here are the 17 characters that Adobe added to the Type 1 fonts in its library when it converted them to OpenType format.

Most of these “new” characters are the ones that the Mac OS used to borrow from the Symbol font, although they’re new for Windows users. Only three are really new, and are all European imports: the symbols for the euro, liter, and “approximately.”
This milk container contains approximately one liter, as indicated on the label by the two symbols shown here.

TrueType fonts are still made, but they typically lack capabilities that are often added to OpenType fonts, such as the ability to automatically substitute one character for another (old-style numbers, for example, or small capitals). However, the OpenType spec is a pick-and-choose menu for font developers, so there’s no standard “behavior” for these fonts either.
Worse, there are no standard techniques — no standard keystroke sequences, for example — for setting the characters that these large fonts contain. Uncool.
Unicode: The Key to the Highway
The adoption of Unicode is great for cross-platform compatibility, and documents can now travel from Mac to PC and back without all those once-common character substitutions. They can also travel from continent to continent and language system to language system, thanks to a handful of mega-fonts that contain characters from many languages.
Windows versions since Windows 2000 as well as Mac OS X use Unicode behind the scenes to extract characters from fonts and identify their use in documents. Recent software has also made the transition.
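Those once-common character substitutions are easy to reproduce. The sketch below is a hedged illustration rather than a description of what any particular program did: it assumes Mac Roman and Windows-1252 as the old platform encodings (Python’s mac_roman and cp1252 codecs) and UTF-8 as the Unicode encoding a modern document might be saved in.

```python
import unicodedata

degree = "°"

# The old way: the Mac stores its own number for the degree sign, and
# Windows looks that number up in its own, different table.
mac_bytes = degree.encode("mac_roman")
misread = mac_bytes.decode("cp1252")
print(misread)  # ¡

# The Unicode way: the character has one number on every platform.
code_point = ord(degree)
print(f"U+{code_point:04X}", unicodedata.name(degree))  # U+00B0 DEGREE SIGN

# Saved and reopened as Unicode text, the character survives the trip.
round_trip = degree.encode("utf-8").decode("utf-8")
print(round_trip == degree)  # True
```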
In my next column, I’ll delve into the Unicode world and explain how this clever new standard works. You’ll learn how complex Unicode-based fonts work, and how to use Unicode numbers to track down hard-to-find characters lurking inside fonts with huge character sets. Both Windows and the Mac OS — as well as some applications — have tools to help, but in many ways we’re still in the pioneering days, so there’ll be some handy trail-blazing tips as well.
 

James Felici has worked in the publishing industry for over 30 years. He is the former managing editor of Publish magazine, and has written for PC World, Macworld, and The Seybold Report. A renowned type expert, he is the author of The Complete Manual of Typography.
  • Anonymous says:

    I’ve already printed out (and saved) the PDF of glyphs, and thank you big time for it! As volunteer production editor, using Adobe InDesign, of a quarterly for a non-profit organization, I often have to hunt up various non-standard letter forms, and using the character map can be a pain. I’ve memorized a few (for instance, I have a friend named Joëlle…) but having a quick cheat-sheet with all of them easily found the “old fashioned” way is wonderful. (“Old fashioned” because I’m 67, and although I’ve been using computers since the Commodore 64 I bought myself in ’83, I still like to have a piece of paper to refer to for something like this!) I’m also sending the PDF to the co-editors of the quarterly, as it will be of interest to them as well.

    This sort of thing can be of tremendous help to folks like us, and is much appreciated!

    Lesley O’Neil
    Edmonton, Alberta, Canada

  • Anonymous says:

    One of your best and most informative articles yet. Thank you, thank you!
    Rod McCaskill
    Mail-Graphics Print & Ad
    Canoga Park, CA

  • ElliottV says:

    Unicode was a brave effort to create a single character set that included every reasonable writing system on the planet and some make-believe ones like Klingon, too. Some people are under the misconception that Unicode is simply a 16-bit code where each character takes 16 bits and therefore there are 65,536 possible characters. This is not, actually, correct. It is the single most common myth about Unicode, so if you thought that, don’t feel bad. In fact, Unicode has a different way of thinking about characters, and you have to understand the Unicode way of thinking of things or nothing will make sense.

  • felici says:

    Although current font formats can only contain some 65,000 glyphs, Unicode is indeed infinitely extensible, as its current tally of over 100,000 unique character I.D.s testifies. Klingon, though, is not part of it. The suggestion to include it was tabled, but the idea was shot down as trivializing the whole Unicode undertaking. That’s not to say, though, that you can’t create a Unicode-compatible Klingon font, as there are large ranges of Unicode numbers reserved for “private use.” These are more commonly used for stylistic glyph alternatives, obscure ligatures, and the like.
    Klingon notwithstanding, Unicode does, however, encompass many extinct languages and writing systems used by academics and historians.
