

Font Primer

Saved by PBworks on March 20, 2008 at 7:13:49 am

Character set primer


Terminology (these definitions apply to your computing environment):


  • A font can be defined as "A design for a set of characters." A font is the combination of a typeface and other qualities, such as size, pitch, and spacing. For example, Times Roman is a typeface that defines the shape of each character. Within Times Roman, however, there are many fonts to choose from: different sizes, italic, bold, and so on. The term font is usually used (incorrectly) as a synonym for typeface; more about fonts and typefaces here.


  • A character set can be defined as "The entire complement of alphanumeric and other symbols contained in a given font." Another way to look at it is that the typeface/font you choose determines what specific characters look like (the design and the style, such as Regular or Bold), but the character set you choose determines which characters are available to you. While most character sets have the letters and numbers we are used to, some include those funny IPA characters we want to type and some do not. Others include only a subset of the full set of IPA characters, so you'll want to be careful about which character set you use!


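    Here is a site with an excellent list of some common character sets, along with their limitations and uses. The three that most people need to be aware of are ISO 8859-x (and the closely related Windows code pages, such as Windows-1252, used by older Windows programs), UTF-8, and UTF-16. UTF-8 and UTF-16 are two different ways of encoding Unicode: UTF-8 uses one to four bytes per character, and UTF-16 uses two or four bytes per character. Both can represent every Unicode character, so either one is adequate for every IPA character, and either one also handles Chinese, Japanese, and other Asian scripts. To view and type all IPA characters you will need one of them. ISO 8859-x is far more limited: ISO 8859-1, for example, contains only a handful of letters that double as IPA symbols (such as æ, ç, ð, and ø). Most current Linux distributions use UTF-8 by default, and Mac OS X is built around Unicode. Windows also uses Unicode (UTF-16) internally, but older non-Unicode programs fall back on a legacy code page chosen by your regional settings. In "8859-x" the "x" is a part number, and each part covers a group of languages: 8859-1 (Latin-1) is Western European, 8859-2 is Central European, 8859-5 is Cyrillic, and so on. For example, if your computer and Windows are set to "United States - English," older programs will use Windows-1252, which contains all the printable characters of ISO 8859-1 plus a few extras.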
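To make the difference concrete, here is a minimal Python 3 sketch (standard library only) that tries to encode a few sample strings in ISO 8859-1, Windows-1252, UTF-8, and UTF-16, and reports how many bytes each one takes, or None when the character set has no way to represent a character. The sample strings and the helper name `encoded_size` are illustrative choices; "utf-16-le" is used so that Python does not add a byte-order mark to the count.

```python
from typing import Optional

# Python codec names for the character sets discussed above.
# "utf-16-le" is UTF-16 without the 2-byte byte-order mark that
# Python's plain "utf-16" codec prepends.
ENCODINGS = ["iso-8859-1", "cp1252", "utf-8", "utf-16-le"]


def encoded_size(text: str, encoding: str) -> Optional[int]:
    """Return how many bytes `text` needs in `encoding`,
    or None if the encoding cannot represent every character."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    samples = [
        "bæð",         # æ and ð exist in ISO 8859-1
        "ʃə",          # esh and schwa need Unicode
        "中文",         # Chinese: 3 bytes each in UTF-8, 2 in UTF-16
        "\U0001D11E",  # a character beyond U+FFFF: 4 bytes in both
    ]
    for text in samples:
        sizes = {enc: encoded_size(text, enc) for enc in ENCODINGS}
        print(text, sizes)
```

Running it shows, for example, that "ʃə" cannot be encoded in ISO 8859-1 or Windows-1252 at all, while UTF-8 and UTF-16 both store it in 4 bytes.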






What Unicode is all about



Unicode is a system that seeks to give a unique identity to every character (letters, numerals, punctuation, etc.) that humans use. Imagine how many thousands of human languages there are and how many hundreds of characters each language needs for its orthography... and then there's math. Unicode gives every character a discrete number, called a code point and written like U+0259, together with a unique name such as LATIN SMALL LETTER SCHWA. Before Unicode, many different encodings were used independently of one another. These systems worked well within the language or region they were intended for, but as the internet developed and globalization brought languages as different as Urdu and Basque into digital contact with one another, some system was needed to unify all the writing systems. Unicode is what lets you browse the web and see Japanese Kanji on the same page as Russian without your computer missing a beat.
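Python's standard `unicodedata` module exposes these code points and names directly. Here is a minimal sketch, assuming Python 3; the helper `describe` is a hypothetical name used only for this illustration.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return 'U+XXXX NAME' for each character in `text`."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"
        for ch in text
    ]


if __name__ == "__main__":
    # IPA, Japanese, and Russian characters share one numbering system.
    for line in describe("ʃə日я"):
        print(line)

    # Going the other way: look a character up by its Unicode name.
    print(unicodedata.lookup("LATIN SMALL LETTER ENG"))
```

The loop prints one line per character, for example "U+0259 LATIN SMALL LETTER SCHWA" for ə and "U+044F CYRILLIC SMALL LETTER YA" for я, and the lookup returns ŋ.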


Unicode itself is designed with characters divided into blocks to make it easy to find characters. At least, that was the plan. In reality it can be a pain because the blocks are not very alphabetical. Here are the first few blocks:


BASIC LATIN - The unaccented letters, digits, and punctuation on a standard US/English keyboard (the same 128 characters as ASCII). Characters with diacritics are not in this block.

LATIN-1 SUPPLEMENT - The accented letters and other symbols used in Western European languages such as French, German, and Spanish (the same characters as the upper half of ISO 8859-1, also called Latin-1), even though not all of them are standard on the keyboard everywhere those languages are spoken. A few of them, such as æ, ç, ð, and ø, double as IPA symbols.

LATIN EXTENDED-A and LATIN EXTENDED-B - Even more Latin letters for European and other languages, including some that we think of as IPA, such as ŋ and the click letters ǀ, ǁ, ǂ, and ǃ.

IPA EXTENSIONS - This is your basic IPA block, but it includes only those IPA characters that are not already in the blocks above (for example ə and ʃ).

SPACING MODIFIER LETTERS - Many characters that we use in IPA, such as the aspiration mark ʰ, the length mark ː, and the stress marks ˈ and ˌ. Characters in this block are "spacing characters," that is, they occupy space on the line just like regular characters (see COMBINING DIACRITICAL MARKS below for the contrast).

COMBINING DIACRITICAL MARKS - These are diacritics that do not occupy a space of their own on the line. Here is where you will find the diacritics for voiceless, dental, syllabic, and so on. When you type one of these characters, it attaches to the preceding character and is drawn above, below, or through it, depending on the mark. This block is essential for linguists.
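In stored text, a combining mark is simply a separate character that follows its base letter. The minimal Python sketch below (standard library only) builds three IPA symbols from a base letter plus a mark from this block, and uses Unicode normalization (NFC) to show that, unlike é, these combinations have no single precomposed code point. The constant and function names are illustrative, not standard.

```python
import unicodedata

# Marks from the Combining Diacritical Marks block (U+0300-U+036F)
VOICELESS = "\u0325"  # COMBINING RING BELOW
DENTAL = "\u032A"     # COMBINING BRIDGE BELOW
SYLLABIC = "\u0329"   # COMBINING VERTICAL LINE BELOW


def add_diacritic(base: str, mark: str) -> str:
    """Attach a combining mark to a base letter.

    The mark is stored *after* the base; the font then draws it
    on the preceding character."""
    # General categories Mn, Mc, and Me are the combining marks.
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return base + mark


if __name__ == "__main__":
    for base, mark in [("n", VOICELESS), ("t", DENTAL), ("l", SYLLABIC)]:
        symbol = add_diacritic(base, mark)
        nfc = unicodedata.normalize("NFC", symbol)
        # No precomposed form exists, so NFC keeps two code points.
        print(symbol, len(symbol), len(nfc))

    # Contrast: e + COMBINING ACUTE ACCENT composes to the single character é.
    print(len(unicodedata.normalize("NFC", "e\u0301")))

    # A spacing modifier such as ʰ (U+02B0) is not a combining mark.
    try:
        add_diacritic("p", "\u02B0")
    except ValueError as err:
        print(err)
```

Each of n̥, t̪, and l̩ stays two code points long even after normalization, which is why IPA text depends on fonts that position combining marks well.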


There are lots of other Unicode blocks (e.g., specific blocks for specific languages), but the above are what linguists writing in most European languages will need.
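Python's standard library does not report Unicode block names, so the sketch below hard-codes the ranges of the first seven blocks as assigned in the Unicode standard and looks up which block a character falls in. The table and the `block_of` helper are illustrative assumptions for this page, not a library API; characters in any other block return None.

```python
from typing import Optional

# (first code point, last code point, block name) for the first Unicode blocks
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the name of the block containing `ch`, or None if it is
    outside the blocks listed above."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    # Example: the characters of the transcription [bʌʔn̩] ("button")
    for ch in "bʌʔn\u0329":
        print(f"U+{ord(ch):04X}", block_of(ch))
```

Even this short transcription draws on three different blocks, which is why an IPA-capable setup needs all of them.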






