   Mkt1font, vpl2vpl and vpl2ovp: programs to generate accented fonts
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                            John D. Smith
                            ~~~~~~~~~~~~~


These three programs, all made available under the GNU General Public
License, address the same requirement using the same basic algorithms:
their aim is to make it easy to create versions of existing fonts
containing whatever accented characters the user may need, arranged
according to whatever encoding he/she may favour. Mkt1font does this
by reading in the two files that define a Type 1 PostScript font and
writing out new versions of them; vpl2vpl does it by reading in the
file that defines a TeX virtual font and writing out a new version of
it; vpl2ovp does the same as vpl2vpl, but its output is a 16-bit
virtual font suitable for Omega, the Unicode-aware development of
TeX. In each case information about what accented characters are
required and where they should be located is supplied by means of a
simple definition file, which has the same format for all three
programs.

Note that most Type 1 PostScript fonts are commercial products
subject to both copyright and licensing restrictions. If the license
for such a font forbids reverse-engineering it, it is presumably
illegal to run mkt1font on it; in any case it would certainly be a
serious breach of copyright to make modified versions of the font
available to third parties. The author of mkt1font will not be held
responsible for any such misuses of the software. (No such legal
problems surround the use of TeX virtual fonts, which merely specify
new ways to use existing fonts.) However, would-be users may wish to
note the existence of a body of high-quality freeware lookalikes for
many of the standard Type 1 fonts: like the programs described here, the
Ghostscript fonts developed by URW++ Design and Development
Incorporated are made available under the GNU General Public License
(see the file COPYING), which permits them to be modified and
redistributed provided that certain simple decencies are observed.
These fonts are available from many ftp sites, including those
forming the Comprehensive TeX Archive Network (CTAN), for example
ftp.tex.ac.uk: look for the most recent font archive file in the
directory /tex-archive/support/ghostscript/gnu/fonts.

A detailed specification of the form of the definition file is given
below in the instructions for using the programs, but it may be
helpful to give a quick idea of its contents
here. Each line of the file contains something similar to "128 x
acute", which is an instruction to create a new character formed from
"x" with an acute accent and located at position 128 in the character
set. Once a new character has been defined in this way, it "exists" in
just the same way as any other character, under the name "xacute" (or
whatever); thus it can even receive further accents. The two lines

   129   e macron
   130   emacron breve

will create a "long e" and a "long-or-short e" at the positions
specified. (If only the latter is desired, they can both be assigned
the same position in the encoding -- the second will overwrite the
first.) Subscript as well as superscript accents are available. Some
sample definition files are included to give an idea of the format:
note that two of these (CSX.def and Norman.def), used for Indian
language material in Roman script, contain the triple-accented
character "runderdotmacron acute"!
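
As a rough illustration of the format just described, the following
Python sketch reads definition lines into (position, character,
accent) triples. It is not taken from the programs themselves (which
are written in Perl): the function names parse_slot and
parse_definitions are invented here, and the conventions that hex
numbers begin with "0x" and octal numbers with "0" are assumptions
following Perl's usual notation. Comments are assumed to run from "#"
to the end of the line.

```python
def parse_slot(token):
    """Read a position written in decimal, octal or hex.

    Assumption: "0x.." is hex and a leading "0" means octal.
    """
    text = token.lower()
    if text.startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)


def parse_definitions(text):
    """Return a list of (position, character, accent-or-None) triples."""
    entries = []
    for raw in text.splitlines():
        # Drop comments, then skip lines that are blank.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError(f"malformed definition: {raw!r}")
        accent = fields[2] if len(fields) == 3 else None
        entries.append((parse_slot(fields[0]), fields[1], accent))
    return entries


sample = """
# a "long e" and a "long-or-short e"
129   e macron
130   emacron breve
"""
print(parse_definitions(sample))
```

Keeping the accent as None for two-field lines makes it easy to tell
a plain character assignment from a request for a new character.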

In addition to accented characters proper, the programs can produce
digraphs -- single characters composed of pairs of other characters.
In itself this may seem to be of limited use, since the digraph
should always be indistinguishable from its two constituent
characters printed consecutively. However, some Unicode-style
encodings require that certain character-pairs be treated as
digraphs. In addition, use of a digraph in subsequent definitions
allows an accent to be applied to two characters as a pair: so, for
example,

   131   k h
   131   kh underbar

will produce a character consisting of an underlined "kh" at
position 131.
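
The distinction between accented characters and digraphs can be
sketched as follows in Python. This is an illustration only, not the
programs' own code: the function name "describe" is invented, the
list of accent names is an assumption (the usual accents of Adobe
fonts plus those named in this document), and the rule that a new
character's name is formed by simple concatenation is inferred from
the examples "xacute", "emacron" and "kh".

```python
# Assumed accent names, for illustration only.
SUPERSCRIPT = {"acute", "grave", "circumflex", "tilde", "dieresis", "macron",
               "breve", "dotaccent", "ring", "hungarumlaut", "caron"}
ACCENTS = (SUPERSCRIPT
           | {"cedilla", "ogonek", "underbar", "underdot",
              "candrabindu", "overdot"}
           | {"under" + name for name in SUPERSCRIPT})


def describe(character, second=None):
    """Return (kind, name) for one definition.

    If the second item is not a known accent, the definition is
    treated as a request for a digraph.
    """
    if second is None:
        return ("plain", character)
    kind = "accented" if second in ACCENTS else "digraph"
    return (kind, character + second)


print(describe("k", "h"))          # a digraph named "kh"
print(describe("kh", "underbar"))  # an accented character built on it
```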

The algorithms used in the programs take great care to secure the
most attractive results, placing accents centrally over/under their
characters at appropriate heights/depths, varying their position
according to the angle of slant in Italic and similar fonts,
providing new characters with kerning information (which will be
resolutely ignored by most word-processors), and -- in the case of
mkt1font -- giving hinting instructions to improve output on
low-resolution printers. However, users may feel that some
characters need "tweaking". This tends to happen when the part of
the character adjacent to the accent is highly asymmetrical:
subscript accents below "r" may need moving to the left, while
superscript accents above "d" may need moving to the right. It is
relatively easy to modify the output of vpl2vpl to make such
improvements by hand (vpl files are designed to be easy for humans
to read, and vpl2vpl identifies each character with a comment). Type
1 fonts are another matter; mkt1font generates a human-readable and
commented ".dis" file as well as the ".pfb" font file, and it is
possible to make changes to this and rebuild the pfb file with t1asm
(see below), but this is not quite as simple as modifying a TeX
virtual font.
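
To give an idea of what central placement with allowance for slant
involves, here is a minimal Python sketch. It should not be read as
the algorithm the programs actually use: the function name
accent_offset, the use of bounding boxes, and the fixed "gap" between
character and accent are all assumptions made for the example. The
angle follows the AFM convention (degrees counter-clockwise from the
vertical, so a font leaning to the right has a negative value).

```python
import math


def accent_offset(base_bbox, accent_bbox, italic_angle=0.0, gap=50.0):
    """Return (dx, dy) by which to move an accent to sit above a base glyph.

    Boxes are (llx, lly, urx, ury) in font units.
    """
    b_llx, _b_lly, b_urx, b_ury = base_bbox
    a_llx, a_lly, a_urx, _a_ury = accent_bbox
    # Raise the accent so that its bottom edge clears the base by `gap`.
    dy = (b_ury + gap) - a_lly
    # Centre the accent horizontally over the base.
    dx = (b_llx + b_urx) / 2 - (a_llx + a_urx) / 2
    # In a slanted font, a move upwards by dy implies a sideways move too.
    dx += dy * math.tan(math.radians(-italic_angle))
    return dx, dy


upright = accent_offset((0, 0, 500, 700), (100, 500, 300, 650))
slanted = accent_offset((0, 0, 500, 700), (100, 500, 300, 650),
                        italic_angle=-12.0)
print(upright, slanted)
```

A sketch of this kind centres on the whole bounding box, which is
exactly why asymmetrical characters such as "r" and "d" can end up
needing adjustment by hand.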

For details on how vpl2ovp handles access to accented characters by
means of a system of ligatures, see the CHANGES file.

The programs are written in Perl, and have not been tested on
systems other than Unix. (Apart from the #! line at the top, the
only Unix-specific features the author is aware of are two references
to "/dev/null" in mkt1font and the general assumption that filenames
can be of any length.) At run time, mkt1font invokes two further
font utilities, t1asm and t1disasm, written by I. Lee Hetherington
and available from CTAN ftp sites: look in the directory
/tex-archive/fonts/utilities/t1utils.

Detailed instructions for using the programs follow. In each case
these instructions can also be seen by typing the name of the program
with a "-h" option.


Mkt1font
~~~~~~~~
Syntax: mkt1font -n fontname -d definition-file -a afm-file -f font-file
           [-s shrink-factor] [-c candrabindu-adjustment] [-b]

Mkt1font creates new Type 1 PostScript fonts based on existing fonts
("input fonts"). In order to do this it makes use of I. Lee
Hetherington's programs t1asm and t1disasm, which must be present on
the system. A successful run will generate an AFM file and a PFB
file, as well as a DIS file containing a disassembled version of the
PFB file (including useful comments). The first four options listed
below must be specified on the command line; the rest are optional:

  -n should name the font it is intended to generate. Avoid names
     such as "myfont", as both mkt1font and other programs attempt
     to draw conclusions from the name; better would be something
     like "Utopia_French-BoldItalic". The generated font files
     will also use this for their basename.

  -d should refer to a font definition file. This file (which could
     usefully be named, e.g., "French.def") should consist of
     lines of character definitions, in the form
               "number"   "character"
     or
               "number"   "character"   "accent"
     Here "number" represents the character's position in the new
     encoding and may be expressed in decimal, octal or hex;
     "character" names the character (e.g. "comma", "eight",
     "A") or consists of the word ".notdef" (indicating that
     the specified number's "slot" in the new encoding is to be
     empty); and "accent" optionally names an accent to be placed
     on the character. In addition to the standard accents available
     in PostScript fonts, "underbar" and "underdot" are also
     available, as are "under" versions of all the normal
     superscript accents ("underdieresis", "underring", etc.).
     The Indian accent "candrabindu" may also be specified: it
     is formed by overprinting a breve with a dotaccent. Finally,
     "overdot" may be used as a synonym for "dotaccent".

     If the character named in the "accent" position is not in fact
     a valid accent character, the program interprets the definition
     as a request for a digraph formed from the "character" and the
     "accent". A digraph consisting of, say, "k" and "h" will be
     indistinguishable from the letters "k" and "h" printed
     consecutively, but the digraph "kh" can itself receive accents
     like any other character: see next paragraph.

     A new character (such as "amacron" or "kh") may be freely
     used in the "character" position of a further definition (such
     as "amacron breve" or "kh underbar"). There is no constraint
     on the ordering of definitions within a definition file. The
     definition of "a macron" does not have to precede that of
     "amacron breve": requests for "impossible" characters are
     deferred until their constituents have had a chance to come into
     being.

     "Slots" for which no new definition is given retain the
     definition they have in the input font.

     The definition file may also contain blank lines and comments
     (introduced by "#").

  -a should refer to the AFM (Adobe Font Metrics) file for the input
     font.

  -f should refer to the binary font file (PFB) for the input font.
     (In fact the equivalent ASCII file (PFA) is also acceptable for
     input, but the output font is always in PFB format.)

  -s may optionally give the factor, expressed as a per-thousand
     value, by which normally superscript accents (such as dieresis,
     ring) should be shrunk when they are used as subscript accents
     (such as underdieresis, underring). Values of around 800 may be
     found useful.

  -c may optionally give two comma-separated numerical values to
     adjust the x and y coordinates of the dotaccent placed within a
     breve to form the candrabindu accent.

  -b may optionally be specified to block the use of predefined
     accented characters, forcing mkt1font to define its own
     versions. This may be useful to secure a consistent appearance
     in cases where a font designer does not share mkt1font's views
     on where accents should be placed.

  -h prints this help.
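
The freedom to list definitions in any order can be pictured with the
following Python sketch, in which requests whose constituents do not
yet exist are simply put aside and tried again on the next pass. It
is an illustration under stated assumptions, not the Perl code of
mkt1font: the function name "resolve" and its arguments are invented,
new names are assumed to be formed by concatenation, and positions
and ".notdef" entries are left out for brevity.

```python
def resolve(definitions, glyphs, accents):
    """Return the names of the new characters in the order they can be built.

    definitions: list of (character, second) pairs; second is an accent
                 name, the second half of a digraph, or None.
    glyphs:      names of the characters present in the input font.
    accents:     names accepted as accents.
    """
    known = set(glyphs)
    pending = list(definitions)
    built = []
    while pending:
        deferred = []
        for character, second in pending:
            ready = character in known and (
                second is None or second in accents or second in known)
            if ready:
                name = character + (second or "")
                known.add(name)
                built.append(name)
            else:
                # A constituent is missing: try again on the next pass.
                deferred.append((character, second))
        if len(deferred) == len(pending):
            # A whole pass made no progress, so these can never be built.
            raise ValueError(f"cannot build: {deferred}")
        pending = deferred
    return built


print(resolve([("amacron", "breve"), ("a", "macron")],
              glyphs={"a"}, accents={"macron", "breve"}))
```

Stopping when a complete pass builds nothing guards against looping
for ever over a definition whose constituents never come into being.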


Vpl2vpl
~~~~~~~
Syntax: vpl2vpl -d definition-file [-s shrink-factor]
           [-c candrabindu-adjustment] [-b] vpl-file

Vpl2vpl creates new TeX virtual fonts based on existing fonts or
virtual fonts ("input fonts"). A successful run will read a pl
(Property List) or vpl (Virtual Property List) file and a definition
file, and will generate a new vpl (Virtual Property List) file on
standard output. The input font is assumed to adhere to the standard
TeX encoding for text fonts unless it was created with either of the
programs afm2pl or afm2tfm, in which case it is assumed to conform to
(respectively) the Adobe Standard Encoding or the encoding specified
in the file dvips.enc. In either case, the name of the input font is
assumed to be the name of the input file without its .vpl or .pl
extension: it must conform to normal TeX conventions for naming fonts,
as vpl2vpl attempts to draw conclusions from it about the kind of font
it is dealing with.

A typical complete sequence of commands to create a new virtual
font might therefore be
     tftopl cmr10.tfm cmr10.pl
     vpl2vpl -d ISO-Latin1.def cmr10.pl >cmr10_isol1.vpl
     vptovf cmr10_isol1.vpl cmr10_isol1.vf cmr10_isol1.tfm
for a Computer Modern font, or
     afm2pl Times-Roman.afm rptmr.pl
     pltotf rptmr.pl rptmr.tfm
     vpl2vpl -d ISO-Latin1.def rptmr.pl >ptmr-isol1.vpl
     vptovf ptmr-isol1.vpl ptmr-isol1.vf ptmr-isol1.tfm
for a PostScript font.

Another approach for a PostScript font is to use afm2tfm:
     afm2tfm Times-Roman.afm -t dvips.enc -v ptmr rptmr
     vpl2vpl -d ISO-Latin1.def ptmr.vpl >ptmr-isol1.vpl
     vptovf ptmr-isol1.vpl ptmr-isol1.vf ptmr-isol1.tfm
-- but this is now deprecated, as afm2tfm generates incorrect
values for the heights of some characters, and this can lead to
bad accent placing.

In order to keep the whole upper half of the character set free for
the requirements of the encoding specified in the definition file,
certain modifications are made to input fonts following the
dvips.enc encoding to bring them into greater conformity with the
TeX norm. In particular, the characters dotaccent and hungarumlaut
are placed in the positions assigned by TeX ("5F, "7D), not those
enforced by dvips.enc ("C7, "CD). The f-ligatures, double quotes
and dashes are also moved from the upper half of the character set
to their normal TeX positions. As a result, the following characters
are not found in the lower half of the character set: quotesingle,
quotedbl, backslash, underscore, braceleft, bar, braceright. These
characters can, however, be assigned positions in the output font if
they are needed. (Indeed, they could all be explicitly restored to
their dvips.enc positions if this were desired.)
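
The effect of the two relocations named above can be sketched in
Python as a change to a table mapping positions to character names.
Only the two moves whose positions are stated in this document are
included; the function name "relocate" and the table layout are
assumptions made for the example, and the real program also moves
the f-ligatures, double quotes and dashes.

```python
# (name, dvips.enc position, TeX position), as given in the text above.
MOVES = [("dotaccent", 0xC7, 0x5F), ("hungarumlaut", 0xCD, 0x7D)]


def relocate(encoding):
    """Return a copy of a {position: name} table with the accents moved."""
    result = dict(encoding)
    for name, old, new in MOVES:
        if result.get(old) == name:
            del result[old]
            # Whatever previously occupied `new` is displaced.
            result[new] = name
    return result


before = {0x41: "A", 0x5F: "underscore", 0x7D: "braceright",
          0xC7: "dotaccent", 0xCD: "hungarumlaut"}
print(relocate(before))
```

This also shows why underscore and braceright appear in the list of
characters no longer found in the lower half of the character set:
their positions are taken over by the relocated accents.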

Options:

  -d should refer to a font definition file. This file (which could
     usefully be named, e.g., "French.def") should consist of
     lines of character definitions, in the form
               "number"   "character"
     or
               "number"   "character"   "accent"
     Here "number" represents the character's position in the new
     encoding and may be expressed in decimal, octal or hex;
     "character" names the character (e.g. "comma", "eight",
     "A") or consists of the word ".notdef" (indicating that
     the specified number's "slot" in the new encoding is to be
     empty); and "accent" optionally names an accent to be placed
     on the character. In addition to the standard accents available
     in PostScript fonts, "underbar" and "underdot" are also
     available, as are "under" versions of all the normal
     superscript accents ("underdieresis", "underring", etc.).
     The Indian accent "candrabindu" may also be specified: it
     is formed by overprinting a breve with a dotaccent. Finally,
     "overdot" may be used as a synonym for "dotaccent".

     If the character named in the "accent" position is not in fact
     a valid accent character, the program interprets the definition
     as a request for a digraph formed from the "character" and the
     "accent". A digraph consisting of, say, "k" and "h" will be
     indistinguishable from the letters "k" and "h" printed
     consecutively, but the digraph "kh" can itself receive accents
     like any other character: see next paragraph.

     A new character (such as "amacron" or "kh") may be freely
     used in the "character" position of a further definition (such
     as "amacron breve" or "kh underbar"). There is no constraint
     on the ordering of definitions within a definition file. The
     definition of "a macron" does not have to precede that of
     "amacron breve": requests for "impossible" characters are
     deferred until their constituents have had a chance to come into
     being.

     "Slots" for which no new definition is given retain the
     definition they have in the input font.

     The definition file may also contain blank lines and comments
     (introduced by "#").

  -s may optionally give the factor, expressed as a per-thousand
     value, by which normally superscript accents (such as dieresis,
     ring) should be shrunk when they are used as subscript accents
     (such as underdieresis, underring). Values of around 800 may be
     found useful.

  -c may optionally give two comma-separated numerical values to
     adjust the x and y coordinates of the dotaccent placed within a
     breve to form the candrabindu accent. A coordinate scheme using
     "DESIGNUNITS R 1000" is assumed.

  -b may optionally be specified to block the use of predefined
     accented characters, forcing vpl2vpl to define its own
     versions. This may be useful to secure a consistent appearance
     in cases where a font designer does not share vpl2vpl's views
     on where accents should be placed.

  -h prints this help.
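
The arithmetic behind the -s and -c options is simple, and may be
sketched in Python as follows. The function names are invented for
the example and the handling shown is an assumption about how such
values might be read, not a transcription of vpl2vpl.

```python
def shrink(length, factor):
    """Scale a length by a per-thousand factor (1000 leaves it unchanged)."""
    return length * factor / 1000


def parse_adjustment(text):
    """Split a "-c" argument such as "10,-25" into (x, y) numbers."""
    x, y = (float(part) for part in text.split(","))
    return x, y


# An accent 500 units wide, shrunk with "-s 800" for use as a subscript.
print(shrink(500, 800))
# A candrabindu adjustment given as "-c 10,-25".
print(parse_adjustment("10,-25"))
```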


Vpl2ovp
~~~~~~~

Syntax: vpl2ovp -d definition-file [-s shrink-factor]
           [-c candrabindu-adjustment] [-b] vpl-file

Vpl2ovp creates new Omega virtual fonts based on existing TeX
fonts or virtual fonts ("input fonts"). A successful run will
read a pl (Property List) or vpl (Virtual Property List) file
and a definition file, and will generate a new ovp (Omega
Virtual Property List) file on standard output. The input font
is assumed to adhere to the standard TeX encoding for text
fonts unless it was created with either of the programs afm2pl
or afm2tfm, in which case it is assumed to conform to
(respectively) the Adobe Standard Encoding or the encoding
specified in the file dvips.enc. In either case, the name of
the input font is assumed to be the name of the input file
without its .vpl or .pl extension: it must conform to normal
TeX conventions for naming fonts, as vpl2ovp attempts to draw
conclusions from it about the kind of font it is dealing with.

A typical complete sequence of commands to create a new virtual
font might therefore be
     tftopl cmr10.tfm cmr10.pl
     vpl2ovp -d Unicode1.def cmr10.pl >cmr10-uni1.ovp
     ovp2ovf cmr10-uni1.ovp cmr10-uni1.ovf cmr10-uni1.ofm
for a Computer Modern font, or
     afm2pl Times-Roman.afm rptmr.pl
     pltotf rptmr.pl rptmr.tfm
     vpl2ovp -d Unicode1.def rptmr.pl >ptmr-uni1.ovp
     ovp2ovf ptmr-uni1.ovp ptmr-uni1.ovf ptmr-uni1.ofm
for a PostScript font.

Another approach for a PostScript font is to use afm2tfm:
     afm2tfm Times-Roman.afm -t dvips.enc -v ptmr rptmr
     vpl2ovp -d Unicode1.def ptmr.vpl >ptmr-uni1.ovp
     ovp2ovf ptmr-uni1.ovp ptmr-uni1.ovf ptmr-uni1.ofm
-- but this is now deprecated, as afm2tfm generates incorrect
values for the heights of some characters, and this can lead to
bad accent placing.

In order to keep the whole of the character range "F0-"FF free for
the requirements of the encoding specified in the definition file,
certain modifications are made to input fonts following the dvips.enc
encoding to bring them into greater conformity with the TeX norm. In
particular, the characters dotaccent and hungarumlaut are placed in
the positions assigned by TeX ("5F, "7D), not those enforced by
dvips.enc ("C7, "CD). The f-ligatures, double quotes and dashes are
also moved from the upper half of the original 8-bit character set to
their normal TeX positions. As a result, the following characters are
not found in the lower half of the character set: quotesingle,
quotedbl, backslash, underscore, braceleft, bar, braceright. These
characters can, however, be assigned positions in the output font if
they are needed. (Indeed, they could all be explicitly restored to
their dvips.enc positions if this were desired.)

Options:

  -d should refer to a font definition file. This file (which could
     usefully be named, e.g., "Unicode1.def") should consist of
     lines of character definitions, in the form
               "number"   "character"
     or
               "number"   "character"   "accent"
     Here "number" represents the character's position in the new
     encoding and may be expressed in decimal, octal or hex;
     "character" names the character (e.g. "comma", "eight",
     "A") or consists of the word ".notdef" (indicating that
     the specified number's "slot" in the new encoding is to be
     empty); and "accent" optionally names an accent to be placed
     on the character. In addition to the standard accents available
     in PostScript fonts, "underbar" and "underdot" are also
     available, as are "under" versions of all the normal
     superscript accents ("underdieresis", "underring", etc.).
     The Indian accent "candrabindu" may also be specified: it
     is formed by overprinting a breve with a dotaccent. Finally,
     "overdot" may be used as a synonym for "dotaccent".

     Note that all accents used in defining accented characters must
     themselves be defined in the .def file. Those which exist in the
     source font should simply be referenced by name in their
     appropriate Unicode position (e.g. "0x0304 macron"); those
     which do not should be defined as the character "space"
     followed by the name of the accent (e.g. "0x0310 space
     candrabindu").

     If the character named in the "accent" position is not in fact
     a valid accent character, the program interprets the definition
     as a request for a digraph formed from the "character" and the
     "accent". A digraph consisting of, say, "k" and "h" will be
     indistinguishable from the letters "k" and "h" printed
     consecutively, but the digraph "kh" can itself receive accents
     like any other character: see next paragraph.

     A new character (such as "amacron" or "kh") may be freely
     used in the "character" position of a further definition (such
     as "amacron breve" or "kh underbar"). There is no constraint
     on the ordering of definitions within a definition file. The
     definition of "a macron" does not have to precede that of
     "amacron breve": requests for "impossible" characters are
     deferred until their constituents have had a chance to come into
     being.

     "Slots" for which no new definition is given retain the
     definition they have in the input font.

     The definition file may also contain blank lines and comments
     (introduced by "#").

  -s may optionally give the factor, expressed as a per-thousand
     value, by which normally superscript accents (such as dieresis,
     ring) should be shrunk when they are used as subscript accents
     (such as underdieresis, underring). Values of around 800 may be
     found useful.

  -c may optionally give two comma-separated numerical values to
     adjust the x and y coordinates of the dotaccent placed within a
     breve to form the candrabindu accent. A coordinate scheme using
     "DESIGNUNITS R 1000" is assumed.

  -b may optionally be specified to block the use of predefined
     accented characters, forcing vpl2ovp to define its own
     versions. This may be useful to secure a consistent appearance
     in cases where a font designer does not share vpl2ovp's views
     on where accents should be placed.

  -h prints this help.
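
Since vpl2ovp requires every accent used in a definition to be
defined in the .def file itself, it can be useful to check a file
before running the program. The following Python sketch does this
for a list of (position, character, accent) triples. It is an
illustration only: the function name undefined_accents is invented,
the set of accent names must be supplied by the caller, and the last
position in the sample is simply an example.

```python
def undefined_accents(entries, accent_names):
    """Return, sorted, the accents that are used but never defined.

    An accent counts as defined if it is referenced by name on its own
    (e.g. "0x0304 macron") or placed on "space"
    (e.g. "0x0310 space candrabindu").
    """
    defined = set()
    used = set()
    for _position, character, accent in entries:
        if accent is None:
            if character in accent_names:
                defined.add(character)
        elif accent in accent_names:
            if character == "space":
                defined.add(accent)
            else:
                used.add(accent)
    return sorted(used - defined)


entries = [
    (0x0304, "macron", None),
    (0x0310, "space", "candrabindu"),
    (0x0101, "a", "macron"),
    (0x1E43, "m", "underdot"),
]
print(undefined_accents(entries, {"macron", "candrabindu", "underdot"}))
```

Here "underdot" is reported, because it is applied to "m" without
having been given a position of its own.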


John D. Smith
jds10@cam.ac.uk
http://bombay.indology.info