groff: Manipulating Hyphenation
5.8 Manipulating Hyphenation
============================
Here is a description of the requests that influence hyphenation.
-- Request: .hy [mode]
-- Register: \n[.hy]
Enable hyphenation. The request has an optional numeric argument,
MODE, to restrict hyphenation if necessary:
'1'
The default argument if MODE is omitted: hyphenation is
enabled, and the first and the last characters of a word are
not hyphenated. This is also the start-up value of 'gtroff'.
'2'
Do not hyphenate the last word on a page or column.
'4'
Do not hyphenate the last two characters of a word.
'8'
Do not hyphenate the first two characters of a word.
'16'
Allow hyphenation before the last character of a word.
'32'
Allow hyphenation after the first character of a word.
The values in the previous table are additive. For example,
value 12 causes 'gtroff' to hyphenate neither the last two nor the
first two characters of a word. Note that value 13 would do
exactly the same; in other words, value 1 need not be added if the
value is larger than 1.
Some values cannot be used together because they contradict; for
instance, values 4 and 16, and values 8 and 32.
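To make the additive scheme concrete, here is a minimal Python sketch that splits a mode value into the table entries it contains and reports the contradictory pairs named above. The names 'decode_hy_mode' and 'conflicting_pairs' are hypothetical; the sketch models only the documented arithmetic and is not taken from the gtroff sources.

```python
# Hypothetical model of the additive 'hy' mode values; not gtroff source code.
HY_VALUES = (1, 2, 4, 8, 16, 32)

# Value pairs that the manual describes as contradictory.
CONFLICTS = ((4, 16), (8, 32))


def decode_hy_mode(mode: int) -> list:
    """Return the table values whose sum is MODE (each value is one bit)."""
    return [value for value in HY_VALUES if mode & value]


def conflicting_pairs(mode: int) -> list:
    """Return every contradictory pair whose members are both set in MODE."""
    return [(a, b) for a, b in CONFLICTS if mode & a and mode & b]


if __name__ == "__main__":
    for mode in (12, 13, 20):
        print(mode, decode_hy_mode(mode), conflicting_pairs(mode))
```

For 13 the sketch lists 1 alongside 4 and 8; as noted above, that extra 1 changes nothing once the value is larger than 1.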
The number of characters at the beginning of a word after which the
first hyphenation point should be inserted is determined by the
patterns themselves; it can't be reduced further without
introducing additional, invalid hyphenation points. Unfortunately,
this information is not part of a pattern file, so you have to know
it in advance. The same is true for the number of characters at the
end of a word before which the last hyphenation point should be
inserted.
For example, the code
.ll 1
.hy 48
splitting
produces
s-
plit-
t-
in-
g
instead of the correct 'split-ting'. US-English patterns as
distributed with groff need two characters at the beginning and
three characters at the end; this means that value 4 of 'hy' is
mandatory. Value 8 is possible as an additional restriction, but
values 1 (the default!), 16, and 32 should be avoided.
Here is a table of left and right minimum values for hyphenation as
needed by the patterns distributed with groff; see the
'groff_tmac(5) man page' (type 'man groff_tmac' at the command
line) for more information on groff's language macro files.
language             pattern name   left min   right min
-----------------------------------------------------------
Czech                cs             2          2
US English           us             2          3
French               fr             2          3
German traditional   det            2          2
German reformed      den            2          2
Swedish              sv             1          2
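The table can be checked against a chosen 'hy' mode. The Python sketch below is one possible reading of the mode descriptions, assuming that value 1 (the baseline) keeps at least two characters on each side, 4 and 8 raise the right and left minimum to three, and 16 and 32 lower them to one. 'hy_minima', 'mode_is_safe', and 'PATTERN_MINIMA' are hypothetical names; this is not how gtroff stores the setting.

```python
# Hypothetical mapping from 'hy' values to minimum fragment lengths,
# derived from the descriptions in the table of mode values.

PATTERN_MINIMA = {  # pattern name -> (left min, right min), from the table
    "cs": (2, 2),
    "us": (2, 3),
    "fr": (2, 3),
    "det": (2, 2),
    "den": (2, 2),
    "sv": (1, 2),
}


def hy_minima(mode: int) -> tuple:
    """Return (left, right): the fewest characters kept before the first
    and after the last hyphenation point under MODE."""
    if mode <= 0:
        raise ValueError("mode 0 disables hyphenation")
    if (mode & 4 and mode & 16) or (mode & 8 and mode & 32):
        raise ValueError("contradictory hy values")
    left = 1 if mode & 32 else 3 if mode & 8 else 2
    right = 1 if mode & 16 else 3 if mode & 4 else 2
    return left, right


def mode_is_safe(mode: int, pattern_name: str) -> bool:
    """True if MODE never breaks closer to a word edge than the patterns allow."""
    left, right = hy_minima(mode)
    need_left, need_right = PATTERN_MINIMA[pattern_name]
    return left >= need_left and right >= need_right
```

Under this reading, 'us' accepts modes 4 and 12 but rejects 1, 16, 32, and 48, which agrees with the advice given for US English above.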
Hyphenation exceptions within pattern files (i.e., the words within
a '\hyphenation' group) also obey the hyphenation restrictions
given by 'hy'. However, exceptions specified with the 'hw' request
do not.
The current hyphenation restrictions can be found in the read-only
number register '.hy'.
The hyphenation mode is associated with the current environment
(⇒Environments).
-- Request: .nh
Disable hyphenation (i.e., set the hyphenation mode to zero). Note
that the hyphenation mode of the last call to 'hy' is not
remembered.
The hyphenation mode is associated with the current environment
(⇒Environments).
-- Request: .hlm [nnn]
-- Register: \n[.hlm]
-- Register: \n[.hlc]
Set the maximum number of consecutive hyphenated lines to NNN. If
this number is negative, there is no maximum. The default value
is -1 if NNN is omitted. This value is associated with the current
environment (⇒Environments). Only lines output from a given
environment count towards the maximum associated with that
environment. Hyphens resulting from '\%' are counted; explicit
hyphens are not.
The current setting of 'hlm' is available in the '.hlm' read-only
number register. The number of immediately preceding consecutive
hyphenated lines is available in the read-only number register
'.hlc'.
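The interplay between the 'hlm' limit and the '.hlc' counter can be sketched in Python. This is a simplified model under stated assumptions: 'hyphenation_allowed' and 'simulate' are hypothetical names, and each line is reduced to a single flag saying whether the line breaker would like to hyphenate it; real line breaking then has to find another break point, which is not modeled.

```python
# Hypothetical model of the 'hlm' limit and the '.hlc' counter; not gtroff source.


def hyphenation_allowed(hlc: int, hlm: int) -> bool:
    """A negative limit means no maximum; otherwise HLC must stay below HLM."""
    return hlm < 0 or hlc < hlm


def simulate(wants_hyphen: list, hlm: int) -> list:
    """For each output line, report whether it actually ends hyphenated.

    wants_hyphen[i] is True if the breaking algorithm would like to
    hyphenate line i; the limit may veto that.
    """
    hlc = 0  # consecutive hyphenated lines so far (like '.hlc')
    result = []
    for wants in wants_hyphen:
        hyphenated = wants and hyphenation_allowed(hlc, hlm)
        hlc = hlc + 1 if hyphenated else 0  # any unhyphenated line resets it
        result.append(hyphenated)
    return result
```

With a limit of 2, four lines that all want a hyphen come out as hyphenated, hyphenated, not hyphenated, hyphenated, because the vetoed line resets the counter.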
-- Request: .hw word1 word2 ...
Define how WORD1, WORD2, etc. are to be hyphenated. The words must
be given with hyphens at the hyphenation points. For example:
.hw in-sa-lub-rious
Besides the space character, any character whose hyphenation code
value is zero can be used to separate the arguments of 'hw' (see
the documentation for the 'hcode' request below for more
information). In addition, this request can be used more than
once.
Hyphenation points specified with 'hw' are not subject to the
restrictions given by the 'hy' request.
Hyphenation exceptions specified with the 'hw' request are
associated with the current hyphenation language; invoking 'hw'
when there is no current hyphenation language causes an error.
This request is ignored if there is no parameter.
In old versions of 'troff' there was a limited amount of space to
store such information; fortunately, with 'gtroff', this is no
longer a restriction.
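The argument syntax of 'hw' amounts to a word with hyphens at the permitted break points. The Python sketch below converts such arguments into the bare word plus the character offsets of its break points. 'parse_hw_word' and 'parse_hw_request' are hypothetical helpers; as a simplification they split arguments on whitespace only and ignore the other separator characters (those with a zero hyphenation code) that gtroff accepts.

```python
# Hypothetical parser for '.hw' arguments; it only splits on whitespace,
# whereas gtroff also accepts any character whose hyphenation code is zero.


def parse_hw_word(arg: str):
    """Return the word without hyphens and the offsets of its break points."""
    letters, points = [], []
    for ch in arg:
        if ch == "-":
            points.append(len(letters))  # break after this many characters
        else:
            letters.append(ch)
    return "".join(letters), points


def parse_hw_request(line: str) -> dict:
    """Map each word of an '.hw' line to its list of break offsets."""
    fields = line.split()
    if not fields or fields[0] != ".hw":
        raise ValueError("not an .hw request")
    # With no arguments the result is empty, mirroring that the request
    # is ignored when it has no parameter.
    return dict(parse_hw_word(arg) for arg in fields[1:])
```

For the example above, 'in-sa-lub-rious' yields the word 'insalubrious' with break offsets 2, 4, and 7.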
-- Escape: \%
-- Escape: \:
To tell 'gtroff' how to hyphenate words on the fly, use the '\%'
escape, also known as the "hyphenation character". Preceding a
word with this character prevents it from being hyphenated; putting
it inside a word indicates to 'gtroff' that the word may be
hyphenated at that point. Note that this mechanism only affects
that one occurrence of the word; to change the hyphenation of a
word for the entire document, use the 'hw' request.
The '\:' escape inserts a zero-width break point (that is, the word
breaks but without adding a hyphen).
... check the /var/log/\:httpd/\:access_log file ...
Note that '\X' and '\Y' start a word; that is, the '\%' escape in
(say) '\X'...'\%foobar' and '\Y'...'\%foobar' no longer prevents
hyphenation but inserts a hyphenation point at the beginning of
'foobar'. Most likely this isn't what you want.
-- Request: .hc [char]
Change the hyphenation character to CHAR. This character then
works the same as the '\%' escape and thus no longer appears in
the output. Without an argument, 'hc' resets the hyphenation
character to be '\%' (the default) only.
The hyphenation character is associated with the current
environment (⇒Environments).
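The effect of '\%', '\:', and a character set with 'hc' on a single word can be modeled in Python. The sketch below removes the marks and records where they were. 'parse_marked_word' is a hypothetical name, and the model covers only what is described above: a leading hyphenation mark suppresses hyphenation, an inner one marks a possible hyphenation point, and '\:' marks a break without a hyphen. How gtroff weighs these points against the patterns is not modeled.

```python
from typing import Optional

# Hypothetical model of in-word hyphenation marks; not gtroff source code.


def parse_marked_word(word: str, hc: Optional[str] = None):
    """Strip '\\%', '\\:', and an optional 'hc' character from WORD.

    Returns (text, hyphen_points, break_points, suppressed), where the
    points are character offsets into TEXT and SUPPRESSED is True when
    a hyphenation mark precedes the word.
    """
    markers = ["\\%"] + ([hc] if hc else [])
    text, hyphen_points, break_points = [], [], []
    suppressed = False
    i = 0
    while i < len(word):
        if word.startswith("\\:", i):
            break_points.append(len(text))  # break here, no hyphen added
            i += 2
            continue
        marker = next((m for m in markers if word.startswith(m, i)), None)
        if marker is not None:
            if text:
                hyphen_points.append(len(text))  # may hyphenate here
            else:
                suppressed = True  # leading mark: do not hyphenate the word
            i += len(marker)
            continue
        text.append(word[i])
        i += 1
    return "".join(text), hyphen_points, break_points, suppressed
```

For the path example above, '/var/log/\:httpd/\:access_log' yields zero-width break points at offsets 9 and 15, and with 'hc' set to '^' the input 'in^sa' behaves like 'in\%sa'.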
-- Request: .hpf pattern_file
-- Request: .hpfa pattern_file
-- Request: .hpfcode a b [c d ...]
Read in a file of hyphenation patterns. This file is searched for
in the same way as 'NAME.tmac' (or 'tmac.NAME') is searched for if
the '-mNAME' option is specified.
It should have the same format as (simple) TeX pattern files.
More specifically, the following scanning rules are implemented.
* A percent sign starts a comment (up to the end of the line)
even if preceded by a backslash.
* No support for 'digraphs' like '\$'.
* '^^XX' (X is 0-9 or a-f) and '^^X' (character code of X in the
range 0-127) are recognized; other use of '^' causes an error.
* No macro expansion.
* 'hpf' checks for the expression '\patterns{...}' (possibly
with whitespace before and after the braces). Everything
between the braces is taken as hyphenation patterns.
Consequently, '{' and '}' are not allowed in patterns.
* Similarly, '\hyphenation{...}' gives a list of hyphenation
exceptions.
* '\endinput' is also recognized.
* For backwards compatibility, if '\patterns' is missing, the
whole file is treated as a list of hyphenation patterns (only
recognizing the '%' character as the start of a comment).
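The scanning rules above can be approximated by a small Python reader. This is a sketch under assumptions, not gtroff's parser: 'read_tex_patterns', 'decode_carets', and 'strip_comments' are hypothetical names, the single-character '^^X' form follows TeX's convention of adding or subtracting 64 from the character code (assumed, not confirmed, to match gtroff), and decoding happens after comments are removed.

```python
import re

# Hypothetical approximation of the pattern-file scanning rules.

HEX_DIGITS = "0123456789abcdef"


def decode_carets(text: str) -> str:
    """Decode '^^XX' (two of 0-9, a-f) and '^^X' (code below 128)."""
    out, i = [], 0
    while i < len(text):
        if text[i] != "^":
            out.append(text[i])
            i += 1
            continue
        if text[i + 1:i + 2] != "^":
            raise ValueError(f"stray '^' at offset {i}")
        pair = text[i + 2:i + 4]
        if len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(chr(int(pair, 16)))
            i += 4
        elif i + 2 < len(text) and ord(text[i + 2]) < 128:
            code = ord(text[i + 2])
            # TeX convention (assumed here): add or subtract 64.
            out.append(chr(code + 64 if code < 64 else code - 64))
            i += 3
        else:
            raise ValueError(f"invalid '^^' sequence at offset {i}")
    return "".join(out)


def strip_comments(text: str) -> str:
    """A percent sign starts a comment even if preceded by a backslash."""
    return "\n".join(line.split("%", 1)[0] for line in text.splitlines())


def read_tex_patterns(text: str):
    """Return (patterns, exceptions) from the contents of a pattern file."""
    text = strip_comments(text)
    end = text.find("\\endinput")
    if end != -1:
        text = text[:end]  # ignore everything after '\endinput'
    text = decode_carets(text)
    patterns = re.search(r"\\patterns\s*\{([^{}]*)\}", text)
    if patterns is None:
        # Backwards compatibility: the whole file is a list of patterns.
        return text.split(), []
    exceptions = re.search(r"\\hyphenation\s*\{([^{}]*)\}", text)
    return (patterns.group(1).split(),
            exceptions.group(1).split() if exceptions else [])
```

Braces are excluded from the matched group, which reflects the rule that '{' and '}' are not allowed in patterns.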
If no 'hpf' request is specified (either in the document or in a
macro package), 'gtroff' won't hyphenate at all.
The 'hpfa' request appends a file of patterns to the current list.
The 'hpfcode' request defines mapping values for character codes in
hyphenation patterns. 'hpf' or 'hpfa' then apply the mapping
(after reading the patterns) before replacing or appending them to
the current list of patterns. Its arguments are pairs of character
codes, that is, integers from 0 to 255. The request maps character code A
to code B, code C to code D, and so on. You can use character
codes that would be invalid otherwise. By default, everything maps
to itself except letters 'A' to 'Z', which map to 'a' to 'z'.
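The mapping performed by 'hpfcode' is a 256-entry lookup table. The Python sketch below models the default table and its application to a pattern string. 'default_hpfcode_table', 'hpfcode', and 'map_pattern' are hypothetical names, and the sketch assumes patterns are 8-bit strings in which every character code is below 256.

```python
# Hypothetical model of the 'hpfcode' mapping table; assumes patterns are
# 8-bit strings (every character code below 256).


def default_hpfcode_table() -> list:
    """Everything maps to itself except 'A'-'Z', which map to 'a'-'z'."""
    table = list(range(256))
    for code in range(ord("A"), ord("Z") + 1):
        table[code] = code + (ord("a") - ord("A"))
    return table


def hpfcode(table: list, *codes: int) -> None:
    """Model '.hpfcode a b [c d ...]': map code A to B, C to D, and so on."""
    if len(codes) % 2:
        raise ValueError("hpfcode expects pairs of character codes")
    for src, dst in zip(codes[0::2], codes[1::2]):
        if not (0 <= src <= 255 and 0 <= dst <= 255):
            raise ValueError("character codes must be in the range 0-255")
        table[src] = dst


def map_pattern(pattern: str, table: list) -> str:
    """Apply the table to one pattern, as 'hpf'/'hpfa' do after reading."""
    return "".join(chr(table[ord(ch)]) for ch in pattern)
```

For instance, mapping code 196 (Latin-1 'Ä') to 228 ('ä') makes an uppercase umlaut in a pattern file equivalent to its lowercase form.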
The set of hyphenation patterns is associated with the current
language set by the 'hla' request. The 'hpf' request is usually
invoked by the 'troffrc' or 'troffrc-end' file; by default,
'troffrc' loads hyphenation patterns and exceptions for American
English (in files 'hyphen.us' and 'hyphenex.us').
A second call to 'hpf' (for the same language) replaces the
hyphenation patterns with the new ones.
Invoking 'hpf' causes an error if there is no current hyphenation
language.
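The association of patterns and exceptions with the current language can be sketched as a small Python class. 'HyphenationStore' and its method names are hypothetical stand-ins for the requests. The empty string as "no language" is an assumption of the model, as is the error check in 'hpfa', which the text above states only for 'hpf' and 'hw'.

```python
# Hypothetical model of per-language pattern and exception storage.


class HyphenationStore:
    def __init__(self) -> None:
        self.language = ""    # assumption: empty string means "no language"
        self.patterns = {}    # language -> list of patterns
        self.exceptions = {}  # language -> list of hyphenated words

    def hla(self, language: str) -> None:
        self.language = language

    def _require_language(self, request: str) -> None:
        if not self.language:
            raise RuntimeError(f"{request}: no current hyphenation language")

    def hpf(self, patterns: list) -> None:
        """A second call for the same language replaces the patterns."""
        self._require_language("hpf")
        self.patterns[self.language] = list(patterns)

    def hpfa(self, patterns: list) -> None:
        """Append to the current list (error check assumed to match 'hpf')."""
        self._require_language("hpfa")
        self.patterns.setdefault(self.language, []).extend(patterns)

    def hw(self, *words: str) -> None:
        if not words:
            return  # the request is ignored without a parameter
        self._require_language("hw")
        self.exceptions.setdefault(self.language, []).extend(words)
```

Switching languages with 'hla' leaves the other languages' lists untouched, so each language keeps its own patterns and exceptions.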
-- Request: .hcode c1 code1 [c2 code2 ...]
Set the hyphenation code of character C1 to CODE1, that of C2 to
CODE2, etc. A hyphenation code must be a single input character
(not a special character) other than a digit or a space.
To make hyphenation work, hyphenation codes must be set up. At
start-up, groff only assigns hyphenation codes to the letters
'a'-'z' (mapped to themselves) and to the letters 'A'-'Z' (mapped
to 'a'-'z'); all other hyphenation codes are set to zero.
Normally, hyphenation patterns contain only lowercase letters,
which should be applied regardless of case. In other words, the
words 'FOO' and 'Foo' should be hyphenated exactly the same way as
the word 'foo' is hyphenated, and this is what 'hcode' is good for.
Words that contain other letters won't be hyphenated properly if
the corresponding hyphenation patterns actually do contain them.
For example, the following 'hcode' requests are necessary to assign
hyphenation codes to the letters 'ÄäÖöÜüß' (this is needed for
German):
.hcode ä ä Ä ä
.hcode ö ö Ö ö
.hcode ü ü Ü ü
.hcode ß ß
Without those assignments, groff treats German words like
'Kindergärten' (the plural form of 'kindergarten') as two
substrings 'kinderg' and 'rten' because the hyphenation code of the
umlaut a is zero by default. There is a German hyphenation pattern
that covers 'kinder', so groff finds the hyphenation 'kin-der'.
The other two hyphenation points ('kin-der-gär-ten') are missed.
This request is ignored if it has no parameter.
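The 'Kindergärten' example can be reproduced with a Python model of hyphenation codes. 'default_hcodes', 'set_hcode', and 'hyphenation_substrings' are hypothetical names; a missing dictionary entry stands for a zero hyphenation code, and each substring is returned in its mapped (lowercase) form, the form the patterns are matched against.

```python
import string

# Hypothetical model of hyphenation codes; not gtroff source code.


def default_hcodes() -> dict:
    """Start-up state: a-z map to themselves, A-Z map to a-z, all else zero."""
    codes = {c: c for c in string.ascii_lowercase}
    codes.update({c: c.lower() for c in string.ascii_uppercase})
    return codes


def set_hcode(codes: dict, *pairs: str) -> None:
    """Model '.hcode c1 code1 c2 code2 ...'."""
    if len(pairs) % 2:
        raise ValueError("hcode expects character/code pairs")
    for char, code in zip(pairs[0::2], pairs[1::2]):
        if len(code) != 1 or code.isdigit() or code.isspace():
            raise ValueError(f"invalid hyphenation code {code!r}")
        codes[char] = code


def hyphenation_substrings(word: str, codes: dict) -> list:
    """Split WORD wherever a character has a zero (missing) hyphenation code."""
    parts, current = [], []
    for ch in word:
        code = codes.get(ch)
        if code is None:
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(code)
    if current:
        parts.append("".join(current))
    return parts
```

With the default codes the word splits into 'kinderg' and 'rten'; after the equivalent of '.hcode ä ä Ä ä' it stays whole as 'kindergärten', so the patterns can see all of it.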
-- Request: .hym [length]
-- Register: \n[.hym]
Set the (right) hyphenation margin to LENGTH. If the current
adjustment mode is not 'b' or 'n', the line is not hyphenated if it
is shorter than LENGTH. Without an argument, the hyphenation
margin is reset to its default value, which is 0. The default
scaling indicator for this request is 'm'. The hyphenation margin
is associated with the current environment (⇒Environments).
A negative argument resets the hyphenation margin to zero, emitting
a warning of type 'range'.
The current hyphenation margin is available in the '.hym' read-only
number register.
-- Request: .hys [hyphenation_space]
-- Register: \n[.hys]
Set the hyphenation space to HYPHENATION_SPACE. If the current
adjustment mode is 'b' or 'n', don't hyphenate the line if it can
be justified by adding no more than HYPHENATION_SPACE extra space
to each word space. Without an argument, the hyphenation space is set
to its default value, which is 0. The default scaling indicator
for this request is 'm'. The hyphenation space is associated with
the current environment (⇒Environments).
A negative argument resets the hyphenation space to zero, emitting
a warning of type 'range'.
The current hyphenation space is available in the '.hys' read-only
number register.
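The 'hys' rule reduces to a comparison between the space a line is missing and the extra space its word spaces may absorb. The Python sketch below states that comparison with integer distances (for example, basic units). 'should_try_hyphenation' and its parameter names are hypothetical, and the sketch does not model how gtroff measures the shortfall.

```python
# Hypothetical model of the 'hys' test; distances are integers (e.g., basic units).


def should_try_hyphenation(adjust_mode: str, shortfall: int,
                           word_spaces: int, hys: int) -> bool:
    """Decide whether hyphenating the current line is worth attempting.

    shortfall: space needed to justify the line without hyphenating
    word_spaces: number of inter-word spaces on the line
    hys: hyphenation space (extra space allowed per word space)
    """
    if adjust_mode not in ("b", "n"):
        # This check does not apply; other rules (such as 'hym') decide.
        return True
    # Skip hyphenation if spreading the shortfall over the word spaces
    # adds no more than HYS to each of them.
    return shortfall > hys * word_spaces
```

With ten word spaces and a hyphenation space of 3 units, a line short by 30 units is justified without hyphenation, while a line short by 31 units is a candidate for hyphenation.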
-- Request: .shc [glyph]
Set the "soft hyphen character" to GLYPH.(1) (⇒Manipulating
Hyphenation-Footnote-1) If the argument is omitted, the soft
hyphen character is set to the default glyph '\(hy' (which is also
the start-up value of 'gtroff'). The soft hyphen character is the
glyph that is inserted when a word is hyphenated at a line break.
If the soft hyphen character does not exist in the font of the
character immediately preceding a potential break point, then the
line is not broken at that point. Neither definitions (specified
with the 'char' request) nor translations (specified with the 'tr'
request) are considered when finding the soft hyphen character.
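The font check for the soft hyphen character can be modeled as a filter over candidate break points. In the Python sketch below, 'usable_break_points' is a hypothetical name, the font names are made up for illustration, and 'hy' stands for the default glyph '\(hy'. The lookup uses glyph names only, which matches the rule that 'char' definitions and 'tr' translations are not consulted.

```python
# Hypothetical model of the 'shc' font check; font names are illustrative,
# and 'hy' stands for the default soft hyphen glyph '\(hy'.


def usable_break_points(points, char_fonts, font_glyphs, soft_hyphen="hy"):
    """Keep break points whose preceding character's font has SOFT_HYPHEN.

    points: candidate break positions (a break at p follows character p-1)
    char_fonts: font name for each character of the word
    font_glyphs: mapping from font name to the set of glyph names it has
    """
    usable = []
    for p in points:
        font = char_fonts[p - 1]
        # Plain glyph lookup: no 'char' definitions or 'tr' translations.
        if soft_hyphen in font_glyphs.get(font, set()):
            usable.append(p)
    return usable
```

If the last four characters of a nine-character word were set in a font lacking the glyph, a break after the seventh character would be dropped while a break after the fifth would remain.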
-- Request: .hla language
-- Register: \n[.hla]
Set the current hyphenation language to the string LANGUAGE.
Hyphenation exceptions specified with the 'hw' request and
hyphenation patterns specified with the 'hpf' and 'hpfa' requests
are both associated with the current hyphenation language. The
'hla' request is usually invoked by the 'troffrc' or the
'troffrc-end' files; 'troffrc' sets the default language to 'us'.
The current hyphenation language is available as a string in the
read-only number register '.hla'.
.ds curr_language \n[.hla]
\*[curr_language]
=> us