Info: (m4) Changeword


 
 8.4 Changing the lexical structure of words
 ===========================================
 
      The macro 'changeword' and all associated functionality are
      experimental.  It is only available if the '--enable-changeword'
      option was given to 'configure', at GNU 'm4' installation time.
      The functionality will go away in the future, to be replaced by
      other new features that are more efficient at providing the same
      capabilities.  _Do not rely on it_.  Please send your comments
      about it the same way you would report bugs.
 
    A file being processed by 'm4' is split into quoted strings, words
 (potential macro names) and simple tokens (any other single character).
 Initially a word is defined by the following regular expression:
 
      [_a-zA-Z][_a-zA-Z0-9]*
 
    Using 'changeword', you can change this regular expression:
 
  -- Optional builtin: changeword (REGEX)
      Changes the regular expression for recognizing macro names to be
      REGEX.  If REGEX is empty, use '[_a-zA-Z][_a-zA-Z0-9]*'.  REGEX
      must obey the constraint that every non-empty prefix of each word
      it is meant to match is also accepted by the regular expression.
      If REGEX contains grouping parentheses, the macro invoked is the
      portion that matched the first group, rather than the entire
      matching string.
 
      The expansion of 'changeword' is void.  The macro 'changeword' is
      recognized only with parameters.
 
    Relaxing the lexical rules of 'm4' might be useful (for example) if
 you wanted to apply translations to a file of numbers:
 
      ifdef(`changeword', `', `errprint(` skipping: no changeword support
      ')m4exit(`77')')dnl
      changeword(`[_a-zA-Z0-9]+')
      =>
      define(`1', `0')1
      =>0
 
    Tightening the lexical rules is less useful, because it will
 generally make some of the builtins unavailable.  You could use it to
 prevent accidental calls to builtins, for example:
 
      ifdef(`changeword', `', `errprint(` skipping: no changeword support
      ')m4exit(`77')')dnl
      define(`_indir', defn(`indir'))
      =>
      changeword(`_[_a-zA-Z0-9]*')
      =>
      esyscmd(`foo')
      =>esyscmd(foo)
      _indir(`esyscmd', `echo hi')
      =>hi
      =>
 
    Because 'm4' constructs its words a character at a time, there is a
 restriction on the regular expressions that may be passed to
 'changeword': if your regular expression accepts 'foo', it must also
 accept 'f' and 'fo'.
 
      ifdef(`changeword', `', `errprint(` skipping: no changeword support
      ')m4exit(`77')')dnl
      define(`foo
      ', `bar
      ')
      =>
      dnl This example wants to recognize changeword, dnl, and `foo\n'.
      dnl First, we check that our regexp will match.
      regexp(`changeword', `[cd][a-z]*\|foo[
      ]')
      =>0
      regexp(`foo
      ', `[cd][a-z]*\|foo[
      ]')
      =>0
      regexp(`f', `[cd][a-z]*\|foo[
      ]')
      =>-1
      foo
      =>foo
      changeword(`[cd][a-z]*\|foo[
      ]')
      =>
      dnl Even though `foo\n' matches, we forgot to allow `f'.
      foo
      =>foo
      changeword(`[cd][a-z]*\|fo*[
      ]?')
      =>
      dnl Now we can call `foo\n'.
      foo
      =>bar
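
    The prefix restriction can also be checked outside 'm4'.  The
 following Python sketch is only a model: it assumes that 'm4' tests
 each accumulated prefix against the whole regular expression, and it
 rewrites the two patterns tried above in Python syntax ('|' instead of
 '\|').  The names 'failing_prefixes', 'STRICT' and 'RELAXED' are
 hypothetical and not part of 'm4'.

```python
import re


def failing_prefixes(pattern, word):
    """Return the non-empty prefixes of `word` that `pattern` does not fully match."""
    compiled = re.compile(pattern)
    return [word[:i] for i in range(1, len(word) + 1)
            if compiled.fullmatch(word[:i]) is None]


# Python spellings of the two m4 patterns (assumed equivalent, not verified against m4).
STRICT = r"[cd][a-z]*|foo\n"     # accepts `foo\n' but none of its shorter prefixes
RELAXED = r"[cd][a-z]*|fo*\n?"   # accepts `f', `fo', `foo' and `foo\n'

print(failing_prefixes(STRICT, "foo\n"))       # ['f', 'fo', 'foo']
print(failing_prefixes(RELAXED, "foo\n"))      # []
print(failing_prefixes(STRICT, "changeword"))  # []
```

    An empty list suggests that the pattern is safe for that word; any
 entry is a prefix at which word construction would presumably stop.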
 
    'changeword' has another function.  If the regular expression
 supplied contains any grouped subexpressions, then text outside the
 first of these is discarded before symbol lookup.  So:
 
      ifdef(`changeword', `', `errprint(` skipping: no changeword support
      ')m4exit(`77')')dnl
      ifdef(`__unix__', ,
            `errprint(` skipping: syscmd does not have unix semantics
      ')m4exit(`77')')dnl
      changecom(`/*', `*/')dnl
      define(`foo', `bar')dnl
      changeword(`#\([_a-zA-Z0-9]*\)')
      =>
      #esyscmd(`echo foo \#foo')
      =>foo bar
      =>
 
    'm4' now requires a '#' mark at the beginning of every macro
 invocation, so one can use 'm4' to preprocess plain text without losing
 various words like 'divert'.
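
    The effect of the first group on symbol lookup can be modelled in a
 few lines of Python.  This is a rough sketch, not how 'm4' is
 implemented: it assumes the pattern from the example above, rewritten
 in Python syntax, and the names 'WORD' and 'lookup_name' are
 hypothetical.

```python
import re

# Python spelling of the m4 pattern `#\([_a-zA-Z0-9]*\)' (assumed equivalent).
WORD = re.compile(r"#([_a-zA-Z0-9]*)")


def lookup_name(text):
    """Return the name looked up for a word at the start of `text`, or None if no word starts there."""
    match = WORD.match(text)
    # Only the text captured by the first group is used as the macro name.
    return match.group(1) if match else None


print(lookup_name("#foo"))      # foo
print(lookup_name("#esyscmd("))  # esyscmd
print(lookup_name("divert"))    # None
```

    In this model a bare 'divert' is never looked up as a macro, which
 is why such words would pass through plain text untouched.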
 
    In 'm4', macro substitution is based on text, while in TeX, it is
 based on tokens.  'changeword' can throw this difference into relief.
 For example, here is the same idea represented in TeX and 'm4'.  First,
 the TeX version:
 
      \def\a{\message{Hello}}
      \catcode`\@=0
      \catcode`\\=12
      @a
      @bye
      =>Hello
 
 Then, the 'm4' version:
 
      ifdef(`changeword', `', `errprint(` skipping: no changeword support
      ')m4exit(`77')')dnl
      define(`a', `errprint(`Hello')')dnl
      changeword(`@\([_a-zA-Z0-9]*\)')
      =>
      @a
      =>errprint(Hello)
 
    In the TeX example, the first line defines a macro 'a' to print the
 message 'Hello'.  The second line defines <@> to be usable instead of
 <\> as an escape character.  The third line defines <\> to be a normal
 printing character, not an escape.  The fourth line invokes the macro
 'a'.  So, when TeX is run on this file, it displays the message 'Hello'.
 
    When the 'm4' example is passed through 'm4', it outputs
 'errprint(Hello)'.  The reason for this is that TeX performs lexical
 analysis of a macro definition when the macro is _defined_.  'm4' just
 stores the text, postponing the lexical analysis until the macro is
 _used_.
 
    Note that using 'changeword' slows 'm4' down by a factor of about
 seven once the word definition is changed to something other than the
 default regular expression.  You can invoke 'changeword' with the empty
 string to restore the default word definition, and regain the parsing
 speed.