ed: Regular expressions

 
 5 Regular expressions
 *********************
 
 Regular expressions are patterns used in selecting text. For example, the
 'ed' command
 
      g/STRING/
 
 prints all lines containing STRING. Regular expressions are also used by
 the 's' command for selecting old text to be replaced with new text.
 
    In addition to specifying string literals, regular expressions can
 represent classes of strings. Strings thus represented are said to be
 matched by the corresponding regular expression. If it is possible for a
 regular expression to match several strings in a line, then the left-most
 match is the one selected. If the regular expression permits a variable
 number of matching characters, the longest sequence starting at that point
 is matched.
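
    The selection rule above can be illustrated with a minimal sketch in
 Python. This is an analogy only: Python's 're' module is not the POSIX
 engine that 'ed' uses, and its syntax differs, but for a simple pattern
 like the one below both pick the left-most match and then extend it as far
 as possible. The helper name 'first_match' is hypothetical.

```python
import re

def first_match(pattern, line):
    """Return (start, text) of the match a search selects, or None."""
    m = re.search(pattern, line)
    return (m.start(), m.group(0)) if m else None

# Two candidate matches exist in the line. The one that starts further
# left is chosen, even though the later one is longer; from that starting
# point the match is extended over as many 'b' characters as possible.
print(first_match(r"ab*", "xabb abbbbb"))   # (1, 'abb')
```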
 
    An empty regular expression is equivalent to the last regular expression
 processed. Therefore '/RE/s//REPLACEMENT/' addresses the next line matching
 RE and replaces the first match of RE on that line with REPLACEMENT.
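
    The reuse of the last regular expression can be sketched as follows.
 This is a rough Python emulation under stated simplifications, not the way
 'ed' is implemented: the names 'RegexMemory' and 'address_then_substitute'
 are hypothetical, the search starts at the first line of the list rather
 than after the current line, and the replacement is treated as literal text.

```python
import re

class RegexMemory:
    """Hypothetical stand-in for ed's 'last regular expression' slot."""

    def __init__(self):
        self.last = None

    def resolve(self, pattern):
        # An empty pattern means "reuse the previous one".
        if pattern == "":
            if self.last is None:
                raise ValueError("no previous regular expression")
            return self.last
        self.last = re.compile(pattern)
        return self.last

def address_then_substitute(lines, pattern, replacement, memory):
    """Roughly emulate '/RE/s//REPLACEMENT/' on a list of lines."""
    search_re = memory.resolve(pattern)        # the /RE/ address
    for index, line in enumerate(lines):
        if search_re.search(line):
            subst_re = memory.resolve("")      # the empty RE in s//.../
            # A function replacement keeps the text literal (no escapes);
            # count=1 replaces only the first match on the line.
            lines[index] = subst_re.sub(lambda m: replacement, line, count=1)
            return index
    return None

buffer = ["alpha", "beta gamma", "gamma"]
address_then_substitute(buffer, "gam", "GAM", RegexMemory())
print(buffer)   # ['alpha', 'beta GAMma', 'gamma']
```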
 
    As a GNU extension, a regular expression /RE/ may be followed by the
 suffix 'I', which makes 'ed' match RE in a case-insensitive manner. The
 suffix is evaluated when the regular expression is compiled, so it is
 invalid to specify it together with the empty regular expression.
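
    Why the suffix cannot accompany an empty regular expression can be seen
 in a small Python sketch. The helper 'compile_ed_style' is hypothetical and
 only mirrors the idea that case-insensitivity is a compile-time flag;
 an empty pattern reuses an already compiled expression, so there is nothing
 left to compile with the flag.

```python
import re

def compile_ed_style(pattern, suffix=""):
    """Hypothetical helper: map the 'I' suffix to a compile-time flag."""
    if pattern == "" and "I" in suffix:
        # The previous expression is already compiled; the flag cannot
        # be applied to it after the fact.
        raise ValueError("'I' cannot be combined with the empty regular expression")
    flags = re.IGNORECASE if "I" in suffix else 0
    return re.compile(pattern, flags)

print(bool(compile_ed_style("string", "I").search("a STRING")))   # True
print(bool(compile_ed_style("string").search("a STRING")))        # False
```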
 
    The following symbols are used in constructing regular expressions using
 POSIX basic regular expression syntax:
 
 'C'
      Any character C not listed below, including '{', '}', '(', ')', '<'
      and '>', matches itself.
 
 '\C'
      Any backslash-escaped character C, other than '{', '}', '(', ')', '<',
      '>', 'b', 'B', 'w', 'W', '+' and '?', matches itself.
 
 '.'
      Matches any single character.
 
 '[CHAR-CLASS]'
      Matches any single character in CHAR-CLASS. To include a ']' in
      CHAR-CLASS, it must be the first character. A range of characters may
      be specified by separating the end characters of the range with a '-',
      e.g., 'a-z' specifies the lower case characters. The following literal
      expressions can also be used in CHAR-CLASS to specify sets of
      characters:
 
           [:alnum:] [:cntrl:] [:lower:] [:space:]
           [:alpha:] [:digit:] [:print:] [:upper:]
           [:blank:] [:graph:] [:punct:] [:xdigit:]
 
      If '-' appears as the first or last character of CHAR-CLASS, then it
      matches itself. All other characters in CHAR-CLASS match themselves.
 
      Patterns in CHAR-CLASS of the form:
           [.COL-ELM.]
           [=COL-ELM=]
 
      where COL-ELM is a "collating element" are interpreted according to
      'locale (5)'. See 'regex (7)' for an explanation of these constructs.
 
 '[^CHAR-CLASS]'
      Matches any single character, other than newline, not in CHAR-CLASS.
      CHAR-CLASS is defined as above.
 
 '^'
      If '^' is the first character of a regular expression, then it anchors
      the regular expression to the beginning of a line. Otherwise, it
      matches itself.
 
 '$'
      If '$' is the last character of a regular expression, it anchors the
      regular expression to the end of a line. Otherwise, it matches itself.
 
 '\(RE\)'
      Defines a (possibly empty) subexpression RE. Subexpressions may be
      nested. A subsequent backreference of the form '\N', where N is a
      number in the range [1,9], expands to the text matched by the Nth
      subexpression. For example, the regular expression '\(a.c\)\1' matches
      the string 'abcabc', but not 'abcadc'. Subexpressions are ordered
      relative to their left delimiter.
 
 '*'
      Matches zero or more repetitions of the regular expression immediately
      preceding it. The regular expression can be either a single character
      regular expression or a subexpression. If '*' is the first character
      of a regular expression or subexpression, then it matches itself. The
      '*' operator sometimes yields unexpected results. For example, the
      regular expression 'b*' matches the beginning of the string 'abbb', as
      opposed to the substring 'bbb', since an empty string is the only
      left-most match.
 
 '\{N,M\}'
 '\{N,\}'
 '\{N\}'
      Matches the single character regular expression or subexpression
      immediately preceding it at least N and at most M times. If M is
      omitted, then it matches at least N times. If the comma is also
      omitted, then it matches exactly N times. If any of these forms occurs
      first in a regular expression or subexpression, then it is interpreted
      literally (e.g., the regular expression '\{2\}' matches the string
      '{2}', and so on).
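
    Several of the operators listed above can be tried out with the
 following Python sketch. It is an approximation: Python's 're' module
 writes groups as '(RE)' and intervals as '{N,M}' without backslashes, does
 not support the '[:alpha:]'-style class names, and treats a few edge cases
 (such as a '^' in the middle of a pattern) differently from POSIX basic
 regular expressions, so only behavior common to both is shown. The helper
 name 'first' is hypothetical.

```python
import re

def first(pattern, text):
    """Return the text of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Bracket expressions: a leading ']' and a trailing '-' are literal members.
bracket_hits = re.findall(r"[]x-]", "a]b-cx")
# A leading '^' negates the class.
negated_hits = re.findall(r"[^0-9]", "a1b2")

# Subexpressions and backreferences: BRE '\(a.c\)\1' is '(a.c)\1' here.
backref_ok = first(r"(a.c)\1", "abcabc")
backref_fail = first(r"(a.c)\1", "abcadc")
# Subexpressions are numbered by the position of their left delimiter.
group_order = re.search(r"((a)(b))", "ab").groups()

# 'b*' matches the empty string at the very start of 'abbb'.
star_span = re.search(r"b*", "abbb").span()

# Intervals: BRE 'a\{2,3\}', 'a\{2,\}' and 'a\{2\}' become the forms below.
interval_hits = [first(p, "aaaa") for p in (r"a{2,3}", r"a{2,}", r"a{2}")]

print(bracket_hits)    # [']', '-', 'x']
print(negated_hits)    # ['a', 'b']
print(backref_ok)      # abcabc
print(backref_fail)    # None
print(group_order)     # ('ab', 'a', 'b')
print(star_span)       # (0, 0)
print(interval_hits)   # ['aaa', 'aaaa', 'aa']
```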
 
 
    The following extensions to basic regular expression operators are
 preceded by a backslash '\' to distinguish them from traditional 'ed'
 syntax. They may be unavailable depending on the particular regex
 implementation in your system.
 
 '\<'
 '\>'
      Anchors the single character regular expression or subexpression
      immediately following it to the beginning (in the case of '\<') or
      ending (in the case of '\>') of a "word", i.e., in ASCII, a maximal
      string of alphanumeric characters, including the underscore (_).
 
 '\`'
 '\''
      Unconditionally matches the beginning '\`' or ending '\'' of a line.
 
 '\?'
      Optionally matches the single character regular expression or
      subexpression immediately preceding it. For example, the regular
      expression 'a[bd]\?c' matches the strings 'abc', 'adc' and 'ac'. If
      '\?' occurs at the beginning of a regular expression or
      subexpression, then it matches a literal '?'.
 
 '\+'
      Matches the single character regular expression or subexpression
      immediately preceding it one or more times. So the regular expression
      'a\+' is shorthand for 'aa*'. If '\+' occurs at the beginning of a
      regular expression or subexpression, then it matches a literal '+'.
 
 '\b'
      Matches the beginning or ending (empty string) of a word. Thus the
      regular expression '\bhello\b' is equivalent to '\<hello\>'. However,
      '\b\b' is a valid regular expression whereas '\<\>' is not.
 
 '\B'
      Matches (an empty string) inside a word.
 
 '\w'
      Matches any word-constituent character (letters, digits, and the
      underscore).
 
 '\W'
      Matches any character that is not a word-constituent.
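
    The extensions above can be explored with a short Python sketch. Again
 this is only an analogy: Python's 're' module spells '\?' and '\+' as '?'
 and '+', has no direct equivalents of '\<', '\>', '\`' and '\'', and is not
 the regex implementation 'ed' links against. The 're.ASCII' flag is used so
 that word characters are limited to ASCII letters, digits and underscore.

```python
import re

# Optional element: BRE 'a[bd]\?c' is 'a[bd]?c' here.
samples = ("abc", "adc", "ac", "abdc")
optional_hits = [s for s in samples if re.fullmatch(r"a[bd]?c", s)]

# 'a\+' (here 'a+') accepts exactly the same strings as 'aa*'.
plus_same_as_star = all(
    bool(re.fullmatch(r"a+", s)) == bool(re.fullmatch(r"aa*", s))
    for s in ("", "a", "aaa", "ab")
)

# Word boundaries: only stand-alone occurrences of 'hello' are found.
text = "hello, othello hello_world hello"
whole_words = [m.start() for m in re.finditer(r"\bhello\b", text, re.ASCII)]

# '\B' matches only where there is no word boundary.
inside_word = re.findall(r"\Bell\B", "hello ell", re.ASCII)

# Word-constituent and non-word-constituent characters.
words = re.findall(r"\w+", "foo_1 bar-baz", re.ASCII)
non_words = re.findall(r"\W", "a b-c", re.ASCII)

print(optional_hits)       # ['abc', 'adc', 'ac']
print(plus_same_as_star)   # True
print(whole_words)         # [0, 27]
print(inside_word)         # ['ell']
print(words)               # ['foo_1', 'bar', 'baz']
print(non_words)           # [' ', '-']
```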