gettext: General Problems

 
 15.5.21.1 General Problems Parsing Perl Code
 ............................................
 
    It is often heard that only Perl can parse Perl.  This is not true.
 Perl cannot be _parsed_ at all, it can only be _executed_.  Perl has
 various built-in ambiguities that can only be resolved at runtime.
 
    The following example may illustrate one common problem:
 
      print gettext "Hello World!";
 
    Although this example looks like a bullet-proof case of a function
 invocation, it is not:
 
      open gettext, ">testfile" or die;
      print gettext "Hello world!";
 
    In this context, the string ‘gettext’ looks more like a file handle.
 But not necessarily:
 
      use Locale::Messages qw (:libintl_h);
      open gettext ">testfile" or die;
      print gettext "Hello world!";
 
    Now, the file is probably syntactically incorrect, provided that the
 module ‘Locale::Messages’ found first in the Perl include path exports a
 function ‘gettext’.  But what if the module ‘Locale::Messages’ really
 looks like this?
 
      use vars qw (*gettext);
 
      1;
 
    In this case, the string ‘gettext’ will be interpreted as a file
 handle again, and the above example will create a file ‘testfile’ and
 write the string “Hello world!” into it.  Even advanced control flow
 analysis will not really help:
 
      if (0.5 < rand) {
         eval "use Sane";
      } else {
         eval "use InSane";
      }
      print gettext "Hello world!";
 
    If the module ‘Sane’ exports a function ‘gettext’ that does what we
 expect, and the module ‘InSane’ opens a file for writing and associates
 the _handle_ ‘gettext’ with this output stream, we are clueless again
 about what will happen at runtime.  It is completely unpredictable.  The
 truth is that Perl has so many ways to fill its symbol table at runtime
 that it is impossible to interpret a particular piece of code without
 executing it.
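
    As a minimal sketch of this point, the following code installs a
 function ‘gettext’ into a package only at runtime, so that nothing in
 the source text declares it.  The package name ‘Demo’ and the helpers
 ‘install_gettext’ and ‘has_gettext’ are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether PACKAGE currently has a
# subroutine named 'gettext' in its symbol table.
sub has_gettext {
    my ($package) = @_;
    no strict 'refs';
    return defined &{"${package}::gettext"} ? 1 : 0;
}

# Hypothetical helper: installs a subroutine named 'gettext' into
# PACKAGE at runtime by assigning a code reference to the glob.
sub install_gettext {
    my ($package) = @_;
    no strict 'refs';
    *{"${package}::gettext"} = sub { return "translated: $_[0]" };
    return;
}

# 'Demo' is a made-up package name used only for this illustration.
my $before = has_gettext('Demo');
install_gettext('Demo');
my $after = has_gettext('Demo');

print "before: $before, after: $after\n";
print Demo::gettext("Hello world!"), "\n";
```

    A scanner that only reads the source sees no declaration of
 ‘gettext’ here, although the function exists by the time it is called.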
 
    Of course, ‘xgettext’ will not execute your Perl sources while
 scanning for translatable strings, but rather use heuristics in order to
 guess what you meant.
 
    Another problem is the ambiguity of the slash and the question mark.
 Their interpretation depends on the context:
 
      # A pattern match.
      print "OK\n" if /foobar/;
 
      # A division.
      print 1 / 2;
 
      # Another pattern match.
      print "OK\n" if ?foobar?;
 
      # Conditional.
      print $x ? "foo" : "bar";
 
    The slash may either act as the division operator or introduce a
 pattern match, whereas the question mark may act either as the ternary
 conditional operator or as a pattern delimiter.  Other programming
 languages like ‘awk’ present similar problems, but the consequences of a
 misinterpretation are particularly nasty with Perl sources.  In ‘awk’
 for instance, a statement can never exceed one line and the parser can
 recover from a parsing error at the next newline and interpret the rest
 of the input stream correctly.  Perl is different, as a pattern match is
 terminated by the next appearance of the delimiter (the slash or the
 question mark) in the input stream, regardless of the semantic context.
 If a slash is really a division sign but misinterpreted as a pattern
 match, the rest of the input file is most probably parsed incorrectly.
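
    The effect can be sketched as follows.  The helper
 ‘span_if_pattern’ is hypothetical and is not how ‘xgettext’ is
 implemented; it merely returns the text that a naive scanner would
 consume if it took the first slash for the start of a pattern match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text from the first slash up to and
# including the next slash, i.e. what a scanner would treat as a
# pattern if it misread the first slash as a pattern delimiter.
sub span_if_pattern {
    my ($source) = @_;
    return $source =~ m{(/.*?/)}s ? $1 : undef;
}

# Made-up sample input: both slashes are really division signs.
my $source = <<'END_OF_SOURCE';
$half = $total / 2;
print gettext("Hello world!");
$ratio = $part / $whole;
END_OF_SOURCE

my $swallowed = span_if_pattern($source);
print "Text mistaken for a pattern:\n$swallowed\n";
```

    In this sample, the call to ‘gettext’ lies between the two slashes
 and would therefore be taken for part of a pattern instead of being
 recognized as a translatable string.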
 
    There are certain cases where the ambiguity cannot be resolved from
 the source text alone:
 
      $x = wantarray ? 1 : 0;
 
    The Perl built-in function ‘wantarray’ does not accept any arguments.
 The Perl parser therefore knows that the question mark does not start a
 regular expression but is the ternary conditional operator.
 
      sub wantarrays {}
      $x = wantarrays ? 1 : 0;
 
    Now the situation is different.  The function ‘wantarrays’ takes a
 variable number of arguments (like any non-prototyped Perl function).
 The question mark is now the delimiter of a pattern match, and hence the
 piece of code does not compile.
 
      sub wantarrays() {}
      $x = wantarrays ? 1 : 0;
 
    Now that the function is prototyped, Perl knows that it does not
 accept any arguments, and the question mark is therefore interpreted as
 the ternary operator again.  But that unfortunately outsmarts ‘xgettext’.
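
    Perl itself can report this difference at runtime through its
 built-in function ‘prototype’.  The following is a minimal sketch; the
 names ‘plain_sub’, ‘no_args_sub’ and ‘describe_prototype’ are
 hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;

sub plain_sub     { return 1 }    # no prototype: accepts any arguments
sub no_args_sub() { return 1 }    # empty prototype: accepts no arguments

# prototype() returns undef for a sub without a prototype, and the
# prototype string (here the empty string) otherwise.
sub describe_prototype {
    my ($code_ref) = @_;
    my $proto = prototype($code_ref);
    return defined $proto ? "prototype '($proto)'" : "no prototype";
}

print "plain_sub:   ", describe_prototype(\&plain_sub),   "\n";
print "no_args_sub: ", describe_prototype(\&no_args_sub), "\n";
```

    This information only becomes available once Perl has compiled the
 declaration, which is exactly what a static scanner does not do.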
 
    The Perl parser in ‘xgettext’ cannot know whether a function has a
 prototype and what that prototype would look like.  It therefore makes
 an educated guess.  If a function is known to be a Perl built-in and
 this function does not accept any arguments, a following question mark
 or slash is treated as an operator, otherwise as the delimiter of a
 following regular expression.  The Perl built-ins that do not accept
 arguments are ‘wantarray’, ‘fork’, ‘time’, ‘times’, ‘getlogin’,
 ‘getppid’, ‘getpwent’, ‘getgrent’, ‘gethostent’, ‘getnetent’,
 ‘getprotoent’, ‘getservent’, ‘setpwent’, ‘setgrent’, ‘endpwent’,
 ‘endgrent’, ‘endhostent’, ‘endnetent’, ‘endprotoent’, and ‘endservent’.
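
    The rule described above can be sketched in a few lines of Perl.
 This is only an illustration of the heuristic, not the actual
 implementation in ‘xgettext’; the names ‘%NO_ARG_BUILTIN’ and
 ‘classify_delimiter’ are assumptions made for this example.

```perl
use strict;
use warnings;

# The built-ins listed above that do not accept arguments.
my %NO_ARG_BUILTIN = map { $_ => 1 } qw(
    wantarray fork time times getlogin getppid getpwent getgrent
    gethostent getnetent getprotoent getservent setpwent setgrent
    endpwent endgrent endhostent endnetent endprotoent endservent
);

# Hypothetical helper: decides how a slash or question mark that
# follows the given identifier is treated.
sub classify_delimiter {
    my ($preceding_identifier) = @_;
    return $NO_ARG_BUILTIN{$preceding_identifier} ? 'operator' : 'pattern';
}

for my $name (qw(wantarray wantarrays time somefunc)) {
    printf "%-10s => %s\n", $name, classify_delimiter($name);
}
```

    Note that a user-defined function with an empty prototype, such as
 ‘wantarrays’ in the example further above, is still classified as being
 followed by a pattern, because only the name is taken into account.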
 
    If you find that ‘xgettext’ fails to extract strings from portions of
 your sources, you should therefore look out for slashes or question
 marks preceding these sections.  You may have come across a bug in
 ‘xgettext’’s Perl parser (and of course you should report that bug).  In
 the meantime you should consider reformulating your code in a manner
 less challenging to ‘xgettext’.
 
    In particular, if the parser is too dumb to see that a function does
 not accept arguments, use parentheses:
 
      $x = somefunc() ? 1 : 0;
      $y = (somefunc) ? 1 : 0;
 
    In fact the Perl parser itself has similar problems and warns you
 about such constructs.