
PCRE NATIVE API


       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);


       char *pcre_version(void);

       void *(*pcre_malloc)(size_t);

       void (*pcre_free)(void *);

       void *(*pcre_stack_malloc)(size_t);

       void (*pcre_stack_free)(void *);

       int (*pcre_callout)(pcre_callout_block *);


PCRE API OVERVIEW


       PCRE has its own native API, which is described in this document. There
       is also a set of wrapper functions that correspond to the POSIX regular
       expression  API.  These  are  described in the pcreposix documentation.
       Both of these APIs define a set of C function calls. A C++  wrapper  is
       distributed with PCRE. It is documented in the pcrecpp page.

       The  native  API  C  function prototypes are defined in the header file
       pcre.h, and on Unix systems the library itself is called  libpcre.   It
       can normally be accessed by adding -lpcre to the command for linking an
       application  that  uses  PCRE.  The  header  file  defines  the  macros
       PCRE_MAJOR  and  PCRE_MINOR to contain the major and minor release num-
       bers for the library.  Applications can use these  to  include  support
       for different releases of PCRE.

       The   functions   pcre_compile(),  pcre_compile2(),  pcre_study(),  and
       pcre_exec() are used for compiling and matching regular expressions  in
       a  Perl-compatible  manner. A sample program that demonstrates the sim-
       plest way of using them is provided in the file  called  pcredemo.c  in
       the  source distribution. The pcresample documentation describes how to
       run it.

       A second matching function, pcre_dfa_exec(), which is not Perl-compati-
       ble,  is  also provided. This uses a different algorithm for the match-
       ing. The alternative algorithm finds all possible matches (at  a  given
       point in the subject). However, this algorithm does not return captured
       substrings. A description of the  two  matching  algorithms  and  their
       advantages  and  disadvantages  is given in the pcrematching documenta-
       tion.

       In addition to the main compiling and  matching  functions,  there  are
       convenience functions for extracting captured substrings from a subject
       string that is matched by pcre_exec(). They are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()

       The function pcre_fullinfo() is used to find out  information  about  a
       compiled  pattern; pcre_info() is an obsolete version that returns only
       some of the available information, but is retained for  backwards  com-
       patibility.   The function pcre_version() returns a pointer to a string
       containing the version of PCRE and its date of release.

       The function pcre_refcount() maintains a  reference  count  in  a  data
       block  containing  a compiled pattern. This is provided for the benefit
       of object-oriented applications.

       The global variables pcre_malloc and pcre_free  initially  contain  the
       entry  points  of  the  standard malloc() and free() functions, respec-
       tively. PCRE calls the memory management functions via these variables,
       so  a  calling  program  can replace them if it wishes to intercept the
       calls. This should be done before calling any PCRE functions.
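
       As an illustration of intercepting the calls, the following is a  mini-
       mal  sketch  of  a pair of wrappers with the same signatures as the de-
       fault functions. The names counting_malloc, counting_free and live_blocks
       are hypothetical and are not part of PCRE; the assignments to the global
       variables are shown only in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: number of blocks obtained and not yet freed. */
long live_blocks = 0;

/* Same signature as the function that pcre_malloc points to. */
void *counting_malloc(size_t size)
{
    void *block = malloc(size);
    if (block != NULL)
        live_blocks++;
    return block;
}

/* Same signature as the function that pcre_free points to. */
void counting_free(void *block)
{
    if (block != NULL) {
        assert(live_blocks > 0);   /* sanity check on our own counter */
        live_blocks--;
    }
    free(block);
}

/*
 * In a program that includes <pcre.h>, the wrappers would be installed
 * before any other PCRE function is called:
 *
 *     pcre_malloc = counting_malloc;
 *     pcre_free   = counting_free;
 */
```

       With wrappers of this kind, a non-zero live_blocks at program exit would
       suggest that some compiled pattern or study block was never released.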

       The global variables pcre_stack_malloc  and  pcre_stack_free  are  also
       indirections  to  memory  management functions. These special functions
       are used only when PCRE is compiled to use  the  heap  for  remembering
       data, instead of recursive function calls, when running the pcre_exec()
       function. See the pcrebuild documentation for  details  of  how  to  do
       this.  It  is  a non-standard way of building PCRE, for use in environ-
       ments that have limited stacks. Because of the greater  use  of  memory
       management,  it  runs  more  slowly. Separate functions are provided so
       that special-purpose external code can be  used  for  this  case.  When
       used,  these  functions  are always called in a stack-like manner (last
       obtained, first freed), and always for memory blocks of the same  size.
       There  is  a discussion about PCRE's stack usage in the pcrestack docu-
       mentation.
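
       Because the blocks are all the same size and are released in last-in,
       first-out order, special-purpose code can be very simple. The following
       is only a sketch of one possible approach: freed blocks are kept  in  a
       small  cache  and  handed  out again on the next request. The names
       pool_stack_malloc, pool_stack_free and POOL_SLOTS are hypothetical, and
       the code is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 64                  /* hypothetical cache capacity */

static void  *pool_cache[POOL_SLOTS];  /* freed blocks kept for reuse */
static int    pool_cached = 0;         /* number of blocks in the cache */
static size_t pool_block_size = 0;     /* size seen on the first request */

/* Candidate for pcre_stack_malloc. */
void *pool_stack_malloc(size_t size)
{
    if (pool_block_size == 0)
        pool_block_size = size;
    /* Relies on the documented property that every request has one size. */
    assert(size == pool_block_size);

    if (pool_cached > 0)
        return pool_cache[--pool_cached];   /* most recently freed block */
    return malloc(size);
}

/* Candidate for pcre_stack_free. */
void pool_stack_free(void *block)
{
    if (block == NULL)
        return;
    if (pool_cached < POOL_SLOTS)
        pool_cache[pool_cached++] = block;  /* keep it for the next request */
    else
        free(block);                        /* cache full: really release */
}
```

       The design choice here is to trade a bounded amount of retained memory
       for fewer calls to the system allocator, which is where the text above
       says the extra cost of a heap-based build comes from.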

       The global variable pcre_callout initially contains NULL. It can be set
       by  the  caller  to  a "callout" function, which PCRE will then call at
       specified points during a matching operation. Details are given in  the
       pcrecallout documentation.


NEWLINES

       PCRE supports three different conventions for indicating line breaks in
       strings: a single CR character, a single LF character, or the two-char-
       acter  sequence  CRLF.  All  three  are used as "standard" by different
       operating systems.  When PCRE is built, a default can be specified.  If
       none is chosen at build time, LF is used, which is the  Unix  standard.
       When PCRE is run, the default can be overridden, either when a  pattern
       is compiled, or when it is matched.

       In the PCRE documentation the word "newline" is used to mean "the char-
       acter or pair of characters that indicate a line break".


MULTITHREADING


       The PCRE functions can be used in  multi-threading  applications,  with
       the  proviso  that  the  memory  management  functions  pointed  to  by
       pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the

       int pcre_config(int what, void *where);

       The  function pcre_config() makes it possible for a PCRE client to dis-
       cover which optional features have been compiled into the PCRE library.
       The  pcrebuild documentation has more details about these optional fea-
       tures.

       The first argument for pcre_config() is an  integer,  specifying  which
       information is required; the second argument is a pointer to a variable
       into which the information is  placed.  The  following  information  is
       available:

         PCRE_CONFIG_UTF8

       The  output is an integer that is set to one if UTF-8 support is avail-
       able; otherwise it is set to zero.

         PCRE_CONFIG_UNICODE_PROPERTIES

       The output is an integer that is set to  one  if  support  for  Unicode
       character properties is available; otherwise it is set to zero.

         PCRE_CONFIG_NEWLINE

       The  output  is  an integer whose value specifies the default character
       sequence that is recognized as meaning "newline". The three values that
       are supported are: 10 for LF, 13 for CR, and 3338 for CRLF. The default
       should normally be the standard sequence for your operating system.
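
       The three values can be turned into a readable label with a small  hel-
       per  such as the sketch below. The function name newline_label is hypo-
       thetical. Note that 3338 is 13 * 256 + 10, that is, the CR code followed
       by the LF code.

```c
#include <stddef.h>

/* Translate the integer produced by PCRE_CONFIG_NEWLINE into a label.
   Returns NULL for any value other than the three listed above. */
const char *newline_label(int value)
{
    switch (value) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";   /* 13 * 256 + 10 */
    default:   return NULL;
    }
}
```

       In a real program the argument would be the int variable filled  in  by
       pcre_config(PCRE_CONFIG_NEWLINE, &value).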

         PCRE_CONFIG_LINK_SIZE

       The output is an integer that contains the number  of  bytes  used  for
       internal linkage in compiled regular expressions. The value is 2, 3, or
       4. Larger values allow larger regular expressions to  be  compiled,  at
       the  expense  of  slower matching. The default value of 2 is sufficient
       for all but the most massive patterns, since  it  allows  the  compiled
       pattern to be up to 64K in size.

         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

       The  output  is  an integer that contains the threshold above which the
       POSIX interface uses malloc() for output vectors. Further  details  are
       given in the pcreposix documentation.

         PCRE_CONFIG_MATCH_LIMIT

       The output is an integer that gives the default limit for the number of
       internal matching function calls in a  pcre_exec()  execution.  Further
       details are given with pcre_exec() below.

         PCRE_CONFIG_MATCH_LIMIT_RECURSION


COMPILING A PATTERN


       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       Either of the functions pcre_compile() or pcre_compile2() can be called
       to compile a pattern into an internal form. The only difference between
       the  two interfaces is that pcre_compile2() has an additional argument,
       errorcodeptr, via which a numerical error code can be returned.

       The pattern is a C string terminated by a binary zero, and is passed in
       the  pattern  argument.  A  pointer to a single block of memory that is
       obtained via pcre_malloc is returned. This contains the  compiled  code
       and related data. The pcre type is defined for the returned block; this
       is a typedef for a structure whose contents are not externally defined.
       It is up to the caller to free the memory (via pcre_free) when it is no
       longer required.

       Although the compiled code of a PCRE regex is relocatable, that is,  it
       does not depend on memory location, the complete pcre data block is not
       fully relocatable, because it may contain a copy of the tableptr  argu-
       ment, which is an address (see below).

       The options argument contains independent bits that affect the compila-
       tion. It should be zero if  no  options  are  required.  The  available
       options  are  described  below. Some of them, in particular, those that
       are compatible with Perl, can also be set and  unset  from  within  the
       pattern  (see  the  detailed  description in the pcrepattern documenta-
       tion). For these options, the contents of the options  argument  speci-
       fies  their initial settings at the start of compilation and execution.
       The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at  the  time
       of matching as well as at compile time.

       If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
       if compilation of a pattern fails,  pcre_compile()  returns  NULL,  and
       sets the variable pointed to by errptr to point to a textual error mes-
       sage. This is a static string that is part of the library. You must not
       try to free it. The offset from the start of the pattern to the charac-
       ter where the error was discovered is placed in the variable pointed to
       by  erroffset,  which must not be NULL. If it is, an immediate error is
       given.
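
       The message and offset are typically shown to  the  user  together.  As
       a  sketch  only, the hypothetical helper below (format_compile_error is
       not a PCRE function) formats the message, the pattern, and a caret under
       the  offending  position.  It  assumes single-byte characters, so that a
       byte offset and a display column coincide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<message>\n<pattern>\n<spaces>^\n" into buf, with the caret
   placed under the byte at erroffset. Offsets outside the pattern are
   clamped. Returns the value returned by snprintf. */
int format_compile_error(char *buf, size_t bufsize, const char *pattern,
                         const char *message, int erroffset)
{
    int len;

    assert(buf != NULL && pattern != NULL && message != NULL);
    len = (int)strlen(pattern);
    if (erroffset < 0)
        erroffset = 0;
    if (erroffset > len)
        erroffset = len;

    /* "%*s" with an empty string prints exactly erroffset spaces. */
    return snprintf(buf, bufsize, "%s\n%s\n%*s^\n",
                    message, pattern, erroffset, "");
}
```

       A caller would pass the pattern it gave to pcre_compile() together with
       the values left in the variables pointed to by errptr and erroffset.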

       If pcre_compile2() is used instead of pcre_compile(),  and  the  error-
       codeptr  argument is not NULL, a non-zero error code number is returned
       via this argument in the event of an error. This is in addition to  the

         pcre *re;
         const char *error;
         int erroffset;
         re = pcre_compile(
           "^A.*Z",          /* the pattern */
           0,                /* default options */
           &error,           /* for error message */
           &erroffset,       /* for error offset */
           NULL);            /* use default character tables */

       The following names for option bits are defined in  the  pcre.h  header
       file:

         PCRE_ANCHORED

       If this bit is set, the pattern is forced to be "anchored", that is, it
       is constrained to match only at the first matching point in the  string
       that  is being searched (the "subject string"). This effect can also be
       achieved by appropriate constructs in the pattern itself, which is  the
       only way to do it in Perl.

         PCRE_AUTO_CALLOUT

       If this bit is set, pcre_compile() automatically inserts callout items,
       all with number 255, before each pattern item. For  discussion  of  the
       callout facility, see the pcrecallout documentation.

         PCRE_CASELESS

       If  this  bit is set, letters in the pattern match both upper and lower
       case letters. It is equivalent to Perl's  /i  option,  and  it  can  be
       changed  within a pattern by a (?i) option setting. In UTF-8 mode, PCRE
       always understands the concept of case for characters whose values  are
       less  than 128, so caseless matching is always possible. For characters
       with higher values, the concept of case is supported if  PCRE  is  com-
       piled  with Unicode property support, but not otherwise. If you want to
       use caseless matching for characters 128 and  above,  you  must  ensure
       that  PCRE  is  compiled  with Unicode property support as well as with
       UTF-8 support.

         PCRE_DOLLAR_ENDONLY

       If this bit is set, a dollar metacharacter in the pattern matches  only
       at  the  end  of the subject string. Without this option, a dollar also
       matches immediately before a newline at the end of the string (but  not
       before  any  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored
       if PCRE_MULTILINE is set.  There is no equivalent  to  this  option  in
       Perl, and no way to set it within a pattern.

         PCRE_DOTALL

       If this bit is set, a dot metacharacter in the pattern matches all char-
       acters, including those that indicate newline. Without it, a  dot  does

       If this bit is set, whitespace  data  characters  in  the  pattern  are
       totally ignored except when escaped or inside a character class. White-
       space does not include the VT character (code 11). In addition, charac-
       ters between an unescaped # outside a character class and the next new-
       line, inclusive, are also ignored. This  is  equivalent  to  Perl's  /x
       option,  and  it  can be changed within a pattern by a (?x) option set-
       ting.

       This option makes it possible to include  comments  inside  complicated
       patterns.   Note,  however,  that this applies only to data characters.
       Whitespace  characters  may  never  appear  within  special   character
       sequences  in  a  pattern,  for  example  within the sequence (?( which
       introduces a conditional subpattern.

         PCRE_EXTRA

       This option was invented in order to turn on  additional  functionality
       of  PCRE  that  is  incompatible with Perl, but it is currently of very
       little use. When set, any backslash in a pattern that is followed by  a
       letter  that  has  no  special  meaning causes an error, thus reserving
       these combinations for future expansion. By  default,  as  in  Perl,  a
       backslash  followed by a letter with no special meaning is treated as a
       literal. (Perl can, however, be persuaded to give a warning for  this.)
       There  are  at  present no other features controlled by this option. It
       can also be set by a (?X) option setting within a pattern.

         PCRE_FIRSTLINE

       If this option is set, an  unanchored  pattern  is  required  to  match
       before  or  at  the  first  newline  in  the subject string, though the
       matched text may continue over the newline.

         PCRE_MULTILINE

       By default, PCRE treats the subject string as consisting  of  a  single
       line  of characters (even if it actually contains newlines). The "start
       of line" metacharacter (^) matches only at the  start  of  the  string,
       while  the  "end  of line" metacharacter ($) matches only at the end of
       the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
       is set). This is the same as Perl.

       When  PCRE_MULTILINE  is  set,  the "start of line" and "end of line"
       constructs match immediately following or immediately  before  internal
       newlines  in  the  subject string, respectively, as well as at the very
       start and end. This is equivalent to Perl's /m option, and  it  can  be
       changed within a pattern by a (?m) option setting. If there are no new-
       lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
       setting PCRE_MULTILINE has no effect.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF

         PCRE_NO_AUTO_CAPTURE

       If this option is set, it disables the use of numbered capturing paren-
       theses  in the pattern. Any opening parenthesis that is not followed by
       ? behaves as if it were followed by ?: but named parentheses can  still
       be  used  for  capturing  (and  they acquire numbers in the usual way).
       There is no equivalent of this option in Perl.

         PCRE_UNGREEDY

       This option inverts the "greediness" of the quantifiers  so  that  they
       are  not greedy by default, but become greedy if followed by "?". It is
       not compatible with Perl. It can also be set by a (?U)  option  setting
       within the pattern.

         PCRE_UTF8

       This  option  causes PCRE to regard both the pattern and the subject as
       strings of UTF-8 characters instead of single-byte  character  strings.
       However,  it is available only when PCRE is built to include UTF-8 sup-
       port. If not, the use of this option provokes an error. Details of  how
       this  option  changes the behaviour of PCRE are given in the section on
       UTF-8 support in the main pcre page.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
       automatically  checked. If an invalid UTF-8 sequence of bytes is found,
       pcre_compile() returns an error. If you already know that your  pattern
       is  valid, and you want to skip this check for performance reasons, you
       can set the PCRE_NO_UTF8_CHECK option. When it is set,  the  effect  of
       passing an invalid UTF-8 string as a pattern is undefined. It may cause
       your program to crash.  Note that this option can  also  be  passed  to
       pcre_exec()  and pcre_dfa_exec(), to suppress the UTF-8 validity check-
       ing of subject strings.


COMPILATION ERROR CODES


       The following table lists the error  codes  that  may  be  returned  by
       pcre_compile2(),  along with the error messages that may be returned by
       both compiling functions.

          0  no error
          1  \ at end of pattern
          2  \c at end of pattern
          3  unrecognized character follows \
          4  numbers out of order in {} quantifier
          5  number too big in {} quantifier
          6  missing terminating ] for character class
          7  invalid escape sequence in character class
          8  range out of order in character class

         23  internal error: code overflow
         24  unrecognized character after (?<
         25  lookbehind assertion is not fixed length
         26  malformed number or name after (?(
         27  conditional group contains more than two branches
         28  assertion expected after (?(
         29  (?R or (?digits must be followed by )
         30  unknown POSIX class name
         31  POSIX collating elements are not supported
         32  this version of PCRE is not compiled with PCRE_UTF8 support
         33  spare error
         34  character value in \x{...} sequence is too large
         35  invalid condition (?(0)
         36  \C not allowed in lookbehind assertion
         37  PCRE does not support \L, \l, \N, \U, or \u
         38  number after (?C is > 255
         39  closing ) for (?C expected
         40  recursive call could loop indefinitely
         41  unrecognized character after (?P
         42  syntax error after (?P
         43  two named subpatterns have the same name
         44  invalid UTF-8 string
         45  support for \P, \p, and \X has not been compiled
         46  malformed \P or \p sequence
         47  unknown property name after \P or \p
         48  subpattern name is too long (maximum 32 characters)
         49  too many named subpatterns (maximum 10,000)
         50  repeated subpattern is too long
         51  octal value is greater than \377 (not in UTF-8 mode)


STUDYING A PATTERN


       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       If a compiled pattern is going to be used several times,  it  is  worth
       spending more time analyzing it in order to speed up the time taken for
       matching. The function pcre_study() takes a pointer to a compiled  pat-
       tern as its first argument. If studying the pattern produces additional
       information that will help speed up matching,  pcre_study()  returns  a
       pointer  to a pcre_extra block, in which the study_data field points to
       the results of the study.

       The  returned  value  from  pcre_study()  can  be  passed  directly  to
       pcre_exec().  However,  a  pcre_extra  block also contains other fields
       that can be set by the caller before the block  is  passed;  these  are
       described below in the section on matching a pattern.

       If  studying  the  pattern  does not produce any additional information,
       pcre_study() returns NULL. In that circumstance, if the calling program
       wants  to  pass  any of the other fields to pcre_exec(), it must set up
       its own pcre_extra block.

         pcre_extra *pe;
         pe = pcre_study(
           re,             /* result of pcre_compile() */
           0,              /* no options exist */
           &error);        /* set to NULL or points to a message */

       At present, studying a pattern is useful only for non-anchored patterns
       that  do not have a single fixed starting character. A bitmap of possi-
       ble starting bytes is created.


LOCALE SUPPORT


       PCRE handles caseless matching, and determines whether  characters  are
       letters, digits, or whatever, by reference to a set of tables,  indexed
       by character value. When running in UTF-8 mode, this  applies  only  to
       characters  with  codes  less than 128. Higher-valued codes never match
       escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
       with  Unicode  character property support. The use of locales with Uni-
       code is discouraged.

       An internal set of tables is created in the default C locale when  PCRE
       is  built.  This  is  used when the final argument of pcre_compile() is
       NULL, and is sufficient for many applications. An  alternative  set  of
       tables  can,  however, be supplied. These may be created in a different
       locale from the default. As more and more applications change to  using
       Unicode, the need for this locale support is expected to die away.

       External  tables  are  built by calling the pcre_maketables() function,
       which has no arguments, in the relevant locale. The result can then  be
       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
       example, to build and use tables that are appropriate  for  the  French
       locale  (where  accented  characters  with  values greater than 128 are
       treated as letters), the following code could be used:

         setlocale(LC_CTYPE, "fr_FR");
         tables = pcre_maketables();
         re = pcre_compile(..., tables);

       When pcre_maketables() runs, the tables are built  in  memory  that  is
       obtained  via  pcre_malloc. It is the caller's responsibility to ensure
       that the memory containing the tables remains available for as long  as
       it is needed.

       The pointer that is passed to pcre_compile() is saved with the compiled
       pattern, and the same tables are used via this pointer by  pcre_study()
       and normally also by pcre_exec(). Thus, by default, for any single pat-
       tern, compilation, studying and matching all happen in the same locale,
       but different patterns can be compiled in different locales.

       It  is  possible to pass a table pointer or NULL (indicating the use of
       the internal tables) to pcre_exec(). Although  not  intended  for  this
       purpose,  this facility could be used to match a pattern in a different
       locale from the one in which it was compiled. Passing table pointers at
       of information is required, and the fourth argument is a pointer  to  a
       variable  to  receive  the  data. The yield of the function is zero for
       success, or one of the following negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
                               the argument where was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found
         PCRE_ERROR_BADOPTION  the value of what was invalid

       The "magic number" is placed at the start of each compiled  pattern  as
       a  simple check against passing an arbitrary memory pointer. Here is a
       typical call of pcre_fullinfo(), to obtain the length of  the  compiled
       pattern:

         int rc;
         size_t length;
         rc = pcre_fullinfo(
           re,               /* result of pcre_compile() */
           pe,               /* result of pcre_study(), or NULL */
           PCRE_INFO_SIZE,   /* what is required */
           &length);         /* where to put the data */

       The  possible  values for the third argument are defined in pcre.h, and
       are as follows:

         PCRE_INFO_BACKREFMAX

       Return the number of the highest back reference  in  the  pattern.  The
       fourth  argument  should  point to an int variable. Zero is returned if
       there are no back references.

         PCRE_INFO_CAPTURECOUNT

       Return the number of capturing subpatterns in the pattern.  The  fourth
       argument should point to an int variable.

         PCRE_INFO_DEFAULT_TABLES

       Return  a pointer to the internal default character tables within PCRE.
       The fourth argument should point to an unsigned char *  variable.  This
       information call is provided for internal use by the pcre_study() func-
       tion. External callers can cause PCRE to use  its  internal  tables  by
       passing a NULL table pointer.

         PCRE_INFO_FIRSTBYTE

       Return  information  about  the first byte of any matched string, for a
       non-anchored pattern. The fourth argument should point to an int  vari-
       able.  (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
       is still recognized for backwards compatibility.)

       If there is a fixed first byte, for example, from  a  pattern  such  as
       If  the pattern was studied, and this resulted in the construction of a
       256-bit table indicating a fixed set of bytes for the first byte in any
       matching  string, a pointer to the table is returned. Otherwise NULL is
       returned. The fourth argument should point to an unsigned char *  vari-
       able.

         PCRE_INFO_LASTLITERAL

       Return  the  value of the rightmost literal byte that must exist in any
       matched string, other than at its  start,  if  such  a  byte  has  been
       recorded. The fourth argument should point to an int variable. If there
       is no such byte, -1 is returned. For anchored patterns, a last  literal
       byte  is  recorded only if it follows something of variable length. For
       example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
       /^a\dz\d/ the returned value is -1.

         PCRE_INFO_NAMECOUNT
         PCRE_INFO_NAMEENTRYSIZE
         PCRE_INFO_NAMETABLE

       PCRE  supports the use of named as well as numbered capturing parenthe-
       ses. The names are just an additional way of identifying the  parenthe-
       ses, which still acquire numbers. Several convenience functions such as
       pcre_get_named_substring() are provided for  extracting  captured  sub-
       strings  by  name. It is also possible to extract the data directly, by
       first converting the name to a number in order to  access  the  correct
       pointers in the output vector (described with pcre_exec() below). To do
       the conversion, you need  to  use  the  name-to-number  map,  which  is
       described by these three values.

       The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
       gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
       of  each  entry;  both  of  these  return  an int value. The entry size
       depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
       a  pointer  to  the  first  entry of the table (a pointer to char). The
       first two bytes of each entry are the number of the capturing parenthe-
       sis,  most  significant byte first. The rest of the entry is the corre-
       sponding name, zero terminated. The names are  in  alphabetical  order.
       When PCRE_DUPNAMES is set, duplicate names are in order of their paren-
       theses numbers. For example, consider  the  following  pattern  (assume
       PCRE_EXTENDED  is  set,  so  white  space  -  including  newlines  - is
       ignored):

         (?P<date> (?P<year>(\d\d)?\d\d) -
         (?P<month>\d\d) - (?P<day>\d\d) )

       There are four named subpatterns, so the table has  four  entries,  and
       each  entry  in the table is eight bytes long. The table is as follows,
       with non-printing bytes shown in hexadecimal, and undefined bytes shown
       as ??:

         00 01 d  a  t  e  00 ??
         00 05 d  a  y  00 ?? ??
         00 04 m  o  n  t  h  00
         00 02 y  e  a  r  00 ??
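
       The entry layout described above can be decoded by  hand.  The  sketch
       below  is  a hypothetical helper (lookup_group_number is not part of
       PCRE) that scans a table of this form for a name. It uses a linear scan
       for  simplicity;  the  library's own pcre_get_stringnumber() function
       serves the same purpose.

```c
#include <assert.h>
#include <string.h>

/* Look up a group name in a name table laid out as described above:
   fixed-size entries, two bytes of group number (most significant byte
   first), then the zero-terminated name.
   Returns the group number, or -1 if the name is not present. */
int lookup_group_number(const char *table, int name_count,
                        int entry_size, const char *name)
{
    int i;

    assert(table != NULL && name != NULL && entry_size > 2);
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry =
            (const unsigned char *)table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)entry + 2, name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

       The  three  arguments  after  the  name would come from pcre_fullinfo()
       calls with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and  PCRE_INFO_NAME-
       ENTRYSIZE.  Since the names are in alphabetical order, a binary search
       would also work for patterns with many names.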

       A pattern is automatically anchored by PCRE if  all  of  its  top-level
       alternatives begin with one of the following:

         ^     unless PCRE_MULTILINE is set
         \A    always
         \G    always
         .*    if PCRE_DOTALL is set and there are no back
                 references to the subpattern in which .* appears

       For such patterns, the PCRE_ANCHORED bit is set in the options returned
       by pcre_fullinfo().

         PCRE_INFO_SIZE

       Return the size of the compiled pattern, that is, the  value  that  was
       passed as the argument to pcre_malloc() when PCRE was getting memory in
       which to place the compiled data. The fourth argument should point to a
       size_t variable.

         PCRE_INFO_STUDYSIZE

       Return the size of the data block pointed to by the study_data field in
       a pcre_extra block. That is,  it  is  the  value  that  was  passed  to
       pcre_malloc() when PCRE was getting memory into which to place the data
       created by pcre_study(). The fourth argument should point to  a  size_t
       variable.


OBSOLETE INFO FUNCTION


       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

       The  pcre_info()  function is now obsolete because its interface is too
       restrictive to return all the available data about a compiled  pattern.
       New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of
       pcre_info() is the number of capturing subpatterns, or one of the  fol-
       lowing negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found

       If  the  optptr  argument is not NULL, a copy of the options with which
       the pattern was compiled is placed in the integer  it  points  to  (see
       PCRE_INFO_OPTIONS above).

       If  the  pattern  is  not anchored and the firstcharptr argument is not
       NULL, it is used to pass back information about the first character  of
       any matched string (see PCRE_INFO_FIRSTBYTE above).


REFERENCE COUNTS


       int pcre_refcount(pcre *code, int adjust);

       Except when it is zero, the reference count is not correctly  preserved
       if  a  pattern  is  compiled on one host and then transferred to a host
       whose byte-order is different. (This seems a highly unlikely scenario.)


MATCHING A PATTERN: THE TRADITIONAL FUNCTION


       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       The  function pcre_exec() is called to match a subject string against a
       compiled pattern, which is passed in the code argument. If the  pattern
       has been studied, the result of the study should be passed in the extra
       argument. This function is the main matching facility of  the  library,
       and it operates in a Perl-like manner. For specialist use there is also
       an alternative matching function, which is described below in the  sec-
       tion about the pcre_dfa_exec() function.

       In  most applications, the pattern will have been compiled (and option-
       ally studied) in the same process that calls pcre_exec().  However,  it
       is possible to save compiled patterns and study data, and then use them
       later in different processes, possibly even on different hosts.  For  a
       discussion about this, see the pcreprecompile documentation.

       Here is an example of a simple call to pcre_exec():

         int rc;
         int ovector[30];
         rc = pcre_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           30);            /* number of elements (NOT size in bytes) */

   Extra data for pcre_exec()

       If  the  extra argument is not NULL, it must point to a pcre_extra data
       block. The pcre_study() function returns such a block (when it  doesn't
       return  NULL), but you can also create one for yourself, and pass addi-
       tional information in it. The pcre_extra block contains  the  following
       fields (not necessarily in this order):

         unsigned long int flags;
         void *study_data;
         unsigned long int match_limit;
         unsigned long int match_limit_recursion;
         void *callout_data;
         const unsigned char *tables;

       flag bits.

       The match_limit field provides a means of preventing PCRE from using up
       a  vast amount of resources when running patterns that are not going to
       match, but which have a very large number  of  possibilities  in  their
       search  trees.  The  classic  example  is  the  use of nested unlimited
       repeats.

       Internally, PCRE uses a function called match() which it calls  repeat-
       edly  (sometimes  recursively). The limit set by match_limit is imposed
       on the number of times this function is called during  a  match,  which
       has  the  effect  of  limiting the amount of backtracking that can take
       place. For patterns that are not anchored, the count restarts from zero
       for each position in the subject string.

       The default value for the limit can be set when PCRE is  built;  unless
       it is changed at build time, the default is 10 million,  which  handles
       all but the most extreme cases. You can override the default by supply-
       ing pcre_exec() with a pcre_extra block in which match_limit  is  set,
       and  PCRE_EXTRA_MATCH_LIMIT  is set in the flags field. If the limit is
       exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.

       The match_limit_recursion field is similar to match_limit, but  instead
       of limiting the total number of times that match() is called, it limits
       the depth of recursion. The recursion depth is a  smaller  number  than
       the  total number of calls, because not all calls to match() are recur-
       sive.  This limit is of use only if it is set smaller than match_limit.

       Limiting  the  recursion  depth  limits the amount of stack that can be
       used, or, when PCRE has been compiled to use memory on the heap instead
       of the stack, the amount of heap memory that can be used.

       The default value for match_limit_recursion can be  set  when  PCRE  is
       built;  unless it is changed at build time, it is the same value as the
       default for match_limit. You can override  the  default  by  supplying
       pcre_exec()  with  a  pcre_extra block in which match_limit_recursion is
       set, and PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If
       the limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
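
       The difference between the two limits can be seen  in  a  toy  matcher.
       The sketch below is not PCRE's match() function; it is a deliberately
       tiny backtracking matcher (literals, "." and a postfix "*" only,  always
       matching  the  whole  text)  written  for  illustration.  All names are
       hypothetical, and the two error values simply reuse the numbers  listed
       for PCRE_ERROR_MATCHLIMIT and PCRE_ERROR_RECURSIONLIMIT.

```c
/* Toy backtracking matcher, NOT PCRE's implementation. It counts every
   call (like match_limit) and tracks nesting depth separately (like
   match_limit_recursion). */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1,
       TOY_MATCHLIMIT = -8, TOY_RECURSIONLIMIT = -21 };

struct toy_limits {
    unsigned long calls;        /* calls made so far; start at 0 */
    unsigned long match_limit;  /* maximum total calls allowed */
    unsigned long depth_limit;  /* maximum nesting depth allowed */
};

static int toy_match(const char *pat, const char *text,
                     struct toy_limits *lim, unsigned long depth)
{
    if (++lim->calls > lim->match_limit)
        return TOY_MATCHLIMIT;
    if (depth > lim->depth_limit)
        return TOY_RECURSIONLIMIT;
    if (*pat == '\0')
        return *text == '\0' ? TOY_MATCH : TOY_NOMATCH;

    if (pat[1] == '*') {
        /* Greedy repeat: take as much as possible, then give characters
           back one at a time. Each retry is a new call, but the nesting
           depth does not grow while backtracking at this level. */
        const char *t = text;
        while (*t != '\0' && (*pat == '.' || *pat == *t))
            t++;
        for (;;) {
            int rc = toy_match(pat + 2, t, lim, depth + 1);
            if (rc != TOY_NOMATCH)
                return rc;          /* a match, or a limit was hit */
            if (t == text)
                return TOY_NOMATCH;
            t--;
        }
    }

    if (*text != '\0' && (*pat == '.' || *pat == *text))
        return toy_match(pat + 1, text + 1, lim, depth + 1);
    return TOY_NOMATCH;
}
```

       With the pattern "a*a*a*b" and a subject of twenty "a" characters,  this
       toy  makes  a few thousand calls before failing, yet it never nests more
       than three levels deep. A small call limit therefore  stops  it  early,
       while a depth limit has an effect only if it is set very low.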

       The callout_data field is used in conjunction with the  "callout"  fea-
       ture, which is described in the pcrecallout documentation.

       The  tables  field  is  used  to  pass  a  character  tables pointer to
       pcre_exec(); this overrides the value that is stored with the  compiled
       pattern.  A  non-NULL value is stored with the compiled pattern only if
       custom tables were supplied to pcre_compile() via  its  tableptr  argu-
       ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
       PCRE's internal tables to be used. This facility is  helpful  when  re-
       using  patterns  that  have been saved after compiling with an external
       set of tables, because the external tables  might  be  at  a  different
       address  when  pcre_exec() is called. See the pcreprecompile documenta-
       tion for a discussion of saving compiled patterns for later use.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF

       These options override  the  newline  definition  that  was  chosen  or
       defaulted  when the pattern was compiled. For details, see the descrip-
       tion of pcre_compile() above. During matching, the newline choice
       affects the behaviour of the dot, circumflex, and dollar metacharacters.

         PCRE_NOTBOL

       This option specifies that the first character of the subject string is not
       the beginning of a line, so the  circumflex  metacharacter  should  not
       match  before it. Setting this without PCRE_MULTILINE (at compile time)
       causes circumflex never to match. This option affects only  the  behav-
       iour of the circumflex metacharacter. It does not affect \A.

         PCRE_NOTEOL

       This option specifies that the end of the subject string is not the end
       of a line, so the dollar metacharacter should not match it nor  (except
       in  multiline mode) a newline immediately before it. Setting this with-
       out PCRE_MULTILINE (at compile time) causes dollar never to match. This
       option  affects only the behaviour of the dollar metacharacter. It does
       not affect \Z or \z.

         PCRE_NOTEMPTY

       An empty string is not considered to be a valid match if this option is
       set.  If  there are alternatives in the pattern, they are tried. If all
       the alternatives match the empty string, the entire  match  fails.  For
       example, if the pattern

         a?b?

       is  applied  to  a string not beginning with "a" or "b", it matches the
       empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
       match is not valid, so PCRE searches further into the string for occur-
       rences of "a" or "b".

       Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-
       cial  case  of  a  pattern match of the empty string within its split()
       function, and when using the /g modifier. It  is  possible  to  emulate
       Perl's behaviour after matching a null string by first trying the match
       again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
       if  that  fails by advancing the starting offset (see below) and trying
       an ordinary match again. There is some code that demonstrates how to do
       this in the pcredemo.c sample program.

         PCRE_NO_UTF8_CHECK

       points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
       set, the effect of passing an invalid UTF-8 string as a subject,  or  a
       value  of startoffset that does not point to the start of a UTF-8 char-
       acter, is undefined. Your program may crash.

         PCRE_PARTIAL

       This option turns on the  partial  matching  feature.  If  the  subject
       string  fails to match the pattern, but at some point during the match-
       ing process the end of the subject was reached (that  is,  the  subject
       partially  matches  the  pattern and the failure to match occurred only
       because there were not enough subject characters), pcre_exec()  returns
       PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
       used, there are restrictions on what may appear in the  pattern.  These
       are discussed in the pcrepartial documentation.

   The string to be matched by pcre_exec()

       The  subject string is passed to pcre_exec() as a pointer in subject, a
       length in length, and a starting byte offset in startoffset.  In  UTF-8
       mode,  the  byte  offset  must point to the start of a UTF-8 character.
       Unlike the pattern string, the subject may contain binary  zero  bytes.
       When  the starting offset is zero, the search for a match starts at the
       beginning of the subject, and this is by far the most common case.

       A non-zero starting offset is useful when searching for  another  match
       in  the same subject by calling pcre_exec() again after a previous suc-
       cess.  Setting startoffset differs from just passing over  a  shortened
       string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
       with any kind of lookbehind. For example, consider the pattern

         \Biss\B

       which finds occurrences of "iss" in the middle of  words.  (\B  matches
       only  if  the  current position in the subject is not a word boundary.)
       When applied to the string "Mississippi" the first call to  pcre_exec()
       finds  the  first  occurrence. If pcre_exec() is called again with just
       the remainder of the subject, namely  "issippi",  it  does  not  match,
       because \B is always false at the start of the subject, which is deemed
       to be a word boundary. However, if pcre_exec()  is  passed  the  entire
       string again, but with startoffset set to 4, it finds the second occur-
       rence of "iss" because it is able to look behind the starting point  to
       discover that it is preceded by a letter.
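
       The reason the two calls behave differently can be shown without  PCRE
       at all. The sketch below is a hypothetical stand-alone word-boundary
       test, not PCRE code; it assumes that "word" characters are ASCII  let-
       ters, digits, and underscore. What matters is that the test looks at
       the character before the current position, which exists only when  the
       whole subject is passed.

```c
#include <ctype.h>

/* Hypothetical helpers, not part of PCRE. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Return 1 if position `pos` in subject[0..length) lies on a word
   boundary. Anything before the start or after the end of the subject
   counts as a non-word character. */
static int at_word_boundary(const char *subject, int length, int pos)
{
    int before = pos > 0 && is_word_char((unsigned char)subject[pos - 1]);
    int after = pos < length && is_word_char((unsigned char)subject[pos]);
    return before != after;
}
```

       Called on the full subject with pos set to 4, the function  returns  0,
       so  \B can succeed there. Called on the shortened remainder with pos set
       to 0, it returns 1, because nothing is visible before the start.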

       If  a  non-zero starting offset is passed when the pattern is anchored,
       one attempt to match at the given offset is made. This can only succeed
       if  the  pattern  does  not require the match to be at the start of the
       subject.

   How pcre_exec() returns captured substrings

       In general, a pattern matches a certain portion of the subject, and  in
       of  the  vector is used as workspace by pcre_exec() while matching cap-
       turing subpatterns, and is not available for passing back  information.
       The  length passed in ovecsize should always be a multiple of three. If
       it is not, it is rounded down.

       When a match is successful, information about  captured  substrings  is
       returned  in  pairs  of integers, starting at the beginning of ovector,
       and continuing up to two-thirds of its length at the  most.  The  first
       element of a pair is set to the offset of the first character in a sub-
       string, and the second is set to the  offset  of  the  first  character
       after  the  end  of  a  substring. The first pair, ovector[0] and ovec-
       tor[1], identify the portion of  the  subject  string  matched  by  the
       entire  pattern.  The next pair is used for the first capturing subpat-
       tern, and so on. The value returned by pcre_exec() is one more than the
       highest numbered pair that has been set. For example, if two substrings
       have been captured, the returned value is 3. If there are no  capturing
       subpatterns,  the return value from a successful match is 1, indicating
       that just the first pair of offsets has been set.

       If a capturing subpattern is matched repeatedly, it is the last portion
       of the string that it matched that is returned.
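
       As an illustration of the pair layout, the following sketch locates  a
       captured  substring  directly  from  the offsets. The helper name cap-
       ture_view() is hypothetical; it assumes rc is the positive  value  that
       a successful pcre_exec() call returned and that ovector was filled in
       by that call.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: locate captured substring
   number `n` (0 is the whole match) from the offset pairs in ovector.
   Returns the substring length and stores its start in *start, or
   returns -1 if pair n was not reported (n >= rc) or holds the
   "unset" value -1. */
static int capture_view(const char *subject, const int *ovector, int rc,
                        int n, const char **start)
{
    if (n < 0 || n >= rc || ovector[2 * n] < 0)
        return -1;
    *start = subject + ovector[2 * n];
    return ovector[2 * n + 1] - ovector[2 * n];
}
```

       Because the result is a pointer and a length rather than a copy, it  is
       not  zero  terminated;  print it with a precision, as in printf("%.*s",
       length, start).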

       If  the vector is too small to hold all the captured substring offsets,
       it is used as far as possible (up to two-thirds of its length), and the
       function  returns a value of zero. In particular, if the substring off-
       sets are not of interest, pcre_exec() may be called with ovector passed
       as  NULL  and  ovecsize  as zero. However, if the pattern contains back
       references and the ovector is not big enough to  remember  the  related
       substrings,  PCRE has to get additional memory for use during matching.
       Thus it is usually advisable to supply an ovector.

       The pcre_fullinfo() function, called with PCRE_INFO_CAPTURECOUNT, can be
       used to find out how many capturing subpatterns there are in a compiled
       pattern (the obsolete pcre_info() function also yields this count). The
       smallest size for ovector that will allow for n  captured  substrings,
       in  addition  to  the  offsets  of  the  substring matched by the whole
       pattern, is (n+1)*3.
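
       The sizing arithmetic is small enough to write down. This is a  sketch
       with hypothetical helper names, not PCRE functions.

```c
/* Hypothetical helpers, not part of PCRE. */

/* Smallest ovector element count that can report `capture_count`
   captured substrings plus the whole match: (n + 1) * 3. */
static int ovector_elements_needed(int capture_count)
{
    return (capture_count + 1) * 3;
}

/* Number of offset pairs that a vector of `ovecsize` elements can
   return. The size is rounded down to a multiple of three, and only
   the first two-thirds of the vector carry results, so each group of
   three elements yields one pair. */
static int ovector_pairs_available(int ovecsize)
{
    return ovecsize / 3;
}
```

       For example, a vector of 30 elements can report the  whole  match  and
       up to nine captured substrings.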

       It  is  possible for capturing subpattern number n+1 to match some part
       of the subject when subpattern n has not been used at all. For example,
       if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
       return from the function is 4, and subpatterns 1 and 3 are matched, but
       2  is  not.  When  this happens, both values in the offset pairs corre-
       sponding to unused subpatterns are set to -1.

       Offset values that correspond to unused subpatterns at the end  of  the
       expression  are  also  set  to  -1. For example, if the string "abc" is
       matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
       matched.  The  return  from the function is 2, because the highest used
       capturing subpattern number is 1. However, you can refer to the offsets
       for  the  second  and third capturing subpatterns if you wish (assuming
       the vector is large enough, of course).
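
       A check for unset subpatterns might look like the  following  sketch.
       The  helper  name subpattern_is_set() is hypothetical. It assumes that
       pairs is the number of offset pairs the vector can hold (ovecsize  di-
       vided  by  three),  so that pairs beyond the returned count may also be
       inspected.

```c
/* Hypothetical helper, not part of PCRE: report whether capturing
   subpattern `n` took part in the match. Unused subpatterns have both
   offsets set to -1, so testing the first offset is enough. */
static int subpattern_is_set(const int *ovector, int pairs, int n)
{
    return n >= 0 && n < pairs && ovector[2 * n] >= 0;
}
```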

       Some convenience functions are provided  for  extracting  the  captured
       ovecsize was not zero.

         PCRE_ERROR_BADOPTION      (-3)

       An unrecognized bit was set in the options argument.

         PCRE_ERROR_BADMAGIC       (-4)

       PCRE  stores a 4-byte "magic number" at the start of the compiled code,
       to catch the case when it is passed a junk pointer and to detect when a
       pattern that was compiled in an environment of one endianness is run in
       an environment with the other endianness. This is the error  that  PCRE
       gives when the magic number is not present.

         PCRE_ERROR_UNKNOWN_NODE   (-5)

       While running the pattern match, an unknown item was encountered in the
       compiled pattern. This error could be caused by a bug  in  PCRE  or  by
       overwriting of the compiled pattern.

         PCRE_ERROR_NOMEMORY       (-6)

       If  a  pattern contains back references, but the ovector that is passed
       to pcre_exec() is not big enough to remember the referenced substrings,
       PCRE  gets  a  block of memory at the start of matching to use for this
       purpose. If the call via pcre_malloc() fails, this error is given.  The
       memory is automatically freed at the end of matching.

         PCRE_ERROR_NOSUBSTRING    (-7)

       This  error is used by the pcre_copy_substring(), pcre_get_substring(),
       and  pcre_get_substring_list()  functions  (see  below).  It  is  never
       returned by pcre_exec().

         PCRE_ERROR_MATCHLIMIT     (-8)

       The  backtracking  limit,  as  specified  by the match_limit field in a
       pcre_extra structure (or defaulted) was reached.  See  the  description
       above.

         PCRE_ERROR_RECURSIONLIMIT (-21)

       The internal recursion limit, as specified by the match_limit_recursion
       field in a pcre_extra structure (or defaulted)  was  reached.  See  the
       description above.

         PCRE_ERROR_CALLOUT        (-9)

       This error is never generated by pcre_exec() itself. It is provided for
       use by callout functions that want to yield a distinctive  error  code.
       See the pcrecallout documentation for details.

       pcrepartial documentation for details of partial matching.

         PCRE_ERROR_BADPARTIAL     (-13)

       The  PCRE_PARTIAL  option  was  used with a compiled pattern containing
       items that are not supported for partial matching. See the  pcrepartial
       documentation for details of partial matching.

         PCRE_ERROR_INTERNAL       (-14)

       An  unexpected  internal error has occurred. This error could be caused
       by a bug in PCRE or by overwriting of the compiled pattern.

         PCRE_ERROR_BADCOUNT       (-15)

       This error is given if the value of the ovecsize argument is  negative.


EXTRACTING CAPTURED SUBSTRINGS BY NUMBER


       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       Captured  substrings  can  be  accessed  directly  by using the offsets
       returned by pcre_exec() in  ovector.  For  convenience,  the  functions
       pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
       string_list() are provided for extracting captured substrings  as  new,
       separate,  zero-terminated strings. These functions identify substrings
       by number. The next section describes functions  for  extracting  named
       substrings.

       A  substring that contains a binary zero is correctly extracted and has
       a further zero added on the end, but the result is not, of course, a  C
       string.   However,  you  can  process such a string by referring to the
       length that is  returned  by  pcre_copy_substring()  and  pcre_get_sub-
       string().  Unfortunately, the interface to pcre_get_substring_list() is
       not adequate for handling strings containing binary zeros, because  the
       end of the final string is not independently indicated.

       The  first  three  arguments  are the same for all three of these func-
       tions: subject is the subject string that has  just  been  successfully
       matched, ovector is a pointer to the vector of integer offsets that was
       passed to pcre_exec(), and stringcount is the number of substrings that
       were  captured  by  the match, including the substring that matched the
       entire regular expression. This is the value returned by pcre_exec() if
         PCRE_ERROR_NOMEMORY       (-6)

       The  buffer  was too small for pcre_copy_substring(), or the attempt to
       get memory failed for pcre_get_substring().

         PCRE_ERROR_NOSUBSTRING    (-7)

       There is no substring whose number is stringnumber.
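
       The documented behaviour of  pcre_copy_substring()  can  be  summarized
       in  a  short  stand-alone sketch. This is not PCRE's source code; the
       function name copy_substring_sketch() and the two enum  constants  are
       hypothetical,  and  the  error values are the two numbers listed above.
       It shows why a substring containing a binary zero is still copied  cor-
       rectly: the copy is driven by the offsets, not by a terminator.

```c
#include <string.h>

/* Values copied from the error list above, for illustration only. */
enum { SKETCH_NOMEMORY = -6, SKETCH_NOSUBSTRING = -7 };

/* Hypothetical sketch, not PCRE's source: copy captured substring
   `stringnumber` into `buffer`, add a terminating zero, and return the
   substring length (which may exceed strlen(buffer) if the data holds
   binary zeros). An unset substring (offsets of -1) is copied as an
   empty string. */
static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    int start, length;

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_NOSUBSTRING;

    start = ovector[2 * stringnumber];
    length = start < 0 ? 0 : ovector[2 * stringnumber + 1] - start;

    if (buffersize < length + 1)
        return SKETCH_NOMEMORY;         /* buffer too small */

    if (length > 0)
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = 0;
    return length;
}
```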

       The pcre_get_substring_list()  function  extracts  all  available  sub-
       strings  and  builds  a list of pointers to them. All this is done in a
       single block of memory that is obtained via pcre_malloc. The address of
       the  memory  block  is returned via listptr, which is also the start of
       the list of string pointers. The end of the list is marked  by  a  NULL
       pointer. The yield of the function is zero if all went well, or

         PCRE_ERROR_NOMEMORY       (-6)

       if the attempt to get the memory block failed.
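
       The single-block layout can be pictured with the following  sketch.  It
       is  not  PCRE's  source  code: the function name substring_list_sketch()
       is hypothetical, it calls malloc() directly rather  than  pcre_malloc,
       and  it  hard-codes -6 for the out-of-memory return. The pointer array
       sits at the start of the block and the string data follows it, so  one
       free() of the list pointer releases everything.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not PCRE's source: build a NULL-terminated list
   of zero-terminated copies of all captured substrings inside one
   malloc() block. Returns 0 on success or -6 if allocation fails. */
static int substring_list_sketch(const char *subject, const int *ovector,
                                 int stringcount, const char ***listptr)
{
    size_t bytes = (size_t)(stringcount + 1) * sizeof(char *);
    const char **list;
    char *p;
    int i;

    /* Unset pairs are (-1, -1), so they contribute a length of zero. */
    for (i = 0; i < stringcount; i++)
        bytes += (size_t)(ovector[2 * i + 1] - ovector[2 * i]) + 1;

    list = (const char **)malloc(bytes);
    if (list == NULL)
        return -6;

    p = (char *)(list + stringcount + 1);   /* strings follow the pointers */
    for (i = 0; i < stringcount; i++) {
        int len = ovector[2 * i + 1] - ovector[2 * i];
        if (len > 0)
            memcpy(p, subject + ovector[2 * i], (size_t)len);
        list[i] = p;
        p += len;
        *p++ = 0;
    }
    list[stringcount] = NULL;               /* end-of-list marker */
    *listptr = list;
    return 0;
}
```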

       When any of these functions encounters a substring that is unset, which
       can happen when capturing subpattern number n+1 matches  some  part  of
       the  subject, but subpattern n has not been used at all, they return an
       empty string. This can be distinguished from a genuine zero-length sub-
       string  by inspecting the appropriate offset in ovector, which is nega-
       tive for unset substrings.

       The two convenience functions pcre_free_substring() and  pcre_free_sub-
       string_list()  can  be  used  to free the memory returned by a previous
       call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
       tively.  They  do  nothing  more  than  call the function pointed to by
       pcre_free, which of course could be called directly from a  C  program.
       However,  PCRE is used in some situations where it is linked via a spe-
       cial  interface  to  another  programming  language  that  cannot   use
       pcre_free  directly;  it is for these cases that the functions are pro-
       vided.


EXTRACTING CAPTURED SUBSTRINGS BY NAME


       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       are also two functions that do the whole job.

       Most   of   the   arguments    of    pcre_copy_named_substring()    and
       pcre_get_named_substring()  are  the  same  as  those for the similarly
       named functions that extract by number. As these are described  in  the
       previous  section,  they  are not re-described here. There are just two
       differences:

       First, instead of a substring number, a substring name is  given.  Sec-
       ond, there is an extra argument, given at the start, which is a pointer
       to the compiled pattern. This is needed in order to gain access to  the
       name-to-number translation table.

       These  functions call pcre_get_stringnumber(), and if it succeeds, they
       then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
       ate.


DUPLICATE SUBPATTERN NAMES


       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for
       subpatterns are not required to  be  unique.  Normally,  patterns  with
       duplicate  names  are such that in any one match, only one of the named
       subpatterns participates. An example is shown in the pcrepattern  docu-
       mentation. When duplicates are present, pcre_copy_named_substring() and
       pcre_get_named_substring() return the first substring corresponding  to
       the  given  name  that  is  set.  If  none  are set, an empty string is
       returned.  The pcre_get_stringnumber() function returns one of the num-
       bers  that are associated with the name, but it is not defined which it
       is.

       If you want to get full details of all captured substrings for a  given
       name,  you  must  use  the pcre_get_stringtable_entries() function. The
       first argument is the compiled pattern, and the second is the name. The
       third  and  fourth  are  pointers to variables which are updated by the
       function. After it has run, they point to the first and last entries in
       the  name-to-number  table  for  the  given  name.  The function itself
       returns the length of each entry, or  PCRE_ERROR_NOSUBSTRING  if  there
       are  none.  The  format  of the table is described above in the section
       entitled Information about a pattern. Given all  the  relevant  entries
       for the name, you can extract each of their numbers, and hence the cap-
       tured data, if any.
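
       Walking the entries for a duplicated name might look like  the  sketch
       below.  The  helper  name first_set_number() is hypothetical. It assumes
       that first, last, and entry_size are the values that  pcre_get_string-
       table_entries()  produced,  and  that  pairs  is the number of offset
       pairs held in ovector.

```c
/* Hypothetical helper, not part of PCRE: step through the name-table
   entries from `first` to `last` inclusive and return the number of
   the first subpattern that is set in ovector, or -1 if none is set.
   Each entry starts with a two-byte number, most significant byte
   first. */
static int first_set_number(const char *first, const char *last,
                            int entry_size, const int *ovector, int pairs)
{
    const char *entry;
    for (entry = first; entry <= last; entry += entry_size) {
        int n = ((unsigned char)entry[0] << 8) | (unsigned char)entry[1];
        if (n < pairs && ovector[2 * n] >= 0)
            return n;
    }
    return -1;
}
```

       The returned number can then be used with the extract-by-number  func-
       tions described earlier.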


FINDING ALL POSSIBLE MATCHES


       The traditional matching function uses a  similar  algorithm  to  Perl,
       which stops when it finds the first match, starting at a given point in
       the subject. If you want to find all possible matches, or  the  longest
       possible  match,  consider using the alternative matching function (see
       below) instead. If you cannot use the alternative function,  but  still
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       The  function  pcre_dfa_exec()  is  called  to  match  a subject string
       against a compiled pattern, using a "DFA" matching algorithm. This  has
       different  characteristics to the normal algorithm, and is not compati-
       ble with Perl. Some of the features of PCRE patterns are not supported.
       Nevertheless, there are times when this kind of matching can be useful.
       For a discussion of the two matching algorithms, see  the  pcrematching
       documentation.

       The  arguments  for  the  pcre_dfa_exec()  function are the same as for
       pcre_exec(), plus two extras. The ovector argument is used in a differ-
       ent  way,  and  this is described below. The other common arguments are
       used in the same way as for pcre_exec(), so their  description  is  not
       repeated here.

       The  two  additional  arguments provide workspace for the function. The
       workspace vector should contain at least 20 elements. It  is  used  for
       keeping  track  of  multiple  paths  through  the  pattern  tree.  More
       workspace will be needed for patterns and subjects where  there  are  a
       lot of potential matches.

       Here is an example of a simple call to pcre_dfa_exec():

         int rc;
         int ovector[10];
         int wspace[20];
         rc = pcre_dfa_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           10,             /* number of elements (NOT size in bytes) */
           wspace,         /* working space vector */
           20);            /* number of elements (NOT size in bytes) */

   Option bits for pcre_dfa_exec()

       The  unused  bits  of  the options argument for pcre_dfa_exec() must be
       zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-
       LINE_xxx,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,
       PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
       three of these are the same as for pcre_exec(), so their description is
       not repeated here.

         PCRE_PARTIAL

       This has the same general effect as it does for  pcre_exec(),  but  the
         PCRE_DFA_RESTART

       When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and
       returns a partial match, it is possible to call it  again,  with  addi-
       tional  subject  characters,  and have it continue with the same match.
       The PCRE_DFA_RESTART option requests this action; when it is  set,  the
       workspace and wscount arguments must reference the same vector as before
       because  data  about  the  match so far is left in it after a partial
       match.  There  is  more  discussion of this facility in the pcrepartial
       documentation.

   Successful returns from pcre_dfa_exec()

       When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-
       string in the subject. Note, however, that all the matches from one run
       of the function start at the same point in  the  subject.  The  shorter
       matches  are all initial substrings of the longer matches. For example,
       if the pattern

         <.*>

       is matched against the string

         This is <something> <something else> <something further> no more

       the three matched strings are

         <something>
         <something> <something else>
         <something> <something else> <something further>

       On success, the yield of the function is a number  greater  than  zero,
       which  is  the  number of matched substrings. The substrings themselves
       are returned in ovector. Each string uses two elements;  the  first  is
       the  offset  to the start, and the second is the offset to the end. All
       the strings have the same start offset. (Space could have been saved by
       giving  this only once, but it was decided to retain some compatibility
       with the way pcre_exec() returns data, even though the meaning  of  the
       strings is different.)

       The strings are returned in reverse order of length; that is, the long-
       est matching string is given first. If there were too many  matches  to
       fit  into ovector, the yield of the function is zero, and the vector is
       filled with the longest matches.
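
       Reading the results back is a matter of indexing the pairs. The sketch
       below  uses  a hypothetical helper name, dfa_match_length(), and assumes
       rc is the positive value returned by a successful  pcre_dfa_exec()  call
       that filled in ovector.

```c
/* Hypothetical helper, not part of PCRE: ovector holds rc (start, end)
   pairs, longest match first. Return the length of match number k
   (k = 0 is the longest, k = rc - 1 the shortest), or -1 if k is out
   of range. */
static int dfa_match_length(const int *ovector, int rc, int k)
{
    if (k < 0 || k >= rc)
        return -1;
    return ovector[2 * k + 1] - ovector[2 * k];
}
```

       For the <.*> example above, all three pairs share the start  offset  8,
       and  the  pair  at  index 0 describes the longest of the three matched
       strings.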

   Error returns from pcre_dfa_exec()

       The pcre_dfa_exec() function returns a negative number when  it  fails.
       Many  of  the  errors  are  the  same as for pcre_exec(), and these are
       described above.  There are in addition the following errors  that  are
       specific to pcre_dfa_exec():

       This  return  is given if pcre_dfa_exec() is called with an extra block
       that contains a setting of the match_limit field. This is not supported
       (it is meaningless).

         PCRE_ERROR_DFA_WSSIZE     (-19)

       This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
       workspace vector.

         PCRE_ERROR_DFA_RECURSE    (-20)

       When a recursive subpattern is processed, the matching  function  calls
       itself  recursively,  using  private vectors for ovector and workspace.
       This error is given if the output vector  is  not  large  enough.  This
       should be extremely rare, as a vector of size 1000 is used.

Last updated: 08 June 2006
Copyright (c) 1997-2006 University of Cambridge.



                                                                    PCREAPI(3)
