diff --git a/v_windows/v/old/vlib/regex/README.md b/v_windows/v/old/vlib/regex/README.md
new file mode 100644
index 0000000..ae8c41a
--- /dev/null
+++ b/v_windows/v/old/vlib/regex/README.md
@@ -0,0 +1,874 @@
+# V RegEx (Regular expression) 1.0 alpha
+
+[TOC]
+
+## Introduction
+
+These assumptions were made while writing the implementation, and they hold
+for all the `regex` module features:
+
+1. The matching stops at the end of the string, *not* at newline characters.
+
+2. The basic atomic elements of this regex engine are tokens.
+In a query string, a simple character is a token.
+
+
+## Differences from PCRE
+
+NB: the **V regex module is not PCRE compliant**, so some of its behaviour
+differs. This is a consequence of the V philosophy: have one way to do things
+and keep it simple.
+
+The main differences can be summarized in the following points:
+
+- The basic element **is the token, not the sequence of symbols**, and the
+simplest token is a single character.
+
+- `|` **the OR operator acts on tokens.** For example, `abc|ebc` is not
+`abc` OR `ebc`. Instead it is evaluated as `ab`, followed by `c` OR `e`,
+followed by `bc`, because the **token is the base element**,
+not the sequence of symbols.
+
+- The **match operation stops at the end of the string**. It does *NOT* stop
+at new line characters.
+
+
+## Tokens
+
+Tokens are the atomic units used by this regex engine.
+They can be one of the following:
+
+
+### Simple char
+
+This token is a simple single character like `a` or `b` etc.
+
+
+### Match positional delimiters
+
+`^` Matches the start of the string.
+
+`$` Matches the end of the string.
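+
+A minimal sketch of `^` in use (the sample strings are only illustrative):
+
+```v ignore
+import regex
+
+fn main() {
+	mut re := regex.regex_opt(r'^ab') or { panic(err) }
+	start1, _ := re.match_string('abc')
+	println(start1) // expected 0: the query matches at the start of the string
+	start2, _ := re.match_string('cab')
+	println(start2) // expected -1: `ab` is present, but not at the start
+}
+```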
+
+
+### Char class (cc)
+
+A character class matches any one of the chars specified inside it. Use square
+brackets `[ ]` to enclose them.
+
+The sequence of chars in the character class is evaluated with an OR op.
+
+For example, the cc `[abc]` matches any character that is `a` or `b` or `c`,
+but it doesn't match `C` or `z`.
+
+Inside a cc, it is possible to specify a "range" of characters, for example
+`[ad-h]` is equivalent to writing `[adefgh]`.
+
+A cc can have different ranges at the same time, for example `[a-zA-Z0-9]`
+matches all the latin lowercase, uppercase and numeric characters.
+
+It is possible to negate the meaning of a cc, using the caret char at the
+start of the cc like this: `[^abc]`. That matches every char that is NOT
+`a` or `b` or `c`.
+
+A cc can contain meta-chars like `[a-z\d]`, which matches all the lowercase
+latin chars `a-z` and all the digits `\d`.
+
+It is possible to mix all the properties of the char class together.
+
+NB: In order to match the `-` (minus) char, it must be preceded by
+ a backslash in the cc, for example `[\-_\d\a]` will match:
+ `-` minus,
+ `_` underscore,
+ `\d` numeric chars,
+ `\a` lower case chars.
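+
+Putting these rules together, here is a minimal sketch (the sample string is
+only illustrative):
+
+```v ignore
+import regex
+
+fn main() {
+	txt := 'hello_world-42'
+	// cc with an escaped minus, an underscore, digits and a lowercase range
+	mut re := regex.regex_opt(r'[\-_\da-z]+') or { panic(err) }
+	start, end := re.match_string(txt)
+	if start >= 0 {
+		println('[${txt[start..end]}]') // expected to cover the whole string
+	}
+}
+```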
+
+### Meta-chars
+
+A meta-char is a backslash followed by a character.
+For example, `\w` is the meta-char `w`.
+
+A meta-char can match different types of characters.
+
+* `\w` matches an alphanumeric char `[a-zA-Z0-9_]`
+* `\W` matches a non alphanumeric char
+* `\d` matches a digit `[0-9]`
+* `\D` matches a non digit
+* `\s` matches a space char, one of `[' ','\t','\n','\r','\v','\f']`
+* `\S` matches a non space char
+* `\a` matches only a lowercase char `[a-z]`
+* `\A` matches only an uppercase char `[A-Z]`
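+
+For example, a short sketch combining `\a` and `\d` (the sample string is only
+illustrative):
+
+```v ignore
+import regex
+
+fn main() {
+	txt := 'item42'
+	// \a matches lowercase latin chars, \d matches digits
+	mut re := regex.regex_opt(r'\a+\d+') or { panic(err) }
+	start, end := re.match_string(txt)
+	if start >= 0 {
+		println('[${txt[start..end]}]') // expected: [item42]
+	}
+}
+```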
+
+### Quantifier
+
+Each token can have a quantifier that specifies how many times the token
+must be matched.
+
+#### **Short quantifiers**
+
+- `?` matches 0 or 1 time, `a?b` matches both `ab` and `b`
+- `+` matches *at least* 1 time, for example, `a+` matches both `aaa` and `a`
+- `*` matches 0 or more times, for example, `a*b` matches `aaab`, `ab` or `b`
+
+#### **Long quantifiers**
+
+- `{x}` matches exactly x times, `a{2}` matches `aa`, but not `aaa` or `a`
+- `{min,}` matches at least min times, `a{2,}` matches `aaa` or `aa`, not `a`
+- `{,max}` matches at least 0 times and at maximum max times,
+ for example, `a{,2}` matches `a` and `aa`, but doesn't match `aaa`
+- `{min,max}` matches from min times, to max times, for example
+ `a{2,3}` matches `aa` and `aaa`, but doesn't match `a` or `aaaa`
+
+A long quantifier may have a `greedy off` flag, which is the `?`
+character after the brackets. `{2,4}?` means to match the minimum
+number of possible tokens, in this case 2.
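+
+A minimal sketch of a long quantifier with and without the greedy off flag
+(the expected results follow from the rules above):
+
+```v ignore
+import regex
+
+fn main() {
+	txt := 'aaaa'
+	mut re := regex.regex_opt(r'a{2,4}') or { panic(err) }
+	mut start, mut end := re.match_string(txt)
+	println(end - start) // expected 4: greedy, match as many chars as possible
+	re = regex.regex_opt(r'a{2,4}?') or { panic(err) }
+	start, end = re.match_string(txt)
+	println(end - start) // expected 2: greedy off, match the minimum
+}
+```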
+
+### Dot char
+
+The dot is a particular meta-char that matches "any char".
+
+It is simpler to explain it with an example:
+
+Suppose you have `abccc ddeef` as a source string that you want to parse
+with a regex. The following table shows some query strings and the result
+of parsing the source string with each of them.
+
+| query string | result      |
+|--------------|-------------|
+| `.*c`        | `abc`       |
+| `.*dd`       | `abcc dd`   |
+| `ab.*e`      | `abccc dde` |
+| `ab.{3} .*e` | `abccc dde` |
+
+The dot matches any character, until the next token match is satisfied.
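+
+The first row of the table above, as a minimal sketch:
+
+```v ignore
+import regex
+
+fn main() {
+	txt := 'abccc ddeef'
+	mut re := regex.regex_opt(r'.*c') or { panic(err) }
+	start, end := re.match_string(txt)
+	if start >= 0 {
+		println('[${txt[start..end]}]') // expected: [abc], as in the table
+	}
+}
+```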
+
+### OR token
+
+The token `|` means a logical OR operation between two consecutive tokens,
+i.e. `a|b` matches a character that is either `a` or `b`.
+
+The OR token can work in a "chained way": `a|(b)|cd` means: test first `a`;
+if the char is not `a`, then test the group `(b)`; and if the group doesn't
+match either, finally test the token `c`.
+
+NB: **unlike in PCRE, the OR operation works at token level!**
+It doesn't work at concatenation level!
+
+That also means that a query string like `abc|bde` is not equal to
+`(abc)|(bde)`, but instead to `ab(c|b)de`.
+The OR operation works only for `c|b`, not at char concatenation level.
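+
+A minimal sketch of the token-level OR, using the `abc|ebc` example from the
+beginning of this document:
+
+```v ignore
+import regex
+
+fn main() {
+	// evaluated as `ab`, then `c` OR `e`, then `bc`
+	mut re := regex.regex_opt(r'abc|ebc') or { panic(err) }
+	start, end := re.match_string('abebc')
+	println('$start, $end') // expected: 0, 5 since `e` satisfies the `c|e` token
+	start2, _ := re.match_string('abc')
+	println(start2) // expected -1: the trailing `bc` tokens cannot be matched
+}
+```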
+
+### Groups
+
+Groups are a method to create complex patterns with repetitions of blocks
+of tokens. The groups are delimited by round brackets `( )`. Groups can be
+nested. Like all other tokens, groups can have a quantifier too.
+
+`c(pa)+z` matches `cpapaz` or `cpaz` or `cpapapaz`.
+
+`(c(pa)+z ?)+` matches `cpaz cpapaz cpapapaz` or `cpapaz`.
+
+Let's analyze this last case. First we have the group `#0`, that is the
+outermost pair of round brackets `(...)+`. This group has a quantifier `+`,
+that says to match its content *at least one time*.
+
+Then we have a simple char token `c`, and a second group `#1`: `(pa)+`.
+This group also tries to match the sequence `pa`, *at least one time*,
+as specified by the `+` quantifier.
+
+Then, we have another simple token `z` and another simple token ` ?`,
+i.e. the space char (ascii code 32) followed by the `?` quantifier,
+which means that the preceding space should be matched 0 or 1 time.
+
+This explains why the `(c(pa)+z ?)+` query string
+can match `cpaz cpapaz cpapapaz`.
+
+In this implementation the groups are "capture groups". This means that the
+most recent match of each group can be retrieved from the `RE` struct.
+
+The capture groups are stored as index pairs in the field `groups`,
+which is an `[]int` inside the `RE` struct.
+
+**example:**
+
+```v oksyntax
+text := 'cpaz cpapaz cpapapaz'
+query := r'(c(pa)+z ?)+'
+mut re := regex.regex_opt(query) or { panic(err) }
+println(re.get_query())
+// #0(c#1(pa)+z ?)+
+// #0 and #1 are the ids of the groups; they are shown if re.debug is 1 or 2
+start, end := re.match_string(text)
+// [start=0, end=20] match => [cpaz cpapaz cpapapaz]
+mut gi := 0
+for gi < re.groups.len {
+ if re.groups[gi] >= 0 {
+ println('${gi / 2} :[${text[re.groups[gi]..re.groups[gi + 1]]}]')
+ }
+ gi += 2
+}
+// groups captured
+// 0 :[cpapapaz]
+// 1 :[pa]
+```
+
+**note:** *to show the `group id number` in the result of `get_query()`,*
+*the flag `debug` of the RE object must be `1` or `2`*
+
+In order to simplify the use of the captured groups, it is possible to use the
+utility function `get_group_list`.
+
+This function returns a list of groups using this support struct:
+
+```v oksyntax
+pub struct Re_group {
+pub:
+ start int = -1
+ end int = -1
+}
+```
+
+Here is an example of use:
+
+```v oksyntax
+/*
+This simple function converts an HTML RGB value with 3 or 6 hex digits to
+an u32 value. It is not optimized and is intended only for didactic
+purposes. Example: #A0B0CC #A9F
+*/
+fn convert_html_rgb(in_col string) u32 {
+ mut n_digit := if in_col.len == 4 { 1 } else { 2 }
+ mut col_mul := if in_col.len == 4 { 4 } else { 0 }
+	// this is the regex query, it uses V string interpolation to customize the query
+	// NOTE: if you want to use escape codes you must use r"" (raw) strings,
+	// *** please remember that V interpolation doesn't work on raw strings. ***
+ query := '#([a-fA-F0-9]{$n_digit})([a-fA-F0-9]{$n_digit})([a-fA-F0-9]{$n_digit})'
+ mut re := regex.regex_opt(query) or { panic(err) }
+ start, end := re.match_string(in_col)
+ println('start: $start, end: $end')
+ mut res := u32(0)
+ if start >= 0 {
+ group_list := re.get_group_list() // this is the utility function
+ r := ('0x' + in_col[group_list[0].start..group_list[0].end]).int() << col_mul
+ g := ('0x' + in_col[group_list[1].start..group_list[1].end]).int() << col_mul
+ b := ('0x' + in_col[group_list[2].start..group_list[2].end]).int() << col_mul
+ println('r: $r g: $g b: $b')
+ res = u32(r) << 16 | u32(g) << 8 | u32(b)
+ }
+ return res
+}
+```
+
+Other utility functions are `get_group_by_id` and `get_group_bounds_by_id`,
+which return the string and the boundaries of a group directly from its `id`:
+
+```v ignore
+txt := "my used string...."
+for g_index := 0; g_index < re.group_count ; g_index++ {
+ println("#${g_index} [${re.get_group_by_id(txt, g_index)}] \
+ bounds: ${re.get_group_bounds_by_id(g_index)}")
+}
+```
+
+More helper functions are listed in the **Groups query functions** section.
+
+### Groups Continuous saving
+
+In particular situations, it is useful to have a continuous group saving.
+This is possible by initializing the `group_csave` field in the `RE` struct.
+
+This feature allows you to collect data in a continuous/streaming way.
+
+In the example below, we want to collect every repeated match of the
+named groups in a URL. To achieve this task, we can use continuous
+group saving, by enabling the right flag: `re.group_csave_flag = true`.
+
+The `.group_csave` array will then be filled following this logic:
+
+`re.group_csave[0]` - number of total saved records
+`re.group_csave[1+n*3]` - id of the saved group
+`re.group_csave[2+n*3]` - start index in the source string of the saved group
+`re.group_csave[3+n*3]` - end index in the source string of the saved group
+
+The regex engine will save groups until it finishes, or until the array has no
+more space. If the space ends, no error is raised, and further records will
+not be saved.
+
+```v ignore
+import regex
+fn main(){
+ txt := "http://www.ciao.mondo/hello/pippo12_/pera.html"
+ query := r"(?P<format>https?)|(?P<format>ftps?)://(?P<token>[\w_]+.)+"
+
+ mut re := regex.regex_opt(query) or { panic(err) }
+ //println(re.get_code()) // uncomment to see the print of the regex execution code
+ re.debug=2 // enable maximum log
+ println("String: ${txt}")
+ println("Query : ${re.get_query()}")
+ re.debug=0 // disable log
+ re.group_csave_flag = true
+ start, end := re.match_string(txt)
+ if start >= 0 {
+ println("Match ($start, $end) => [${txt[start..end]}]")
+ } else {
+ println("No Match")
+ }
+
+ if re.group_csave_flag == true && start >= 0 && re.group_csave.len > 0{
+ println("cg: $re.group_csave")
+ mut cs_i := 1
+ for cs_i < re.group_csave[0]*3 {
+ g_id := re.group_csave[cs_i]
+ st := re.group_csave[cs_i+1]
+ en := re.group_csave[cs_i+2]
+ println("cg[$g_id] $st $en:[${txt[st..en]}]")
+ cs_i += 3
+ }
+ }
+}
+```
+
+The output will be:
+
+```
+String: http://www.ciao.mondo/hello/pippo12_/pera.html
+Query : #0(?P<format>https?)|{8,14}#0(?P<format>ftps?)://#1(?P<token>[\w_]+.)+
+Match (0, 46) => [http://www.ciao.mondo/hello/pippo12_/pera.html]
+cg: [8, 0, 0, 4, 1, 7, 11, 1, 11, 16, 1, 16, 22, 1, 22, 28, 1, 28, 37, 1, 37, 42, 1, 42, 46]
+cg[0] 0 4:[http]
+cg[1] 7 11:[www.]
+cg[1] 11 16:[ciao.]
+cg[1] 16 22:[mondo/]
+cg[1] 22 28:[hello/]
+cg[1] 28 37:[pippo12_/]
+cg[1] 37 42:[pera.]
+cg[1] 42 46:[html]
+```
+
+### Named capturing groups
+
+This regex module partially supports the question mark `?` PCRE syntax for groups.
+
+`(?:abcd)` **non capturing group**: the content of the group will not be saved.
+
+`(?P<mygroup>abcdef)` **named group:** the group content is saved and labeled
+as `mygroup`.
+
+The labels of the groups are saved in the `group_map` of the `RE` struct,
+which is a map from `string` to `int`, where the value is the index in the
+`group_csave` list of indexes.
+
+Here is an example of how to use them:
+```v ignore
+import regex
+fn main(){
+ txt := "http://www.ciao.mondo/hello/pippo12_/pera.html"
+ query := r"(?P<format>https?)|(?P<format>ftps?)://(?P<token>[\w_]+.)+"
+
+ mut re := regex.regex_opt(query) or { panic(err) }
+ //println(re.get_code()) // uncomment to see the print of the regex execution code
+ re.debug=2 // enable maximum log
+ println("String: ${txt}")
+ println("Query : ${re.get_query()}")
+ re.debug=0 // disable log
+ start, end := re.match_string(txt)
+ if start >= 0 {
+ println("Match ($start, $end) => [${txt[start..end]}]")
+ } else {
+ println("No Match")
+ }
+
+ for name in re.group_map.keys() {
+ println("group:'$name' \t=> [${re.get_group_by_name(txt, name)}] \
+ bounds: ${re.get_group_bounds_by_name(name)}")
+ }
+}
+```
+
+Output:
+
+```
+String: http://www.ciao.mondo/hello/pippo12_/pera.html
+Query : #0(?P<format>https?)|{8,14}#0(?P<format>ftps?)://#1(?P<token>[\w_]+.)+
+Match (0, 46) => [http://www.ciao.mondo/hello/pippo12_/pera.html]
+group:'format' => [http] bounds: (0, 4)
+group:'token' => [html] bounds: (42, 46)
+```
+
+In order to simplify the use of the named groups, it is possible to
+query them directly by name, using functions like `re.get_group_bounds_by_name`.
+
+Here is a more complex example of using them:
+```v oksyntax
+// This function demonstrates the use of named groups
+fn convert_html_rgb_n(in_col string) u32 {
+ mut n_digit := if in_col.len == 4 { 1 } else { 2 }
+ mut col_mul := if in_col.len == 4 { 4 } else { 0 }
+ query := '#(?P<red>[a-fA-F0-9]{$n_digit})' + '(?P<green>[a-fA-F0-9]{$n_digit})' +
+ '(?P<blue>[a-fA-F0-9]{$n_digit})'
+ mut re := regex.regex_opt(query) or { panic(err) }
+ start, end := re.match_string(in_col)
+ println('start: $start, end: $end')
+ mut res := u32(0)
+ if start >= 0 {
+		red_s, red_e := re.get_group_bounds_by_name('red')
+		r := ('0x' + in_col[red_s..red_e]).int() << col_mul
+		green_s, green_e := re.get_group_bounds_by_name('green')
+		g := ('0x' + in_col[green_s..green_e]).int() << col_mul
+		blue_s, blue_e := re.get_group_bounds_by_name('blue')
+		b := ('0x' + in_col[blue_s..blue_e]).int() << col_mul
+ println('r: $r g: $g b: $b')
+ res = u32(r) << 16 | u32(g) << 8 | u32(b)
+ }
+ return res
+}
+```
+
+Other utilities are `get_group_by_name` and `get_group_bounds_by_name`,
+which return the string and the boundaries of a group using its `name`:
+
+```v ignore
+txt := "my used string...."
+for name in re.group_map.keys() {
+ println("group:'$name' \t=> [${re.get_group_by_name(txt, name)}] \
+ bounds: ${re.get_group_bounds_by_name(name)}")
+}
+```
+
+
+
+### Groups query functions
+
+These functions are helpers to query the captured groups:
+
+```v ignore
+// get_group_bounds_by_name get the boundaries of a group by its name
+pub fn (re RE) get_group_bounds_by_name(group_name string) (int, int)
+
+// get_group_by_name get the string of a group by its name
+pub fn (re RE) get_group_by_name(in_txt string, group_name string) string
+
+// get_group_bounds_by_id get the boundaries of a group by its id
+pub fn (re RE) get_group_bounds_by_id(group_id int) (int,int)
+
+// get_group_by_id get the string of a group by its id
+pub fn (re RE) get_group_by_id(in_txt string, group_id int) string
+
+struct Re_group {
+pub:
+ start int = -1
+ end int = -1
+}
+
+// get_group_list returns a list of Re_group for the found groups
+pub fn (re RE) get_group_list() []Re_group
+```
+
+## Flags
+
+It is possible to set some flags in the regex parser that change
+the behavior of the parser itself.
+
+```v ignore
+// example of flag settings
+mut re := regex.new()
+re.flag = regex.F_BIN
+```
+
+- `F_BIN`: parse a string as bytes, utf-8 management disabled.
+
+- `F_EFM`: exit on the first char that matches in the query, used by the
+  find function.
+
+- `F_MS`: matches only if the index of the start match is 0,
+  same as `^` at the start of the query string.
+
+- `F_ME`: matches only if the end index of the match is the last char
+  of the input string, same as `$` at the end of the query string.
+
+- `F_NL`: stop the matching if a new line char `\n` or `\r` is found.
+
+## Functions
+
+### Initializer
+
+These functions are helpers that create the `RE` struct.
+A `RE` struct can also be created manually if needed.
+
+#### **Simplified initializer**
+
+```v ignore
+// regex_opt creates a regex object from the query string and compiles it
+pub fn regex_opt(in_query string) ?RE
+```
+
+#### **Base initializer**
+
+```v ignore
+// new creates a RE object of small size, usually sufficient for ordinary use
+pub fn new() RE
+
+```
+#### **Custom initialization**
+For some particular needs, it is possible to initialize a fully customized regex:
+```v ignore
+pattern := r"ab(.*)(ac)"
+// init custom regex
+mut re := regex.RE{}
+// max program length, it can not be longer than the pattern
+re.prog = []regex.Token{len: pattern.len + 1}
+// there can not be more char classes than the length of the pattern
+re.cc = []regex.CharClass{len: pattern.len}
+
+re.group_csave_flag = false // true enables continuous group saving if needed
+re.group_max_nested = 128 // set a maximum of 128 nested groups
+re.group_max = pattern.len >> 1 // we can't have more groups than half the pattern length
+re.group_stack = []int{len: re.group_max, init: -1}
+re.group_data = []int{len: re.group_max, init: -1}
+```
+### Compiling
+
+After an initializer is used, the regex expression must be compiled with:
+
+```v ignore
+// compile_opt compiles the regex, returning an error if the compilation fails
+pub fn (re mut RE) compile_opt(in_txt string) ?
+```
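+
+For example, pairing the base initializer with `compile_opt` (a minimal sketch):
+
+```v ignore
+import regex
+
+fn main() {
+	mut re := regex.new()
+	re.compile_opt(r'\w+') or { panic(err) }
+	start, end := re.match_string('pippo12')
+	println('$start, $end') // expected: 0, 7
+}
+```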
+
+### Matching Functions
+
+These are the matching functions:
+
+```v ignore
+// match_string tries to match the input string, returning start and end index if found, else start is -1
+pub fn (re mut RE) match_string(in_txt string) (int,int)
+
+```
+
+## Find and Replace
+
+The following find and replace functions are available:
+
+#### Find functions
+
+```v ignore
+// find tries to find the first match in the input string
+// return start and end index if found else start is -1
+pub fn (re mut RE) find(in_txt string) (int,int)
+
+// find_all finds all the "non overlapping" occurrences of the matching pattern
+// return a list of start end indexes like: [3,4,6,8]
+// the matches are [3,4] and [6,8]
+pub fn (re mut RE) find_all(in_txt string) []int
+
+// find_all_str finds all the "non overlapping" occurrences of the matching pattern
+// return a list of strings
+// the result is like ["first match","second match"]
+pub fn (mut re RE) find_all_str(in_txt string) []string
+```
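+
+A minimal sketch of `find_all` and `find_all_str` on the same input
+(the sample string is only illustrative):
+
+```v ignore
+import regex
+
+fn main() {
+	txt := 'abcd 1234 efgh 5678'
+	mut re := regex.regex_opt(r'\d+') or { panic(err) }
+	println(re.find_all(txt)) // expected: [5, 9, 15, 19], i.e. start/end pairs
+	println(re.find_all_str(txt)) // expected: ['1234', '5678']
+}
+```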
+
+#### Replace functions
+
+```v ignore
+// replace returns a string where the matches are replaced with the repl string,
+// this function supports groups in the replace string
+pub fn (re mut RE) replace(in_txt string, repl string) string
+```
+
+The replace string can include group references:
+
+```v ignore
+txt := "Today it is a good day."
+query := r'(a\w)[ ,.]'
+mut re := regex.regex_opt(query)?
+res := re.replace(txt, r"__[\0]__")
+```
+
+In this example we used group `0` in the replace string (`\0`); the result will be:
+
+```
+Today it is a good day. => Tod__[ay]__it is a good d__[ay]__
+```
+
+**Note:** only groups from `0` to `9` can be used in replace strings.
+
+If groups are not needed in the replace process, it is possible
+to use a quicker function:
+
+```v ignore
+// replace_simple returns a string where the matches are replaced with the replace string
+pub fn (mut re RE) replace_simple(in_txt string, repl string) string
+```
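+
+For example, a minimal sketch:
+
+```v ignore
+import regex
+
+fn main() {
+	txt := 'Today it is a good day.'
+	mut re := regex.regex_opt(r'good') or { panic(err) }
+	println(re.replace_simple(txt, 'great'))
+	// expected: Today it is a great day.
+}
+```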
+
+#### Custom replace function
+
+For complex find and replace operations, you can use `replace_by_fn`.
+`replace_by_fn` uses a custom replace callback function, thus
+allowing customizations. The custom callback function is called for
+every non overlapping match.
+
+The custom callback function must be of the type:
+
+```v ignore
+// type of function used for custom replace
+// in_txt source text
+// start index of the start of the match in in_txt
+// end index of the end of the match in in_txt
+// --- the match is in in_txt[start..end] ---
+fn (re RE, in_txt string, start int, end int) string
+```
+
+The following example will clarify its usage:
+
+```v ignore
+import regex
+// customized replace function
+// it will be called on each non overlapping match
+fn my_repl(re regex.RE, in_txt string, start int, end int) string {
+ g0 := re.get_group_by_id(in_txt, 0)
+ g1 := re.get_group_by_id(in_txt, 1)
+ g2 := re.get_group_by_id(in_txt, 2)
+ return "*$g0*$g1*$g2*"
+}
+
+fn main(){
+ txt := "today [John] is gone to his house with (Jack) and [Marie]."
+ query := r"(.)(\A\w+)(.)"
+
+ mut re := regex.regex_opt(query) or { panic(err) }
+
+ result := re.replace_by_fn(txt, my_repl)
+ println(result)
+}
+```
+
+Output:
+
+```
+today *[*John*]* is gone to his house with *(*Jack*)* and *[*Marie*]*.
+```
+
+
+
+## Debugging
+
+This module has a few small utilities to help you write regex patterns.
+
+### **Syntax errors highlight**
+
+The next example code shows how to visualize regex pattern syntax errors
+in the compilation phase:
+
+```v oksyntax
+query := r'ciao da ab[ab-]'
+// there is an error, a range not closed!!
+mut re := new()
+re.compile_opt(query) or { println(err) }
+// output!!
+// query: ciao da ab[ab-]
+// err : ----------^
+// ERROR: ERR_SYNTAX_ERROR
+```
+
+### **Compiled code**
+
+It is possible to view the compiled code by calling the function `get_code()`.
+The result will be something like this:
+
+```
+========================================
+v RegEx compiler v 1.0 alpha output:
+PC: 0 ist: 92000000 ( GROUP_START #:0 { 1, 1}
+PC: 1 ist: 98000000 . DOT_CHAR nx chk: 4 { 1, 1}
+PC: 2 ist: 94000000 ) GROUP_END #:0 { 1, 1}
+PC: 3 ist: 92000000 ( GROUP_START #:1 { 1, 1}
+PC: 4 ist: 90000000 [\A] BSLS { 1, 1}
+PC: 5 ist: 90000000 [\w] BSLS { 1,MAX}
+PC: 6 ist: 94000000 ) GROUP_END #:1 { 1, 1}
+PC: 7 ist: 92000000 ( GROUP_START #:2 { 1, 1}
+PC: 8 ist: 98000000 . DOT_CHAR nx chk: -1 last! { 1, 1}
+PC: 9 ist: 94000000 ) GROUP_END #:2 { 1, 1}
+PC: 10 ist: 88000000 PROG_END { 0, 0}
+========================================
+
+```
+
+`PC`:`int` is the program counter or step of execution; each single step is a token.
+
+`ist`:`hex` is the token instruction id.
+
+`[a]` is the char used by the token.
+
+`query_ch` is the type of token.
+
+`{m,n}` is the quantifier; the greedy off flag `?` will be shown if present in the token
+
+### **Log debug**
+
+The log debugger allows printing the status of the regex parser while the
+parser is running. It is possible to have two different levels of
+debug information: 1 is normal, while 2 is verbose.
+
+Here is an example:
+
+*normal* - list only the token instruction with their values
+
+```ignore
+// re.debug = 1 // log level normal
+flags: 00000000
+# 2 s: ist_load PC: i,ch,len:[ 0,'a',1] f.m:[ -1, -1] query_ch: [a]{1,1}:0 (#-1)
+# 5 s: ist_load PC: i,ch,len:[ 1,'b',1] f.m:[ 0, 0] query_ch: [b]{2,3}:0? (#-1)
+# 7 s: ist_load PC: i,ch,len:[ 2,'b',1] f.m:[ 0, 1] query_ch: [b]{2,3}:1? (#-1)
+# 10 PROG_END
+```
+
+*verbose* - list all the instructions and states of the parser
+
+```ignore
+flags: 00000000
+# 0 s: start PC: NA
+# 1 s: ist_next PC: NA
+# 2 s: ist_load PC: i,ch,len:[ 0,'a',1] f.m:[ -1, -1] query_ch: [a]{1,1}:0 (#-1)
+# 3 s: ist_quant_p PC: i,ch,len:[ 1,'b',1] f.m:[ 0, 0] query_ch: [a]{1,1}:1 (#-1)
+# 4 s: ist_next PC: NA
+# 5 s: ist_load PC: i,ch,len:[ 1,'b',1] f.m:[ 0, 0] query_ch: [b]{2,3}:0? (#-1)
+# 6 s: ist_quant_p PC: i,ch,len:[ 2,'b',1] f.m:[ 0, 1] query_ch: [b]{2,3}:1? (#-1)
+# 7 s: ist_load PC: i,ch,len:[ 2,'b',1] f.m:[ 0, 1] query_ch: [b]{2,3}:1? (#-1)
+# 8 s: ist_quant_p PC: i,ch,len:[ 3,'b',1] f.m:[ 0, 2] query_ch: [b]{2,3}:2? (#-1)
+# 9 s: ist_next PC: NA
+# 10 PROG_END
+# 11 PROG_END
+```
+
+The columns have the following meaning:
+
+`# 2` number of actual steps from the start of parsing
+
+`s: ist_next` state of the present step
+
+`PC: 1` program counter of the step
+
+`=>7fffffff ` hex code of the instruction
+
+`i,ch,len:[ 0,'a',1]` `i` index in the source string, `ch` the char parsed,
+`len` the length in bytes of the char parsed
+
+`f.m:[ 0, 1]` `f` index of the first match in the source string, `m` index of
+the char that is currently matching
+
+`query_ch: [b]` token in use and its char
+
+`{2,3}:1?` quantifier `{min,max}`, `:1` is the current repetition counter,
+`?` is the greedy off flag if present.
+
+### **Custom Logger output**
+
+The debug functions write to `stdout` by default. It is possible
+to provide an alternative output by setting a custom
+output function:
+
+```v oksyntax
+// custom print function, the input will be the regex debug string
+fn custom_print(txt string) {
+ println('my log: $txt')
+}
+
+mut re := new()
+re.log_func = custom_print
+// every debug output from now will call this function
+```
+
+## Example code
+
+Here is an example that performs some basic string matching:
+
+```v ignore
+import regex
+
+fn main(){
+ txt := "http://www.ciao.mondo/hello/pippo12_/pera.html"
+ query := r"(?P<format>https?)|(?P<format>ftps?)://(?P<token>[\w_]+.)+"
+
+ mut re := regex.regex_opt(query) or { panic(err) }
+
+ start, end := re.match_string(txt)
+ if start >= 0 {
+ println("Match ($start, $end) => [${txt[start..end]}]")
+ for g_index := 0; g_index < re.group_count ; g_index++ {
+ println("#${g_index} [${re.get_group_by_id(txt, g_index)}] \
+ bounds: ${re.get_group_bounds_by_id(g_index)}")
+ }
+ for name in re.group_map.keys() {
+ println("group:'$name' \t=> [${re.get_group_by_name(txt, name)}] \
+ bounds: ${re.get_group_bounds_by_name(name)}")
+ }
+ } else {
+ println("No Match")
+ }
+}
+```
+Here is an example of a fully customized regex environment creation:
+```v ignore
+import regex
+
+fn main(){
+ txt := "today John is gone to his house with Jack and Marie."
+ query := r"(?:(?P<word>\A\w+)|(?:\a\w+)[\s.]?)+"
+
+ // init regex
+ mut re := regex.RE{}
+	// max program length, it can not be longer than the query
+	re.prog = []regex.Token{len: query.len + 1}
+	// there can not be more char classes than the length of the query
+	re.cc = []regex.CharClass{len: query.len}
+	// enable continuous group saving
+	re.group_csave_flag = true
+	// set a maximum of 128 nested groups
+	re.group_max_nested = 128
+	// we can't have more groups than half the query length
+	re.group_max = query.len >> 1
+
+ // compile the query
+ re.compile_opt(query) or { panic(err) }
+
+ start, end := re.match_string(txt)
+ if start >= 0 {
+ println("Match ($start, $end) => [${txt[start..end]}]")
+ } else {
+ println("No Match")
+ }
+
+	// show results for continuous group saving
+ if re.group_csave_flag == true && start >= 0 && re.group_csave.len > 0{
+ println("cg: $re.group_csave")
+ mut cs_i := 1
+ for cs_i < re.group_csave[0]*3 {
+ g_id := re.group_csave[cs_i]
+ st := re.group_csave[cs_i+1]
+ en := re.group_csave[cs_i+2]
+ println("cg[$g_id] $st $en:[${txt[st..en]}]")
+ cs_i += 3
+ }
+ }
+
+ // show results for captured groups
+ if start >= 0 {
+ println("Match ($start, $end) => [${txt[start..end]}]")
+ for g_index := 0; g_index < re.group_count ; g_index++ {
+ println("#${g_index} [${re.get_group_by_id(txt, g_index)}] \
+ bounds: ${re.get_group_bounds_by_id(g_index)}")
+ }
+ for name in re.group_map.keys() {
+ println("group:'$name' \t=> [${re.get_group_by_name(txt, name)}] \
+ bounds: ${re.get_group_bounds_by_name(name)}")
+ }
+ } else {
+ println("No Match")
+ }
+}
+```
+
+More examples are available in the test code for the `regex` module,
+see `vlib/regex/regex_test.v`.