Complex regular expressions
Regular expressions are available to a script through the following functions: split(), ereg(), ereg_replace() (editor's addition). The first argument to all three functions is a string specifying the regular expression. This string consists of ordinary and special characters. Ordinary characters have the same meaning as in other Unix commands, while special characters have a special meaning. What follows is the full list of special characters and their meanings as the PHP parser interprets them:
'.' is a special character that matches any character except a newline. Using concatenation, we can build regular expressions like 'a.b', which matches any three-character string that begins with 'a' and ends with 'b'.
'*' is not a construct by itself; it is a suffix which means that the preceding regular expression may be repeated as many times as desired. In "fo*", the "*" applies to the "o", so "fo*" matches "f" followed by any number of "o" characters.
With zero "o" characters, "fo*" also matches "f".
The "*" always applies to the *smallest* possible preceding expression. Thus, "fo*" repeats "o", not "fo".
The matcher processes a "*" construct by matching as many repetitions as can be found. Then it continues with the rest of the pattern. If the rest of the pattern then fails to match, backtracking occurs: some of the repetitions matched by "*" are discarded in case that makes it possible to match the rest of the pattern. For example, when matching the pattern "c[ad]*ar" against the string "caddaar", "[ad]*" first matches "addaa", but this leaves nothing for the following "a" in the pattern to match. So the last match of "[ad]" is undone, and the following "a" is tried again. Now the pattern matches.
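The greedy matching and backtracking described above can be observed with a capture group. This is only a minimal sketch: it assumes PCRE's preg_match() as a stand-in, because the ereg family was removed in PHP 7, and the parentheses around [ad]* are added here purely to expose what the star ends up matching.

```php
<?php
// Assumption: preg_match() (PCRE) stands in for the ereg-style functions.
// The parentheses are added only to capture what "[ad]*" finally matches.
$matched = preg_match('/c([ad]*)ar/', 'caddaar', $m);

echo $matched, "\n"; // 1: the pattern matches
echo $m[0], "\n";    // whole match: caddaar
echo $m[1], "\n";    // the star gave back one "a": adda

// Zero repetitions are allowed: "fo*" also matches a lone "f".
$zero = preg_match('/^fo*$/', 'f');
echo $zero, "\n";    // 1
```

The captured text is "adda" rather than "addaa", which is the one repetition the matcher had to give up so that the trailing "ar" could match.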
'+' is like "*" except that at least one match of the preceding pattern is required. Thus, "c[ad]+r" does not match "cr", but matches anything else that the pattern "c[ad]*r" would match.
'?' is like "*" except that it allows either zero or one match of the preceding pattern. Thus, the pattern "c[ad]?r" matches the strings "cr", "car", or "cdr", and nothing else.
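The difference between the "+" and "?" suffixes can be checked side by side. The sketch below assumes PCRE via preg_match() and adds "^" and "$" so that each test string must match in full; the list of test words is made up for illustration.

```php
<?php
// Assumption: PCRE via preg_match(); "^" and "$" force a whole-string match.
$words = ['cr', 'car', 'cdr', 'caddr'];
$plus = [];
$optional = [];
foreach ($words as $w) {
    $plus[$w] = preg_match('/^c[ad]+r$/', $w) === 1;     // one or more of [ad]
    $optional[$w] = preg_match('/^c[ad]?r$/', $w) === 1; // zero or one of [ad]
}
var_dump($plus);     // cr: false, car: true, cdr: true, caddr: true
var_dump($optional); // cr: true,  car: true, cdr: true, caddr: false
```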
'[...]': "[" begins a "character set", which is terminated by a "]". In the simplest case, the characters between the two brackets form the set. Thus, "[ad]" matches either "a" or "d", and "[ad]*" matches any sequence of "a" and "d" characters (including the empty string), from which it follows that the pattern "c[ad]*r" matches "car", and so on.
A character range can also be included in a character set by writing two characters with a "-" between them. Thus, the pattern "[a-z]" matches any lowercase letter. Ranges may be intermixed freely with single characters, as in the pattern "[a-z$%.]", which matches any lowercase letter or "$", "%", or a period.
Note that characters which are usually special are no longer special inside a character set. Inside a character set there is a completely different set of special characters: "]", "-", and "^".
To include "]" in a character set, make it the first character. For example, the pattern "[]a]" matches "]" or "a". To include "-", use it in a context where it cannot indicate a range: that is, either as the first character or immediately after a range.
'[^...]': "[^" begins a "complement character set", which matches any character except the ones specified. Thus, the pattern "[^a-z0-9A-Z]" matches any character *except* letters and digits. "^" is not special in a character set unless it is the first character. The character following the "^" is treated as if it were first (it may be "-" or "]").
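Character sets and complement sets can be tried out as follows. This is a sketch assuming PCRE via preg_match(); the sample strings are invented for the illustration.

```php
<?php
// Assumption: PCRE via preg_match(); sample strings are illustrative only.
// Every character of the subject must belong to the set [a-z$%.].
$inSet = preg_match('/^[a-z$%.]+$/', 'abc$%.');
$notInSet = preg_match('/^[a-z$%.]+$/', 'ABC'); // uppercase is not in the set

// Complement set: find the first character that is not a letter or digit.
preg_match('/[^a-z0-9A-Z]/', 'abc123_xyz', $m);

echo $inSet, "\n";    // 1
echo $notInSet, "\n"; // 0
echo $m[0], "\n";     // _
```

Inside the brackets "$" and "." lose their special meaning, so they need no escaping there.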
'^' is a special character that matches the empty string, but only at the beginning of a line in the text being matched. Otherwise it fails to match anything. Thus, the pattern "^foo" matches "foo" at the beginning of a line.
'$' is similar to "^", but matches only at the end of a line. So the pattern "xx*$" matches a string of one or more "x" characters at the end of a line.
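A short sketch of the two anchors, assuming PCRE via preg_match(). Without the /m modifier, PCRE anchors "^" and "$" to the whole subject string rather than to each line, which is enough for single-line input like the samples here.

```php
<?php
// Assumption: PCRE via preg_match(); single-line sample strings.
$atStart = preg_match('/^foo/', 'foobar');  // "foo" begins the string
$notAtStart = preg_match('/^foo/', 'a foo'); // "foo" is present, but not first

// "xx*$" picks up the run of "x" characters that ends the string.
preg_match('/xx*$/', 'abcxxx', $m);

echo $atStart, "\n";    // 1
echo $notAtStart, "\n"; // 0
echo $m[0], "\n";       // xxx
```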
'\' has two functions: it quotes the special characters listed above (including "\"), and it introduces additional special constructs.
Because "\" quotes special characters, "\$" is a regular expression that matches only "$", "\[" is a regular expression that matches only "[", and so on.
For the most part, "\" followed by any character matches only that character. However, there are several exceptions: characters which, when preceded by "\", form special constructs. Such characters are always ordinary when they appear on their own.
No new special characters will be defined. All extensions to the regular expression syntax are made by defining new two-character constructs that begin with "\".
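Quoting with "\" can be sketched like this, again assuming PCRE as a stand-in. preg_quote() is a standard PHP helper that adds the backslashes automatically; the sample text is invented.

```php
<?php
// Assumption: PCRE via preg_match(); the sample text is illustrative only.
$text = 'cost: $5 [approx]';

$hasDollar = preg_match('/\$5/', $text);         // "\$" matches a literal "$"
$hasBracket = preg_match('/\[approx\]/', $text); // "\[" and "\]" match literally

// preg_quote() escapes every special character in a literal string.
$quoted = preg_quote('$5 [approx]', '/');

echo $hasDollar, $hasBracket, "\n"; // 11
echo $quoted, "\n";                 // \$5 \[approx\]
```

Single-quoted PHP strings are used so that PHP itself does not interpret "$" or the backslashes before the pattern reaches the regex engine.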
'\|' specifies an alternative. Two regular expressions A and B with "\|" between them form an expression that matches anything that either A or B matches.
So the expression "foo\|bar" matches either "foo" or "bar", but no other string.
"\|" applies to the largest possible surrounding expressions. Only a surrounding "\( ... \)" grouping can limit the scope of "\|".
Full backtracking capability exists to handle multiple uses of "\|".
'\( ... \)' is a grouping construct that serves three purposes:
1. To enclose a set of "\|" alternatives for other operations. So, the pattern "\(foo\|bar\)x" matches either "foox" or "barx".
2. To enclose a complex expression for the postfix "*" to operate on. So the pattern "ba\(na\)*" matches "bananana", etc., with any number (zero or more) of "na".
3. To mark a matched substring for future reference.
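The first two uses of grouping can be sketched as follows. Note the notation difference: the sketch assumes PCRE, where the constructs written "\(", "\)" and "\|" in this text are spelled "(", ")" and "|". The test strings are invented.

```php
<?php
// Assumption: PCRE syntax, where "\(", "\)" and "\|" become "(", ")" and "|".
$alt = [];
foreach (['foox', 'barx', 'foobarx', 'x'] as $s) {
    // The group limits the alternation to "foo" or "bar" before the "x".
    $alt[$s] = preg_match('/^(foo|bar)x$/', $s) === 1;
}

$banana = [];
foreach (['ba', 'bana', 'bananana', 'banan'] as $s) {
    // The group lets "*" repeat the two-character expression "na".
    $banana[$s] = preg_match('/^ba(na)*$/', $s) === 1;
}

var_dump($alt);    // foox: true, barx: true, foobarx: false, x: false
var_dump($banana); // ba: true, bana: true, bananana: true, banan: false
```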
This last use is not a consequence of the idea of grouping expressions with parentheses; it is a separate feature, assigned as a second meaning to the same "\( ... \)" construct because in practice there is no conflict between the two meanings. Here is an explanation of this feature:
'\DIGIT': After the end of a "\( ... \)" construct, the matcher remembers the beginning and end of the text matched by that construct. Then, later in the regular expression, you can use "\" followed by a digit (DIGIT) to mean "match the same text that was matched by the DIGIT-th '\( ... \)' construct". The "\( ... \)" constructs are numbered in order of appearance in the regular expression.
The strings matching the first nine "\( ... \)" constructs appearing in a regular expression are assigned the numbers 1 through 9. "\1" through "\9" may be used to refer to the text matched by the corresponding "\( ... \)" constructs. These nine saved constructs are also known as registers.
For example, the pattern "\(.*\)\1" matches any string composed of two identical halves. The "\(.*\)" matches the first half, which may be anything, but the "\1" that follows must match exactly the same text.
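The "two identical halves" pattern can be tested directly. The sketch assumes PCRE, so the group is written with plain parentheses, and it anchors the pattern so that the whole string has to consist of the two halves; the test strings are invented.

```php
<?php
// Assumption: PCRE syntax; "\(.*\)\1" from the text becomes "(.*)\1".
$same = [];
foreach (['abcabc', 'xyxy', 'abcabd', 'aba'] as $s) {
    // "\1" must repeat exactly the text captured by the first group.
    $same[$s] = preg_match('/^(.*)\1$/', $s) === 1;
}
var_dump($same); // abcabc: true, xyxy: true, abcabd: false, aba: false
```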
The saved constructs, or registers, can be used inside a single expression, or they can be extracted and used elsewhere. Adding a third parameter to reg_match() or reg_search() specifies an array into which the 9 registers will be written. An additional register (element zero) is also set; it holds the string matched by the entire expression. For example:
<? $string = "This is a test"; $cnt = reg_match("\(\w*\).*\(\…", $string, $regs);
echo $cnt; echo $regs[0]; echo $regs[1]; echo $regs[2]; >
The above will first print the number of matched characters (14 in this case), then the whole matched string, followed by the first word of the string and the last.
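A comparable result can be sketched with current PHP. Everything here is an assumption rather than the original code: preg_match() stands in for reg_match(), the pattern is a hypothetical one written for this sketch to capture the first and last word, and since preg_match() returns 1 or 0 instead of a length, strlen() is applied to element zero.

```php
<?php
// Assumption: preg_match() stands in for reg_match(); the pattern is a
// hypothetical one for this sketch, not the pattern from the text.
$string = 'This is a test';
preg_match('/^(\w+).*\b(\w+)$/', $string, $regs);

echo strlen($regs[0]), "\n"; // length of the whole match: 14
echo $regs[0], "\n";         // This is a test
echo $regs[1], "\n";         // first word: This
echo $regs[2], "\n";         // last word: test
```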
'\b' matches the empty string, but only at the beginning or end of a word. Thus, "\bfoo\b" matches any occurrence of "foo" as a separate word. "\bball\(s\|\)\b" matches "ball" or "balls" as separate words.
'\B' matches the empty string, but only when it is not at the beginning or end of a word.
'\<' matches the empty string, but only at the beginning of a word.
'\>' matches the empty string, but only at the end of a word.
'\w' matches any word-constituent character.
'\W' matches any character that is not a word constituent.
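The word-boundary and word-character constructs can be sketched as below, assuming PCRE. PCRE provides "\b", "\B", "\w" and "\W"; the "\<" and "\>" forms are not used in this sketch, and "\b" covers both cases. The sample strings are invented.

```php
<?php
// Assumption: PCRE via preg_* functions; "\(s\|\)" becomes "(s|)".
$whole = preg_match('/\bfoo\b/', 'a foo here'); // "foo" as a separate word
$inside = preg_match('/\bfoo\b/', 'seafood');   // "foo" inside a longer word

// Collect "ball" or "balls" only when they stand as separate words.
$count = preg_match_all('/\bball(s|)\b/', 'ball, balls, balloon', $m);

// Replace every non-word character with an underscore.
$cleaned = preg_replace('/\W/', '_', 'a-b c');

echo $whole, $inside, "\n";       // 10
echo implode(' ', $m[0]), "\n";   // ball balls
echo $cleaned, "\n";              // a_b_c
```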
--------------------------------------------------------------------------------
Working with arrays and strings
Today I will tell you about the basic functions for working with regular expressions, and also about the main functions for working with strings and arrays. With these functions you can replace particular elements of a string, search within a string, work with given patterns, and much more. There are not many of these functions, but some of them can be difficult to use because they take many different parameters. Today I will introduce the key parameters that let you perform the main operations. So, let us consider these functions in order.
$s = implode ($a, $c); We already met this function in the last issue. It joins all the elements of an array into a single string. Here $s is the string that receives the result, $a is the array, and $c is the separator. The separator is the set of characters used to glue the strings together; it is inserted between all the elements of the array. For example, suppose we have this array:
$a[0] = "String1"; $a[1] = "String2"; $a[2] = "String3";
Accordingly, the call implode($a, "*") will return the string "String1*String2*String3".
$a = explode ($c, $s); The explode function is the inverse of implode. It splits the string $s on the separator $c and places the elements into the array $a. For example, if we take the string "String1*String2*String3" and call $a = explode("*", $s), we get this array:
$a[0] = "String1"; $a[1] = "String2"; $a[2] = "String3";
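The round trip between implode and explode can be sketched as follows. One point to be aware of: current PHP versions expect the separator as the first argument of implode(), so the sketch uses that order rather than the (array, separator) order shown in the text.

```php
<?php
// Note: PHP 8 requires the separator first: implode($separator, $array).
$a = ['String1', 'String2', 'String3'];

$s = implode('*', $a);     // glue the elements with "*"
$parts = explode('*', $s); // split the string back into an array

echo $s, "\n";             // String1*String2*String3
var_dump($parts === $a);   // bool(true)
```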
$a = split ($c, $s); This function works exactly like explode, except that it accepts regular expressions. This means that a bare "*" can no longer be used to split a string, because it is a special character in regular expressions (see the section above). So to split a string, use some other character, for example "~".
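Splitting on a pattern can be sketched with preg_split(), which is assumed here as a stand-in because split() was removed in PHP 7. The sketch also shows that "*" remains usable as a separator once it is escaped with "\"; the sample strings are invented.

```php
<?php
// Assumption: preg_split() (PCRE) stands in for the removed split().
$byTilde = preg_split('/~/', 'String1~String2~String3');

// A literal "*" works as a separator once it is escaped.
$byStar = preg_split('/\*/', 'String1*String2*String3');

// A real pattern: split on any run of digits.
$byDigits = preg_split('/[0-9]+/', 'a1b22c333d');

echo implode(',', $byTilde), "\n";  // String1,String2,String3
echo implode(',', $byStar), "\n";   // String1,String2,String3
echo implode(',', $byDigits), "\n"; // a,b,c,d
```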
ereg ($c, $s); The ereg function returns true if a match for the regular expression $c is found in the string $s. Here $c is any combination of the regular expressions described in the previous section. For example, take the string $s = "Here is testing string". The call ereg("^Here.*", $s) will return true, because the regular expression says that the word Here must be at the beginning of the string (the special character "^" indicates this) and that any characters may follow it (the ".*" construct). An example program that checks this match:
<? $s = "Here is testing string"; if (ereg("^Here.*", $s)) echo "It is found!";
else echo "It is not found."; ?>
And a small example that searches for a pattern anywhere in the string: <? $s = "Here is testing string"; if (ereg(".*testing.*", $s)) echo "It is found!"; else echo "It is not found."; ?>
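On current PHP the same checks can be sketched with preg_match(), assumed here as a replacement for ereg(), which was removed in PHP 7. The third check is an extra illustration that is not in the text.

```php
<?php
// Assumption: preg_match() (PCRE) replaces the removed ereg().
$s = 'Here is testing string';

$atStart = preg_match('/^Here.*/', $s) === 1;   // "Here" begins the string
$anywhere = preg_match('/testing/', $s) === 1;  // found somewhere in the string
$wrongStart = preg_match('/^testing/', $s) === 1; // "testing" is not at the start

echo $atStart ? 'It is found!' : 'It is not found.', "\n";    // It is found!
echo $anywhere ? 'It is found!' : 'It is not found.', "\n";   // It is found!
echo $wrongStart ? 'It is found!' : 'It is not found.', "\n"; // It is not found.
```

An unanchored pattern already searches the whole string, so the ".*" on either side of "testing" is optional.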
$s = ereg_replace ($c, $c1, $s); This function replaces every part of the string $s that matches the regular expression $c with the characters $c1. Here is an example in which we replace all the digits in a string with "+" signs:
<? $s = "1 Here 2 is 3 testing 4 string 5"; $s = ereg_replace("[0-9]", "+", $s);
echo $s; ?>
As you can see, the function returns its result into the given variable.
$s = str_replace ($c, $c1, $s); This function works like ereg_replace, except that regular expressions cannot be used in the parameter $c. Use it when you have no complex pattern to replace and only need a simple search and replace of a few characters. For example, the call $s = str_replace("*", "+", "Str1*Str2*Str3") will replace every "*" in the given string with "+".
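Both kinds of replacement can be sketched side by side. preg_replace() is assumed here as the stand-in for ereg_replace(), which was removed in PHP 7; str_replace() is still available as described.

```php
<?php
// Assumption: preg_replace() (PCRE) replaces the removed ereg_replace().
$s = '1 Here 2 is 3 testing 4 string 5';
$digitsReplaced = preg_replace('/[0-9]/', '+', $s);

// Plain text replacement: "*" needs no escaping because no regex is involved.
$starsReplaced = str_replace('*', '+', 'Str1*Str2*Str3');

echo $digitsReplaced, "\n"; // + Here + is + testing + string +
echo $starsReplaced, "\n";  // Str1+Str2+Str3
```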
