The ~~ operator

The ~~ binary operator is used to determine whether a string matches a regular expression.

The left-hand side of the ~~ filter is a string filter whose value is the string to search within, the target.

The right-hand side of the ~~ filter is a quoted regular expression, the pattern.

The ~~ operator matches the position if some substring of the target matches the pattern. The pattern does not need to match the whole target. For example, each of the following filters matches the current position:

   "football" ~~ "f" 
   "football" ~~ "f.*l"
   "football" ~~ "[otba]+ll"
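This rule works like a regular-expression search rather than a whole-string match. As a rough analogy, assuming CQL's regex engine treats patterns this simple the same way as Python's standard `re` module, the hypothetical helper `cql_matches` below succeeds exactly when `re.search` finds a match somewhere in the target:

```python
import re

def cql_matches(target: str, pattern: str) -> bool:
    """Hypothetical helper: True if some substring of target matches pattern,
    mirroring how `target ~~ pattern` succeeds."""
    return re.search(pattern, target) is not None

# The three filters from the text all succeed, although none of the
# patterns matches the whole of "football" except "f.*l" by coincidence of length.
for pattern in ["f", "f.*l", "[otba]+ll"]:
    print(pattern, cql_matches("football", pattern))
```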

Suppose the player playing White in the current game is Kasparov. Then each of the following filters matches:

  player white ~~ "Kasparov"
  player white ~~ "K.*ov"

To check whether either Kotov or Kasparov is playing White or Black, one could use:

  flipcolor player white ~~ "K(ot|aspar)ov"

or, more simply,

  player ~~ "K(ot|aspar)ov"

In this way, regular expressions can be used to query the value of any filter that returns a string:

  event ~~ "Wijk .* Zee"
  date ~~ "2004\.03\."
  site ~~ "Bel.*m"

Value of the ~~ operator

The value of the ~~ operator is the matched string, that is, the first (leftmost) sequence of characters in the target that matches the regular expression:

  Result = "football" ~~ ".*"
  Result == "football"
  Result2 = "football" ~~ "otb"
  Result2 == "otb"
  Result3 = "football" ~~ "[otba]+"
  Result3 == "ootba"
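In Python terms, this value corresponds to group 0 of the first match that `re.search` finds: the leftmost match, extended as far as the greedy `+` and `*` allow. A small sketch, with the hypothetical helper `cql_value` standing in for the value of ~~ and assuming the two regex engines agree on these patterns:

```python
import re

def cql_value(target: str, pattern: str):
    """Hypothetical helper: the leftmost matched substring, or None if no match."""
    m = re.search(pattern, target)
    return m.group(0) if m else None

print(cql_value("football", ".*"))       # football
print(cql_value("football", "otb"))      # otb
print(cql_value("football", "[otba]+"))  # ootba (starts at the first "o", not the "f")
```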

Note that the value can be the empty string, which is different from failing to match:

  RR = "hello" ~~ "z*" // this matches (zero occurrences of z)
  RR == "" // the value of RR is the empty string
  "hello" ~~ "z+" // this filter fails to match
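Python's `re` module draws the same distinction: an empty match is a match object whose text is "", while a failed search returns `None`. A sketch, assuming the engines behave alike here:

```python
import re

empty = re.search("z*", "hello")   # zero z's match at position 0
failed = re.search("z+", "hello")  # needs at least one z: no match

print(repr(empty.group(0)))  # ''
print(failed)                # None
```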

Group captures

The ~~ filter sets the values \0, \1, \2, and so on to the text matched by the corresponding capturing groups of the regular expression, if any. \0 is the whole matched string, \1 is the text matched by the first capturing group, and so on:

  "football" ~~ "(o+)tba(l+)"
  \0 == "ootball"
  \1 == "oo"
  \2 == "ll"
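Python's `re` numbers capturing groups the same way, with group 0 as the whole match. Assuming the two engines agree on this pattern, the analogue of \0, \1 and \2 is:

```python
import re

m = re.search("(o+)tba(l+)", "football")
# group(0) ~ \0, group(1) ~ \1, group(2) ~ \2
print(m.group(0), m.group(1), m.group(2))  # ootball oo ll
```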

Index of a capturing group

If \i denotes a capturing group, then \-i is the zero-based index within the target string at which the text matched by that group begins:

  "football" ~~ "(o+)tba(l+)"
  \-0 == 1
  \-1 == 1
  \-2 == 6
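Python's `Match.start(i)` reports the same kind of zero-based offset, so it offers a quick way to check such values (assuming both engines pick the same match). In "football" the overall match and the first group both begin at the first "o", index 1, and the second group begins at index 6:

```python
import re

m = re.search("(o+)tba(l+)", "football")
# m.start(i) plays the role of \-i: where group i begins in the target.
print(m.start(0), m.start(1), m.start(2))  # 1 1 6
```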

More generally, to find the index of one string inside another, use indexof.

Extracting numbers using ~~

You can combine ~~ with the int filter to extract numbers from strings. For example, suppose a string contains, among other things, a substring "Eval: 43", where 43 can be any number. You can get that value as follows:

  Target = "Blunder: Eval: 43"
  Target ~~ "Eval: (\d+)"
  Val = int \1

The variable Val will have the value 43. If Target had no such matching substring, the ~~ filter would not match and Val would not be changed.
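The same extraction in Python, as a sketch: the hypothetical function `extract_eval` returns the parsed number when the pattern matches and otherwise hands back the previous value unchanged, mimicking the way Val is left alone when ~~ fails:

```python
import re

def extract_eval(target: str, current):
    """Hypothetical helper: return the number after 'Eval: ' if present,
    otherwise return the current value unchanged."""
    m = re.search(r"Eval: (\d+)", target)
    return int(m.group(1)) if m else current

print(extract_eval("Blunder: Eval: 43", None))   # 43
print(extract_eval("No evaluation here", None))  # None
```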

Using ~~ with while

~~ is treated specially when used as the test of a while filter (using a syntax borrowed from Perl):

  while (lhs ~~ regex) body

Here, lhs is a string filter; regex is a quoted string; body is any filter.

First, lhs is evaluated to get a string, the target. The regular expression regex is then matched successively from left to right across the target, and body is evaluated after each match.

This kind of while filter matches any position unless lhs fails to match.
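The nearest Python analogue of this loop is `re.finditer`, which yields successive non-overlapping matches from left to right. The sketch below assumes CQL scans the target the same way for a pattern that cannot match the empty string (how either tool advances past empty matches is not modeled). The target string is a made-up example:

```python
import re

target = "a4 then b5 then c6"
# Each iteration corresponds to one evaluation of the while body;
# the match object plays the role of \0, \1, ... for that iteration.
for m in re.finditer("[a-h][1-8]", target):
    print(m.group(0), m.start())
# a4 0
# b5 8
# c6 16
```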

Let's call a "square string" a two-character string denoting a square, like "a4".

For example, this function counts the number of square strings in a string:

  function CountSquares(Arg){
     NumSquares = 0
     while (Arg ~~ "[a-h][1-8]") NumSquares += 1
     NumSquares // return the number of square strings
  }

We could apply this function to different strings:

  CountSquares("No squares")==0
  CountSquares("One c6 square")==1
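A Python counterpart of CountSquares, offered only as an analogy (the name `count_squares` is hypothetical): counting the matches that `re.finditer` yields gives the same totals as the CQL examples:

```python
import re

def count_squares(arg: str) -> int:
    """Count substrings such as "c6": a file letter a-h followed by a rank 1-8."""
    return sum(1 for _ in re.finditer("[a-h][1-8]", arg))

print(count_squares("No squares"))     # 0
print(count_squares("One c6 square"))  # 1
```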

Suppose we wanted to count the number of distinct square strings in a string.

The makesquare filter can take a single string as an argument and return the corresponding square. If we take the union (|) of all these squares and count the squares in the result, we get the number of distinct squares:

  function CountDistinctSquares(Arg){
     Squares = ~. // the empty set
     while (Arg ~~ "[a-h][1-8]") Squares |= makesquare \0
     #Squares
  }

Note how \0 above refers to the currently matched string, in this case, the two-character string denoting a single square.

  CountDistinctSquares("Two: a2a1a1a2") == 2
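In Python, a set of the matched strings can play the role of the square set built with |=, since duplicates collapse automatically. A sketch with the hypothetical name `count_distinct_squares`:

```python
import re

def count_distinct_squares(arg: str) -> int:
    # A Python set stands in for the CQL square set; adding "a2" twice keeps one copy.
    squares = set()
    for m in re.finditer("[a-h][1-8]", arg):
        squares.add(m.group(0))
    return len(squares)

print(count_distinct_squares("Two: a2a1a1a2"))  # 2
```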

For another example of the use of while with ~~, see ~~ form of while.


The ~~ filter has lower precedence than +. For example, suppose X has the value "foot" and Y has the value "ball":

  X+ (Y ~~ "tba") == "tba" // false
  (X+Y) ~~ "tba" == "tba" //true
  X+Y ~~ "tba" == "tba" //true, same as above

(As usual, we recommend using parentheses or braces to clarify the meaning when in doubt about precedence.)

Matching multiline targets

There are a few special considerations involved in matching multiline strings.

If the target does not contain a newline character, then ^ matches the beginning of the target and $ matches its end. If the target does contain a newline, then on some platforms ^ also matches the beginning of each line and $ the end of each line. Unfortunately, we do not know when this inconsistency will be fixed.

Note that . in the pattern never matches a newline. Generally, to match a line of characters in a platform-independent way, one can use something like:

  Lines = "pin" + \n + "mate" + \n + "1-0" + \n
  while (Lines ~~ ".*") {
    CurrentLine = \0
    // Now the variable CurrentLine holds the current line,
    // without the trailing \n
  }

Also note that on Windows, line endings are typically the two characters \r and \n (on Linux and Mac, just \n is used). This is unlikely to cause much confusion in practice, but Windows users should be aware of it when parsing multiline strings.
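Python's `re` module makes these line-handling choices explicit, which may help when reasoning about the behavior described above. It has no platform dependence: `^` matches only at the start of the string unless the `re.MULTILINE` flag is given, and `.` never matches "\n". The sketch below is an analogy, not CQL; it also shows that each line of Windows-style text keeps a stray "\r" unless the pattern accounts for it:

```python
import re

lines = "pin\n" + "mate\n" + "1-0\n"

# "." does not match "\n", so ".*" stops at the end of the first line.
first = re.search(".*", lines).group(0)
print(repr(first))  # 'pin'

# One way to collect each line without its trailing newline.
print(re.findall(r"(.*)\n", lines))  # ['pin', 'mate', '1-0']

# "^" matches after a newline only when re.MULTILINE is given.
print(re.findall("^m", lines), re.findall("^m", lines, re.MULTILINE))  # [] ['m']

# With "\r\n" endings, "." happily matches "\r", so it stays in each line...
windows = "pin\r\nmate\r\n"
print(re.findall(r"(.*)\n", windows))     # ['pin\r', 'mate\r']
# ...unless the pattern makes the "\r" optional and lazily stops before it.
print(re.findall(r"(.*?)\r?\n", windows))  # ['pin', 'mate']
```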

Matching quotation mark characters

To match a quotation mark (") with a regular expression, use its hexadecimal escape \x22:

  Target = "Tal said: " + \" + "mate" + \"
  Target ~~ "\x22mate\x22"

In the above example, the string Target has the value

  Tal said: "mate"

This is because the two-character sequence \", standing alone, denotes a quotation mark in CQL, but that sequence cannot currently be embedded inside a longer string literal. Therefore, a regular expression that searches for a quotation mark must use its hexadecimal value.
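Python's `re` module accepts the same \x22 escape, so the search can be mirrored there as a quick check. This is an analogy only; in Python the quotation mark could also be written directly in the pattern:

```python
import re

target = "Tal said: " + '"' + "mate" + '"'
print(target)  # Tal said: "mate"

# In the raw-string pattern, \x22 is a regex hex escape for the quotation mark.
m = re.search(r"\x22mate\x22", target)
print(m.group(0))  # "mate"
```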


The fen filter documentation shows how to use the ~~ filter to parse FEN strings.