Strings
A string is a sequence of characters. A string filter is a filter whose value is a string. A literal string is a string enclosed in quotation marks. For example, "rooks" is a literal string with 5 characters.
As of CQL 6.1, strings are first-class datatypes. They can be assigned to variables, returned as the result of functions, compared, and used as arguments to sort.
Strings can be compared for equality using == and != just like other data types.
If x and y are strings, then x + y is their concatenation:
"pin" + "mate" == "pinmate"
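Strings can be compared using <=, <, >= and > in alphabetical order; as the examples show, the empty string sorts first and an uppercase letter sorts before the corresponding lowercase letter: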
"The file h1" > "The file H1"
"" < "a"
"A" < "a"
Strings can be stored in variables like integers or sets of squares; they can be the result of another CQL expression; they can be passed to functions:
Y=1
X= if (Y>0) "check" else "mate"
X ≡ "check"
Table of filters manipulating strings
The following filters handle strings specifically:

| Name | Use | Example |
|---|---|---|
| ~~ | regular expression matching | player ~~ "Ka.*ov" |
| \i | get a capturing group | \2 =="4a" |
| \-i | index of a capturing group | \-2 ==4 |
| # | length of a string | #"pin"==3 |
| + | concatenate strings | "x"+"y"=="xy" |
| ascii | conversion to and from ASCII | ascii "A"==65<br>ascii 65=="A" |
| currenttransform | current transform | message currenttransform |
| date player event eventdate site eco | specified PGN field | player white == "Kasparov"<br>player~~"K.*v"<br>date~~"1943\.0[1-6]"<br>sort event |
| originalcomment | the comment in the PGN file | originalcomment ~~ "Eval: (\d+)" |
| dictionary | store and retrieve strings | dictionary D["hi"]="bye" |
| fen | get FEN of current position as a string | Y=fen |
| in | substring | "et" in "Reti" |
| indexof | index of a substring | indexof ("n" "pin")==2 |
| int | convert string to int | int "23"==23 |
| lowercase | convert string to lowercase | lowercase "Tal"=="tal" |
| makesquare | convert string to square | a3==makesquare "a3" |
| max min | max or min of its arguments | x=max("a" "b") |
| readfile | read a string from a file | X=readfile "cook.cqo" |
| settag | set value of PGN tag | settag("CustomTag" "Troitzky") |
| sort | sort string filters | sort player white |
| sort by string | sort by a string value | sort date |
| str | convert arguments to string | str("X is: " X) |
| tag | get a PGN tag value | tag "CustomTag"=="value" |
| uppercase | convert string to uppercase | uppercase "Tal"=="TAL" |
| writefile | write a string to a file | writefile("cook.cqo" X) |
Predefined strings
There are special predefined strings: \n is the string consisting of the linefeed character; \t is the tab character; \" is the quote character; \r is the carriage return character; \\ is the backslash character. Note that these predefined strings are not specially interpreted inside quoted strings (although they may be specially interpreted when used inside regular expressions with ~~):
message ("The value of x is: " \n x)
y = "pin" + \n + "mate"
#("pin" + \n) ≡ 4
#"pin\n" ≡ 5
"pin\n"[3]== \\
"pin\n"[4]=="n"
These predefined strings are treated the same as quoted strings and are considered to be string literals.
Capturing groups
If i is a literal nonnegative integer, then \i can refer to the i'th capturing group after a ~~ operation; \0 refers to the entire matched sequence of characters. See capturing groups in ~~ for more information. For example:
"hello23"~~"ello(\d+)"
\0 ≡ "ello23"
\1 ≡ "23"
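As a rough analogue of the example above, the same pattern can be tried with Python's `re` module. This is only a sketch: Python's regular-expression engine is not CQL's, and the helper name `capture_groups` is hypothetical. It shows group 0 holding the whole match and group 1 holding the digits as a string rather than a number.

```python
import re
from typing import List, Optional


def capture_groups(pattern: str, text: str) -> Optional[List[str]]:
    """Return [whole match, group 1, group 2, ...], or None if nothing matches.

    Hypothetical helper: index 0 plays the role of \\0, index 1 of \\1, and so on.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return [match.group(0), *match.groups()]


groups = capture_groups(r"ello(\d+)", "hello23")
print(groups)
```

In this model the captured digits are text, so a numeric comparison needs an explicit conversion first (CQL lists an `int` filter for that purpose in the table above).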
Indexing into strings
Strings are zero-indexed: the first character is at character position 0. Suppose i is a non-negative integer and x is a string.
If i < #x then x[i] is the character (that is, the length-1 string) at index i. If i >= #x then x[i] fails to match the position.
If i is negative, then it is first converted into #x + i and then the above rules are used. Thus, x[-1] is the last character of x (or it fails to match if x has length 0). Similarly, x[-2] is the next-to-last character of x unless x has fewer than 2 characters, in which case it fails to match.
More formally, an expression of the form x[i] for a string x simply matches if x and i each match the current position and i is nonnegative and less than #x. An expression x[i] matches the current position whenever either x[i] simply matches or x[#x+i] simply matches.
An expression of the form x[m:n] matches the position whenever x, m and n match the position.
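The matching rule for x[i] can be sketched as a small Python model. This is an illustration, not CQL code: the name `cql_index` is hypothetical, and `None` stands in for "fails to match the position".

```python
from typing import Optional


def cql_index(x: str, i: int) -> Optional[str]:
    """Model of x[i]: a length-1 string, or None where the text says the
    expression fails to match."""
    if i < 0:
        # A negative index is converted once (not repeatedly) into #x + i.
        i = len(x) + i
    if 0 <= i < len(x):
        return x[i]
    return None


print(cql_index("hello", -1))    # last character
print(cql_index("hello", -100))  # still negative after conversion
```

Unlike Python's built-in indexing, which raises an exception for an out-of-range index, the model simply reports a failed match.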
"hello"[0]=="h"
"hello"[4]=="o"
"hello"[-1]=="o"
"hello"[-2]=="l"
"hello"[5] // false; does not match position
"hello"[-100] // false; does not match
("hello" + "goodbye")[5]=="g"
("hello" + "goodbye")[#"hello"+3]=="d"
"filename.cql"[-4:] == ".cql"
If m and n are nonnegative integers and x is a string, then x[m:n] is the string consisting of those characters of x whose indices lie between m and n-1 inclusive.
If m is missing, it is taken to be 0. If n is missing, it is taken to be #x. If m is negative, it is replaced (non-recursively) by #x + m; likewise, a negative n is replaced by #x + n. Thus, x[:5] consists of the first 5 characters of x. As the examples show, an n larger than #x selects up to the end of the string, and an empty range yields the empty string:
"mate"[0:2] == "ma"
"mate"[1:2] == "a"
"mate"[1:100] == "ate"
"mate"[1:1] == ""
"mate"[1:-1] == "at"
"mate"[-2:-1] == "t"
"mate"[2:1] == ""
Assignment of strings
Strings can be assigned just like numbers:
x="a"
x ≡ "a"
x+="b"
x ≡ "ab"
Indexed strings (when the string being indexed is a variable) can also be assigned.
x="a"         // x is "a"
x[0]="b"      // x is now "b"
x[0]="hello"  // x is now "hello"
x[-2]="c"     // x is "helco"
x[5]="z"      // expression fails to match; x is still "helco"
String ranges (when the index expression contains :) can similarly be assigned, and can be used to prepend or append to strings:
x="a"            // x is "a"
x[0:0]="b"       // x is "ba"
x[2:2]="his"     // x is "bahis"
x[-3:-1]="HEY"   // x is "baHEYs"
x[2:4]="Z"       // x is "baZYs"
x[:2]="VV"       // x is "VVZYs"
x[2:]=""         // x is "VV"
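The range and assignment rules can be modelled in Python to check the examples by hand. This is a sketch under stated assumptions, not CQL's implementation. The names `_normalize`, `cql_slice`, `assign_index` and `assign_range` are hypothetical. Clamping out-of-range bounds, and treating a range with m > n as empty, are assumptions chosen to agree with the documented examples. Python strings are immutable, so each "assignment" returns a new string.

```python
from typing import Optional, Tuple


def _normalize(x: str, m: Optional[int], n: Optional[int]) -> Tuple[int, int]:
    """Resolve missing and negative bounds of x[m:n] into 0 <= m <= n <= len(x)."""
    size = len(x)
    m = 0 if m is None else m
    n = size if n is None else n
    if m < 0:
        m += size  # converted once, non-recursively
    if n < 0:
        n += size
    # Assumption: clamp to the string, and treat m > n as an empty range at m.
    m = min(max(m, 0), size)
    n = min(max(n, m), size)
    return m, n


def cql_slice(x: str, m: Optional[int] = None, n: Optional[int] = None) -> str:
    """Model of reading x[m:n]."""
    m, n = _normalize(x, m, n)
    return x[m:n]


def assign_index(x: str, i: int, value: str) -> Optional[str]:
    """Model of x[i]=value: the new string, or None if the index fails to match."""
    if i < 0:
        i += len(x)
    if not 0 <= i < len(x):
        return None
    return x[:i] + value + x[i + 1:]


def assign_range(x: str, m: Optional[int], n: Optional[int], value: str) -> str:
    """Model of x[m:n]=value: replace the selected range with value."""
    m, n = _normalize(x, m, n)
    return x[:m] + value + x[n:]


print(cql_slice("mate", 1, -1))
print(assign_range("a", 0, 0, "b"))       # prepend via an empty range
print(assign_index("hello", -2, "c"))
```

An empty range such as [0:0] or [#x:#x] selects no characters, which is why assigning to it prepends or appends rather than overwrites.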
Performance notes when dealing with long strings
CQL is not particularly efficient when dealing with long strings, and does not generally support strings of more than a billion characters at all. CQL 6.1 sometimes makes unnecessary copies of string subexpressions, which can hurt performance when dealing with long strings. To avoid extra copies, use += rather than + for appending to a variable, and in general try to keep strings in variables.
(CQL can manipulate strings longer than a billion characters so long as the length of the string is never evaluated and the string is never indexed into; this technique, however, is not supported. For example, a multigigabyte file can be read using readfile and then each line parsed using:
BigFile=readfile "bigfile.pgn"
while (BigFile~~".*"){
  Line=\0
  ...
}
)
The warnings in this section apply only to long strings that are either generated in a CQL loop or read with readfile. For the kinds of strings typically found in PGN files (comments, tag values and so on), the issues discussed in this section do not arise.