This page describes the API, i.e. the interface used to communicate with libkombilo. More precisely, I should say it will describe the API - right now it is still very incomplete. In addition to the description given here, you will probably be able to grasp how things work by looking at the test programs in the repository (cpptest.cpp, process.py, testsearch.py, testhash.py).
Creating the game list
Create the GameList by
gl = GameList(database, orderby, format, process_options, cache)
Here, database should be a file name (including the full path). libkombilo will create 3 database files, named database, database + '1' and database + '2'. For instance, if database is /home/ug/kombilo/t1.db, then /home/ug/kombilo/t1.db, /home/ug/kombilo/t1.db1, /home/ug/kombilo/t1.db2 will be created (or re-opened, if the database already exists).
orderby and format are explained below. process_options is a pointer to a ProcessOptions object (see Process options below), which is used to set some global options when the database is created. Whenever you reopen an existing database, process_options is ignored (and you can/should pass 0 or omit this argument); in other words, the process options are set once and for all when the database is first created.
cache is an integer determining the size of the database cache. The default value is 100 (you can use the default by omitting this parameter).
The GameList may contain games with different board sizes and will manage this by itself. This is largely untested, however.
Process options
You create an instance of this class by using the default constructor ProcessOptions(), which sets reasonable default values. You can then change these values manually, if you want. Here are the relevant options:
- rootNodeTags
- This is a string containing a comma-separated list of the SGF tags which should be extracted from the files to the database. The default value is BR,CA,DT,EV,HA,KM,PB,PC,PW,RE,RO,RU,SZ,US,WR.
- sgfInDB
- Determines whether the complete sgf file will be stored in the database. (Default: true.)
- algos
- This option determines which algorithms will be available for pattern searches. It has the form of a bitmask. A minimal setting (processing is very fast, searching not so fast) is ALGO_FINALPOS | ALGO_MOVELIST. Usually you will want to use ALGO_FINALPOS | ALGO_MOVELIST | ALGO_HASH_FULL | ALGO_HASH_CORNER, which is the default value; this enables all the algorithms which are currently available (more algorithms to come soon).
- processVariations
- A boolean which says whether variations in games should be 'processed' (and hence will be available for pattern searches), or should be ignored. (This default value can be overridden in start_processing in order to enable or disable variations for part of the database.) Default: true.
- algo_hash_full_maxNumStones
- Used to fine-tune the ALGO_HASH_FULL algorithm. Positions with more than algo_hash_full_maxNumStones stones will not be stored in the hashing table. A reasonable value seems to be something around 50 (the default value). For positions with more stones, the ALGO_FINALPOS algorithm is usually sufficiently fast anyway.
- algo_hash_corner_maxNumStones
- Same for ALGO_HASH_CORNER. Default: 20.
Format string
The format string is a template for the game information string stored for the entries of the game list. It contains placeholders of the form [[column name]], where 'column name' is the name of a column of the database.
Currently, the columns in the database are
| id | the ID within the database (a positive integer) |
| path, filename | path and file name of the corresponding sgf file |
| filename. | the file name without the suffix '.sgf' or '.mgt' |
| pos | the position of the game inside the sgf file (=0 unless the sgf file is a collection of several games) |
| date | date in format YYYY-MM-DD; see below |
| BR, CA, DT, EV, HA, KM, PB, PC, PW, RE, RO, RU, SZ, US, WR | SGF properties from the root node of the game, the most important ones are PW, PB (white, black player), RE (result), DT (date). This list can be modified when the GameList is first created. See Process Options above. |
| winner | not a database column, but available as a shortcut for the first letter of the RE column, if this first letter is B (black win), W (white win) or J (jigo). Otherwise winner is set to '-'. |
Currently there is not much error-checking, and the column names are case-sensitive. You must not use [[ anywhere in the format string except to mark column names as described above. If the format string is empty, the example given below will be used.
Example:
[[pw]] - [[pb]] ([[winner]]), [[dt]],
With this format string, a typical entry in the game list would be
Cho Chikun - O Rissei (W), 2005-04-20,21,
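The placeholder substitution described above can be illustrated with a small, self-contained sketch. This is not libkombilo code: the function names expand_format and winner_from_result, the lowercase dictionary keys and the sample RE value "W+R" are assumptions made for this illustration only.

```python
import re


def winner_from_result(result):
    """Return 'B', 'W' or 'J' if the RE value starts with that letter, else '-'."""
    first = result[:1]
    return first if first in ("B", "W", "J") else "-"


def expand_format(format_string, row):
    """Replace every [[name]] placeholder by the value stored under row[name].

    'winner' is not stored in the row; it is derived from the 're' entry.
    An unknown placeholder raises KeyError (this sketch does no error handling).
    """
    values = dict(row)
    values["winner"] = winner_from_result(row.get("re", ""))
    return re.sub(r"\[\[(.+?)\]\]", lambda m: str(values[m.group(1)]), format_string)


# Hypothetical game record; only the keys used by the format string matter.
game = {"pw": "Cho Chikun", "pb": "O Rissei", "re": "W+R", "dt": "2005-04-20,21"}
entry = expand_format("[[pw]] - [[pb]] ([[winner]]), [[dt]],", game)
print(entry)
```

With the sample record this reproduces the entry shown above; a result string that does not start with B, W or J gives '-' as winner.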
If the database contains hits from a pattern search, then currentEntriesAsStrings() returns the concatenation of the game info string as above and the list of hits.
Orderby
Orderby is a string which determines how the entries of the game list are ordered; its value can be any column name of the games database, for instance "pw" or "pb" (sort by name of white or black player), or "dt" or "date".
If orderby is the empty string, or equal to "id", the list is sorted by ID, i.e. in the order in which the games were inserted into the database. This is the fastest option, so it pays off to insert all your games into the database in the order you usually want to work with.
You can use several sort criteria by giving a comma-separated list of column names, e.g. "PW,PB,DATE". The ID will always be the final sort criterion which determines the order in case all other values are equal.
You can sort the items in descending order by adding "desc" to the corresponding criterion, for instance you could write "DATE DESC, PW, PB".
(As the experts will suspect, the orderby string is just appended to the corresponding SQL query.)
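To see what "appended to the SQL query" means in practice, here is a small sketch using Python's sqlite3 module and an in-memory table. The table name games, its columns, the helper sorted_ids and the sample rows (including the dates) are made up for this illustration; the real libkombilo schema and query are not shown on this page.

```python
import sqlite3


def sorted_ids(connection, orderby):
    """Return the game IDs sorted by `orderby`, with the ID as last criterion."""
    query = "SELECT id FROM games"
    if orderby and orderby.lower() != "id":
        # The orderby string is simply appended; the ID breaks remaining ties.
        query += " ORDER BY " + orderby + ", id"
    else:
        query += " ORDER BY id"
    return [row[0] for row in connection.execute(query)]


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, pw TEXT, pb TEXT, date TEXT)"
)
connection.executemany(
    "INSERT INTO games (id, pw, pb, date) VALUES (?, ?, ?, ?)",
    [
        (1, "Rin Kaiho", "Ishida Yoshio", "1975-03-01"),
        (2, "Cho Chikun", "O Rissei", "2005-04-20"),
        (3, "Cho Chikun", "O Rissei", "2005-04-20"),
        (4, "Cho Chikun", "Ashida Isoko", "1990-06-10"),
    ],
)
print(sorted_ids(connection, "DATE DESC, PW, PB"))
print(sorted_ids(connection, ""))
```

Games 2 and 3 agree in all three criteria, so their relative order is decided by the ID. Since the string ends up in an SQL query unchanged, it should never be taken from untrusted input.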
The difference between DT and DATE
DT is the date as given in the SGF file; DATE is always in the format YYYY-MM-DD.
Upon processing the SGF file, the program tries to extract DATE from the DT entry. For instance the entry in the sgf file might be "Published on 1960-01-01", in which case DATE will be "1960-01-01". The currently employed algorithm is not perfect, but is - I hope - a reasonable approximation.
It usually makes more sense to sort the list by DATE, but maybe you want to use DT in the format string.
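The idea of deriving DATE from DT can be sketched as follows. This is only a simple approximation written for this page, not the algorithm libkombilo actually uses; the function name extract_date is an assumption.

```python
import re


def extract_date(dt):
    """Return the first YYYY-MM-DD date found in the DT string, or '' if none."""
    match = re.search(r"\d{4}-\d{2}-\d{2}", dt)
    return match.group(0) if match else ""


print(extract_date("Published on 1960-01-01"))
print(extract_date("2005-04-20,21"))
```

For a DT value such as "2005-04-20,21" (a game played over two days), this sketch keeps only the first day.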
Processing SGF files
To "set up" the processing, you call
gl.start_processing(); // (or gl.start_processing(0) to disable processing of variations)
Then, for each game you want to add to the database, call
gl.process(sgf, path, fn, DBTREE, flags);
where sgf, path and fn are strings (that is char*'s) which contain the content of the SGF file to be processed, the path where the file lives and the file name. DBTREE is a string which is stored in the db and can be accessed via a gisearch - this can be used to organize your database in a tree structure.
You can use the following flags to determine the behavior for the game to be processed:
CHECK_FOR_DUPLICATES = 1;        // check for duplicates using the signature
CHECK_FOR_DUPLICATES_STRICT = 2; // check for duplicates using the final position
OMIT_DUPLICATES = 4;             // do not insert duplicates into the db
OMIT_GAMES_WITH_SGF_ERRORS = 8;  // same for games with SGF errors
The process method returns
- 0
- if an SGF error occurred when parsing the "tree structure" (i.e. before parsing the individual nodes); the database was not changed
- an integer n > 0
- meaning that n games were processed; use process_results to access the individual results
For the processed games (i.e. 0 <= i < n), use process_results(i). Its return value is a combination of the following flags:
UNACCEPTABLE_BOARDSIZE = 1; // (database not changed)
SGF_ERROR = 2;              // SGF error occurred when playing through the game
                            // (and the rest of the concerning variation was not used).
                            // Depending on OMIT_GAMES_WITH_SGF_ERRORS, everything before this node (and other variations,
                            // if any) was inserted, or the database was not changed.
IS_DUPLICATE = 4;
NOT_INSERTED_INTO_DB = 8;
INDEX_OUT_OF_RANGE = 16;
Finally, to write everything to the database, you must call
gl.finalize_processing();
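Both the flags argument of process and the value returned by process_results are bitmasks. The following sketch shows how such values are combined and decoded. The numeric values are the ones listed above; the helper describe_result is a hypothetical name introduced here and is not part of libkombilo.

```python
# Flags passed to process (values as listed above).
CHECK_FOR_DUPLICATES = 1
CHECK_FOR_DUPLICATES_STRICT = 2
OMIT_DUPLICATES = 4
OMIT_GAMES_WITH_SGF_ERRORS = 8

# Flags that can occur in the value returned by process_results(i).
RESULT_FLAGS = [
    (1, "UNACCEPTABLE_BOARDSIZE"),
    (2, "SGF_ERROR"),
    (4, "IS_DUPLICATE"),
    (8, "NOT_INSERTED_INTO_DB"),
    (16, "INDEX_OUT_OF_RANGE"),
]


def describe_result(result):
    """Return the names of all flags which are set in a process_results value."""
    return [name for value, name in RESULT_FLAGS if result & value]


# Check for duplicates via the signature and do not insert them.
flags = CHECK_FOR_DUPLICATES | OMIT_DUPLICATES
print(flags)

# A game which was recognized as a duplicate and therefore skipped.
print(describe_result(4 | 8))
```

A result of 0 means that none of the flags is set, i.e. the game was inserted without any of the problems listed above.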
Duplicates
We distinguish between 'weak' and 'strict' duplicates, the former meaning that the symmetrized Dyer signatures agree, and the latter meaning that in addition the final positions coincide.
In addition to taking duplicates into account during the processing, you can also search for all pairs (triples ...) of duplicates in the current GameList. (This refers to the games which are currently in the list, so you may first want to use gl.reset().) Do gl.find_duplicates(bs), where bs is the board size. This returns an integer n which is the number of pairs/triples/... of duplicates. You can then retrieve each individual pair/triple/... by gl.retrieve_duplicates_VI(i), where 0 <= i < n. This returns a vector of int's. There is a variant, gl.retrieve_duplicates_PI(i), which returns an int* instead (which has to be free'ed by the caller). See cpptest.cpp and testsearch.py for examples.
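The difference between weak and strict duplicates can be made concrete with a short sketch which groups games by signature, and optionally also by final position. This is an illustration of the concept only, not the way libkombilo implements find_duplicates; the function name, the tuple layout and the sample signatures and positions are all made up.

```python
from collections import defaultdict


def find_duplicate_groups(games, strict=False):
    """Group game indices which are duplicates of each other.

    `games` is a list of (index, signature, final_position) tuples.
    Weak duplicates share the signature; strict duplicates also share
    the final position.
    """
    groups = defaultdict(list)
    for index, signature, final_position in games:
        key = (signature, final_position) if strict else signature
        groups[key].append(index)
    # Only groups with at least two games are duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical data: 12-letter signatures and opaque final-position strings.
games = [
    (0, "cdqdjdpcdpjc", "position-a"),
    (1, "cdqdjdpcdpjc", "position-a"),
    (2, "cdqdjdpcdpjc", "position-b"),
    (3, "dpcdqdjdcjpd", "position-c"),
]
print(find_duplicate_groups(games))
print(find_duplicate_groups(games, strict=True))
```

Games 0, 1 and 2 are weak duplicates of each other, but only games 0 and 1 are strict duplicates.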
Game info search
Call the GameList::gisearch(char* sql) method. It takes a string which is inserted as the WHERE clause of an SQL query. If you use GameList::gisearch(char* sql, 1) instead, then you can/should pass the entire SQL query string to the method. This can be used to do more complicated queries. More detailed explanation to follow (for now see the examples in testsearch.py).
Signature search
Call the GameList::sigsearch(char* sig, int boardsize) method, giving the signature you are looking for as the first parameter (the signature consists of 12 letters). SQL wildcards are allowed: _ for a single character, % for an arbitrary number of characters. If boardsize is not 0, then the signature is symmetrized with respect to the board size.
You can retrieve the signature of some game in the current game list with the GameList::getSignature(int i) method, which returns the signature of the i-th game in the current list as a string.
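The effect of the SQL wildcards _ and % mentioned above can be tried out with a small sketch based on SQLite's LIKE operator. The table sigs, the helper matching_signatures and the sample signatures are assumptions for this illustration; the sketch also ignores the symmetrization step.

```python
import sqlite3


def matching_signatures(signatures, pattern):
    """Return those signatures which match the SQL LIKE pattern."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE sigs (sig TEXT)")
    connection.executemany(
        "INSERT INTO sigs (sig) VALUES (?)", [(s,) for s in signatures]
    )
    rows = connection.execute(
        "SELECT sig FROM sigs WHERE sig LIKE ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in rows]


# Hypothetical 12-letter signatures.
signatures = ["cdqdjdpcdpjc", "cdqdpdpcdpjc", "dpcdqdjdcjpd"]

# '_' stands for exactly one unknown letter.
print(matching_signatures(signatures, "cdqd_dpcdpjc"))
# '%' stands for an arbitrary number of letters.
print(matching_signatures(signatures, "dp%"))
```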
Pattern search
You do a pattern search by calling GameList::search(pattern, searchoptions), where pattern is an instance of the Pattern class, and searchoptions is an instance of SearchOptions.
Creating a pattern
(Create an instance of the Pattern class. There is a little more on this in the SearchAlgorithms page. More to follow.)
Search options
Currently, the following options are supported:
- fixedColor
- Do not search for the given pattern with reversed colors. (Default: 0 which means false)
- nextMove
- An integer which restricts the hits according to the player who moves next:
- if 0, consider hits where either player plays next (this is the default)
- if 1, then consider only hits where black plays next
- if 2, then consider only hits where white plays next
This is to be understood relative to the search pattern, so if nextMove is 1 (and fixedColor is false), then the search will find hits for the original pattern where black plays next, and for the color-reversed pattern where white plays next.
- moveLimit
- An integer. Find only hits which occur before move moveLimit. Default is 10000, which in effect means that all hits are found.
- trustHashFull
- Boolean. If true, results from the hashing algorithm for full board positions (if applicable) will not be checked with a second algorithm. As there is the very slight chance of collision of hash codes, there is a small risk that false hits will be delivered. Since I, as a mathematician, am kind of a purist with these things, the default value is false.
- searchInVariations
- Boolean. If true, the variations are also searched for the given pattern (if they have been processed during the creation of the database.)
- algos
- By specifying this you can 'switch off' algorithms which are available for the GameList, but which you do not want to use for this particular search.
To create an instance of SearchOptions, call one of the constructors:
- SearchOptions() will set all options to their default values. If you want, you can then change them before passing the object along to the search method.
- SearchOptions(int FIXEDCOLOR, int NEXTMOVE, int MOVELIMIT) takes values for the first three options named above. If you want to change the default value of one of the others, you have to do so afterwards.
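The interplay of fixedColor and nextMove described above is easy to get wrong, so here is a small sketch which spells it out. It is not part of libkombilo: the function accepted_hits is a hypothetical helper which lists, for given option values, the combinations (colors reversed?, player to move in the game) which count as hits.

```python
def accepted_hits(next_move, fixed_color):
    """Return the set of (colors_reversed, player_to_move) pairs counted as hits.

    next_move: 0 (either player), 1 (black) or 2 (white), relative to the pattern.
    fixed_color: if true, the color-reversed pattern is not searched at all.
    """
    wanted = {0: ("B", "W"), 1: ("B",), 2: ("W",)}[next_move]
    swap = {"B": "W", "W": "B"}
    accepted = set()
    for player in wanted:
        # Original pattern: the player to move is as requested.
        accepted.add((False, player))
        if not fixed_color:
            # Color-reversed pattern: the other player moves next in the game.
            accepted.add((True, swap[player]))
    return accepted


print(sorted(accepted_hits(1, False)))
print(sorted(accepted_hits(1, True)))
```

With nextMove equal to 1 and fixedColor false, this gives the two cases named above: the original pattern with black to play, and the color-reversed pattern with white to play.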
Accessing the search results
The size() method returns the number of games currently in the list, so after a search it gives the number of games with a match. Use currentEntriesAsStrings(start, end) to get a vector<string> with the entries between start and end (or just currentEntriesAsStrings() to get them all). These strings have the form
gameInfoString + ", " + resultString
for example
dridgway - Haentei (B), /home/ug/go/gtl/reviews/1876-palustris-dridgway-Haentei, 18-1-7-1-8B-,
Here we first have the game information (black player, white player, winner, file name), and then the list of hits. In this case, there is just one hit, namely at "move" 18-1-7-1-8 (this is in a variation; see below for how to interpret this) with continuation at position "B". The final "-" indicates that the search pattern matched with black/white reversed.
There is now also a currentEntryAsString(int i) method which returns just a single string corresponding to the i-th entry in the current list of games.
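A single hit such as 18-1-7-1-8B- from the example above can be split into its parts with a few lines of Python. Note that this sketch rests on assumptions: the exact syntax of the hit list is not specified on this page, so the regular expression below only covers hits of the shape seen in the example (a move number, an optional one-letter continuation label, an optional trailing "-"). The name parse_hit is hypothetical.

```python
import re

# Move number (possibly extended), optional continuation label, optional '-'.
HIT_PATTERN = re.compile(r"^(\d+(?:-\d+)*)([A-Za-z]?)(-?)$")


def parse_hit(hit):
    """Split one hit into (move number parts, continuation label, colors reversed)."""
    match = HIT_PATTERN.match(hit)
    if match is None:
        raise ValueError("unrecognized hit: %r" % hit)
    move, label, reversed_marker = match.groups()
    return [int(part) for part in move.split("-")], label, reversed_marker == "-"


print(parse_hit("18-1-7-1-8B-"))
```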
Hits in games with variations
In the list of hits, hits in variations are shown as "extended move numbers". For example, in a game tree like
(0) - (1) - (2) - (3) - (4) - (5) - (6) - (7) - (8) - (9) - (10) - (11) - ....
                                |
                                ----(12) - (13) - (14) - (15) - (16) - ...
                                                     |
                                                     ----(17) - (18) - (19) ...
the move number 6-1-3-1-1 would mean: go right 6 times, go down once, go right 3 times, go down once, go right once, and the result would be node (18) in the tree above. Note that going down refers to variations starting at the node before.
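The following sketch rebuilds the example tree and resolves an extended move number by alternating between "go right" (follow the main continuation) and "go down" (switch to the next variation starting at the node before). The names Node, chain, build_example_tree and resolve are hypothetical helpers written for this page; libkombilo's own data structures are not shown here.

```python
class Node:
    """A node of a game tree; children[0] is the main continuation."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def chain(first_label, last_label, parent):
    """Append nodes first_label..last_label as a straight line below parent."""
    nodes = []
    for label in range(first_label, last_label + 1):
        parent = Node(label, parent)
        nodes.append(parent)
    return nodes


def build_example_tree():
    root = Node(0)
    main_line = [root] + chain(1, 11, root)
    first_variation = chain(12, 16, main_line[5])  # (12) is an alternative to (6)
    chain(17, 19, first_variation[2])              # (17) is an alternative to (15)
    return root


def resolve(root, move_number):
    """Return the node addressed by an extended move number like '6-1-3-1-1'."""
    node = root
    for position, count in enumerate(int(s) for s in move_number.split("-")):
        for _ in range(count):
            if position % 2 == 0:
                node = node.children[0]  # go right
            else:
                siblings = node.parent.children
                node = siblings[siblings.index(node) + 1]  # go down
    return node


print(resolve(build_example_tree(), "6-1-3-1-1").label)
```

An ordinary move number without variations, such as "7", is just the special case with a single "go right" part.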
Tagging games
It is possible to have (default and user-defined) categories (or tags). Default categories correspond to properties which can be detected automatically when the game is processed; for instance whether it is a handicap game, or whether the players are professionals (i.e. have a 'p' in their rank). User-defined tags can be arbitrary; the user interface will have to provide a suitable way for assigning such tags.
Methods related to tagging
The GameList class provides the following methods to exclude/include certain folders and games with certain tags from the list.
void tagsearch(int tag) throw(DBError);
void setTag(int tag, int start=0, int end=0) throw(DBError);
void deleteTag(int tag, int i = -1) throw(DBError);
std::vector<int> getTags(int i, int tag=0) throw(DBError); // note the order of arguments!
Automatically assigned tags
- HANDI_TAG (this evaluates the HA tag in the SGF file, and hence the corresponding tagsearch could be replaced by a gisearch, if the HA tag is included in the database)
- PROFESSIONAL_TAG (currently, this just scans whether a 'p' occurs in the rank field in the SGF file, and hence does not work on the GoGoD database)
Player list
The GameList provides a list of all players of games in the database. With plSize() you retrieve the size of this list, and with plEntry(i) (where 0<=i<plSize()) you retrieve the i-th entry. The entries are ordered alphabetically (case-insensitively). This list is independent of the list of "current" games in the GameList.
One thing which may be considered imperfect is that currently games with several players on each side yield entries with these player groups, for instance:
print('%d players in the whole database.' % gl.plSize())
for i in range(100, 103):
    print('Player %d: %s' % (i, gl.plEntry(i)))
yields
3569 players in the whole database.
Player 100: Ashida Isoko
Player 101: Ashida Isoko & Ishida Yoshio
Player 102: Ashida Isoko & Rin Kaiho
Snapshot/Restore
Use GameList::snapshot() to write a "snapshot" of the current game list to the db. This includes information about which games are currently in the game list, the most recent search pattern with its hits etc. This method does not change the games stored in the database. It returns an integer (a handle) which you can use (later on...) to call GameList::restore(int handle, bool delete). This will restore the state of the game list. The second parameter determines whether the snapshot associated with this handle should be deleted from the db, or not. You can also delete a snapshot without restoring it, by GameList::delete_snapshot(int handle), or delete all snapshots (GameList::delete_all_snapshots()).
See the repository for the complete (and up-to-date) header files of the relevant classes.
