Current status

Part of libkombilo already works. The Python code below gives an impression of what you can do with it. A list of the major things that are still missing follows at the end.

Features

  • Search for corner patterns, full board patterns, and patterns anywhere on the board, taking into account symmetries (rotation, mirroring) and, unless switched off, color reversal.
  • Works for any (square) board size.
  • Can search for continuations, i.e. you give an initial pattern (possibly empty), and then a sequence of moves which have to occur in every hit in the given order. [This needs more testing.]
  • Handles games with variations, and finds results within variations as well. [This needs more testing.]
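
To make the symmetry and color-reversal point concrete, here is a minimal pure-Python sketch that enumerates the variants of a pattern which a search has to treat as equivalent. It does not use libkombilo, and it is not how the library implements this internally; the function names (rotate, mirror, swap_colors, variants) are made up for this illustration.

```python
def rotate(rows):
    """Rotate a pattern (list of equal-length strings) by 90 degrees clockwise."""
    return [''.join(col) for col in zip(*rows[::-1])]

def mirror(rows):
    """Mirror a pattern left to right."""
    return [row[::-1] for row in rows]

def swap_colors(rows):
    """Exchange black (X) and white (O) stones, leave everything else alone."""
    table = {'X': 'O', 'O': 'X'}
    return [''.join(table.get(c, c) for c in row) for row in rows]

def variants(rows, color_reversal=True):
    """Return the set of distinct patterns equivalent to rows.

    Hypothetical helper for illustration only, not part of libkombilo.
    """
    result = set()
    current = list(rows)
    for _ in range(4):                      # four rotations ...
        current = rotate(current)
        for candidate in (current, mirror(current)):   # ... each with its mirror image
            result.add(tuple(candidate))
            if color_reversal:
                result.add(tuple(swap_colors(candidate)))
    return result

pattern = ['..XOO',
           '...XX',
           '.....',
           '..X..']
print(len(variants(pattern)))                        # number of distinct variants
print(len(variants(pattern, color_reversal=False)))  # rotations and mirror images only
```

A pattern without any symmetry of its own has 8 geometric variants, and twice as many once color reversal is included. Symmetric patterns have fewer distinct variants.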

Examples: what you can do right now

Note: If you actually want to play around with libkombilo, look at the example .py files in the repository. Those are sometimes more up-to-date than the examples below.

First, your SGF files have to be "processed" into a database that the algorithms can work with. I decided to use SQLite as the underlying SQL database engine. As the name indicates, it is a lightweight SQL database; it is also very fast and easy to set up. On top of that, it gives us very easy and quick game info search, see below.
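
To illustrate why an SQL engine makes game info search easy, here is a small sketch using Python's standard sqlite3 module. The table name (games), its columns (pw, pb, dt, named after the SGF properties), the sample rows, and the helper gisearch_sketch are assumptions made for this example; the tables that libkombilo actually creates may look different.

```python
import sqlite3

# In-memory database with a hypothetical "games" table (not libkombilo's real schema).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE games (id INTEGER PRIMARY KEY, pw TEXT, pb TEXT, dt TEXT)')
conn.executemany(
    'INSERT INTO games (pw, pb, dt) VALUES (?, ?, ?)',
    [('White One', 'Black One', '2001-01-01'),    # placeholder data
     ('White Two', 'Black One', '2002-02-02'),
     ('White One', 'Black Two', '2003-03-03')])
conn.commit()

def gisearch_sketch(conn, where_clause):
    """Return the ids of all games matching an SQL WHERE clause.

    Hypothetical helper: the clause is pasted into the query as is, which is
    fine for a local tool but must not be done with untrusted input.
    """
    cursor = conn.execute('SELECT id FROM games WHERE ' + where_clause + ' ORDER BY id')
    return [row[0] for row in cursor.fetchall()]

matching = gisearch_sketch(conn, "pw = 'White One'")
print('%d games found' % len(matching))
```

Any condition that SQL can express, for instance on the date or on both player names at once, can be passed in the same way.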

The processing is done as follows:

# First do some imports. The important GameList class, which will serve to process the SGF files,
# is contained in the libkombilo module.

import os, os.path, sys, glob, time
from libkombilo import *

# Get a list of all sgf files we want to process. In the example, we use the GoGoD database

filenames = glob.glob('/home/ug/go/gogod06/*/*.sgf')
filenames.sort()

# Create a GameList instance. This will contain the list of games, and will later be
# used to do the searches. Set up the GameList for processing. t1.db is the file which
# we use for storing all the database tables.

gl = GameList('t1.db')
starttime = time.time()    # start the clock; used for the timing message at the end
gl.start_processing()

# Now we go through the list

for filename in filenames:
    file = open(filename)
    sgf = file.read()           # store the content of the current file
    file.close()
    path, fn = os.path.split(filename)
    
    gl.process(sgf, path, fn)   # gl.process parses the game, stores the relevant game information in
                                # the database, and passes the relevant information on to the
                                # individual algorithm classes, which store it in their own
                                # database tables

gl.finalize_processing()   # commit everything to the database

print 'Processed %d games in %.2f seconds' % (len(filenames), time.time()-starttime)

Processing the 40228 games from the GoGoD database for ALGO_FINALPOS and ALGO_MOVELIST takes about 70 seconds on my laptop (Pentium M, 1.5 GHz). This was measured with the SGF files already cached; the time to read them from disk has to be added, but that does not take long. If the hashing algorithm is added, the processing time increases considerably (to around 15 minutes), but I think it will be possible to optimize this further. In any case, this is a huge improvement over the processing in Kombilo 0.5, which simply comes from rewriting the Python code in C++.

Now how can we use the library to do a search? Let's see:

# Some imports

import os, sys, time
from libkombilo import *

# Set up the game list, and print the number of games

gl = GameList('t1.db')
 
print gl.size(), 'games in the database.'

# Let's search for all games where Hane Naoki is white

gl.gisearch("pw = 'Hane Naoki'")
print gl.size(), 'games in the current list.'

# Now let us set up a search pattern.

p = Pattern(CENTER_PATTERN,                # where we want to search for the pattern,
                                           # in this case the whole board, except for the corners/edges
            19,                            # board size
            5, 4,                          # the size of the pattern
            '..XOO' +                      # the pattern itself 
            '...XX' +
            '.....' +
            '..X..') 

# Now we search for the pattern. The search applies (only) to the games currently in the game list,
# so in our case the games with Hane Naoki as white.

start = time.time()
gl.search(p, SearchOptions())
end = time.time()
print '\n'.join(gl.currentEntriesAsStrings())
print 'This search took %.2f seconds.' % (end - start)

# Let's look for another pattern in the remaining games

p = Pattern(CENTER_PATTERN, 19, 2, 2, 'XO' + 'OX')   # board size 19, as in the first pattern
start = time.time()
gl.search(p, SearchOptions())
end = time.time()
print '\n'.join(gl.currentEntriesAsStrings())
print 'This search took %.2f seconds.' % (end - start)

Here is the output (not quite up to date, but you get the idea):

num of games: 40228, num of hits: 0

num of games: 171, num of hits: 0

29693, Hane Naoki - Hikosaka Naoto, 1999-08-05d.sgf, 57
34934, Hane Naoki - Yamashita Keigo, 2003-11-27b.sgf, 69
37693, Hane Naoki - Kang Tong-yun, 2005-10-11d.sgf, 42
num of games: 3, num of hits: 3
This search took 0.04 seconds.

29693, Hane Naoki - Hikosaka Naoto, 1999-08-05d.sgf, 84
34934, Hane Naoki - Yamashita Keigo, 2003-11-27b.sgf, 118
37693, Hane Naoki - Kang Tong-yun, 2005-10-11d.sgf, 142
num of games: 3, num of hits: 19
This search took 0.00 seconds.
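
In the search example, the pattern is passed as one flat string, row by row, together with its size (5, 4 for four rows of five characters, so apparently width first, then height). The following sketch builds these three values from a list of rows and catches rows of unequal length before they reach the library. The helper flatten_pattern is made up for this illustration and is not part of libkombilo.

```python
def flatten_pattern(rows):
    """Return (width, height, flat_string) for a pattern given as a list of rows.

    Hypothetical helper, not part of libkombilo. Raises ValueError if the
    pattern is empty or if the rows do not all have the same length.
    """
    if not rows:
        raise ValueError('a pattern needs at least one row')
    width = len(rows[0])
    for number, row in enumerate(rows):
        if len(row) != width:
            raise ValueError('row %d has %d characters, expected %d'
                             % (number, len(row), width))
    return width, len(rows), ''.join(rows)

width, height, flat = flatten_pattern(['..XOO',
                                       '...XX',
                                       '.....',
                                       '..X..'])
print('%d x %d pattern, %d characters' % (width, height, len(flat)))

# With libkombilo available, the values could then be passed on as in the
# search example (assuming the argument order shown there):
# p = Pattern(CENTER_PATTERN, 19, width, height, flat)
```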

Still missing

Among others, the following are still missing (feel free to add points to this list ...)

  • The API is not yet stable
  • Hash algorithms are not yet complete (but hashing for full board and for corner positions is done)