Solving Gogen using an SMT (integer) solver

Continuing the series of solving puzzles using an integer SMT solver we look now at the Gogen puzzle. This popular character placement puzzle is published in the Daily Mail, Daily Telegraph newspapers and other publications. The puzzle offers a grid of 25 interconnected boxes into which all the letters of the alphabet ( except z) must be placed. Some initial clue letters and a list of words is provided to guide placement. Each of the clue words must be traceable on the finished grid moving 1 step in any direction (kings move). No edge wraparound or box hopping is allowed.


The puzzle is simply encoded into text then a script is used to generate the underlying logic of the puzzle. These logic lines are then passed to an SMT solver for analysis and resolution. Microsoft's Z3 and MathSAT solvers are used.

#gogen dm_older02.txt
y.x.q
.....
c.r.k
.....
s.l.n
bijoux
dare
form
germy
oaken
quartile
ribs
romp
strove
witch

The interesting angle on this puzzle was the logic encoding required to enforce the relative positions of the letters. At first glance using 16 variables, one each for the missing letters, would seem reasonable. However this approach would require the pathways between the known and unknown letters as well as the placed positions between the variables to be encoded. 

The main solving clues come from the pairs and triple sequential letters from the provided words. The script splits each clue word into pairs and triples.

Grid: y.x.q ..... c.r.k ..... s.l.n
Words: bijoux, dare, form, germy, oaken, quartile, ribs, romp, strove, witch
Known: 0=y, 1=x, 2=q, 3=c, 4=r, 5=k, 6=s, 7=l, 8=n, 
Pairs:  [ a,k ] [ a,r ] [ b,i ] [ b,s ] [ c,h ] [ d,a ] [ e,n ] 
[ e,r ] [ f,o ] [ g,e ] [ i,b ] [ i,j ] [ i,l ] [ i,t ] [ j,o ] 
[ k,e ] [ l,e ] [ m,p ] [ m,y ] [ o,a ] [ o,m ] [ o,r ] [ o,u ] 
[ o,v ] [ q,u ] [ r,e ] [ r,i ] [ r,m ] [ r,o ] [ r,t ] [ s,t ] 
[ t,c ] [ t,i ] [ t,r ] [ u,a ] [ u,x ] [ v,e ] [ w,i ] 

Triple:   [ a,k,e ] [ a,r,e ] [ a,r,t ] [ b,i,j ] [ d,a,r ] 
[ f,o,r ] [ g,e,r ] [ i,b,s ] [ i,j,o ] [ i,l,e ] [ i,t,c ] 
[ j,o,u ] [ k,e,n ] [ o,a,k ] [ o,m,p ] [ o,r,m ] [ o,u,x ] 
[ o,v,e ] [ q,u,a ] [ r,i,b ] [ r,m,y ] [ r,o,m ] [ r,o,v ] 
[ r,t,i ] [ s,t,r ] [ t,c,h ] [ t,i,l ] [ t,r,o ] [ u,a,r ] 
[ w,i,t ] [ e,r,m ]

Where a triple letter sequence has a known placed letter at each end, the middle letter can be immediately put on the solution grid.  [ r,m,y ] is one such sequence form clue word "germy". Whilst the use of letter grouping is a valid approach such thinking is about how to solve the puzzle when we should be describing the logic of the problem and letting the solvers do the figuring out.

A much simpler approach is used but with a wrinkle. First we declare a variable each for the all the letters to hold an integer value that represents the position on the solution grid.

(set-logic LIA)
(set-option :produce-models true)
(set-option :produce-assignments true)
(declare-const Va Int)
(declare-const Vb Int)
(declare-const Vc Int)
(declare-const Vd Int)
..... up to Vy

Ensure that each letter has it's own unique place on the grid.

(assert (distinct Va Vb Vc Vd Ve Vf Vg Vh Vi Vj Vk Vl Vm Vn Vo Vp Vq Vr Vs Vt Vu Vv Vw Vx Vy ))

Looking closely at how to encode the grid location numbers we can see that by using a discontinuous range can help with the checks for relative locations between the letters. Positions 1..5 for the top row, 11..15 for the next row and so on up to 41..45 for the bottom row. 


The letter proximity tests can be described as "if the absolute difference between the positions of letter A and B are 1, 9, 10, 11 then the letters are on adjacent squares".  As can be seen in the test table where square 23 is compared to each of it's direct neighbours.  Having discontinuous ranges prevents wraparound at the edges which would occur if a continuous position range was used.

Left and right, up and down neighbours are accommodated by calculating position A - position B as well as position B - position A.  The test for the letters "a" and "k" in the clue word oaken becomes 

(assert ( or 
 (= 1 (- Va Vk)) 
 (= 1 (- Vk Va)) 
 (= 9 (- Va Vk)) 
 (= 9 (- Vk Va)) 
 (= 10 (- Va Vk)) 
 (= 10 (- Vk Va)) 
 (= 11 (- Va Vk)) 
 (= 11 (- Vk Va))
)) ; pair a k

with similar lines generated for every letter pair. The triples can be ignored as they duplicate the logic of the letter pairs and would add unnecessary complex constraints.

The known location are set on the placement grid with 

(assert (= Vy 1 ))
(assert (= Vx 3 ))
...
(assert (= Vs 41 ))
(assert (= Vl 43 ))
(assert (= Vn 45 ))

and the ranges for the other position are described using lines, one each for Va .. Vy  as follows 

(assert ( or (and (> Va 0 ) (< Va 6)) (and (> Va 10 ) (< Va 16)) 
 (and (> Va 20 ) (< Va 26)) (and (> Va 30 ) (< Va 36)) 
 (and (> Va 40 ) (< Va 46)) 
))

(assert ( or (and (> Vb 0 ) (< Vb 6)) (and (> Vb 10 ) 
 (< Vb 16)) (and (> Vb 20 ) (< Vb 26)) (and (> Vb 30 ) 
 (< Vb 36)) (and (> Vb 40 ) (< Vb 46)) 
))

(assert ( or (and (> Vc 0 ) (< Vc 6)) (and (> Vc 10 ) (< Vc 16)) 
 (and (> Vc 20 ) (< Vc 26)) (and (> Vc 30 ) (< Vc 36)) 
 (and (> Vc 40 ) (< Vc 46)) 
))
.......

Finally the model is closed out and answers extracted with 
(check-sat)
(get-model)
(exit)

Making a total of 104 lines. We could have shortened the logic lines generated by removing the position and range assertions for the known places at the expense of script complexity.

When presented to the solver the answers are quickly generated as

$ ./gogen_smt2.pl < gogen_dm_older_12.txt | time ../solver_m5
sat
( (Va 14)
  (Vb 42)
  (Vc 21)
  (Vd 15)
  (Ve 34)
  (Vf 2)
 ......... snip
  (Vs 41)
  (Vt 32)
  (Vu 4)
  (Vv 24)
  (Vw 44)
  (Vx 3)
  (Vy 1) )
        0.15 real         0.12 user         0.00 sys

that can then be presented in a results table with the results checked against the original words.

$ ./gogen_smt2.pl < gogen_dm_older_12.txt | time ../solver_m5 | ./gogen_smt2.pl -f gogen_dm_older_12.txt > xx.html



Changing one of the result lines using the stream editor sed shows that the word checking is functional.

$  ./gogen_smt2.pl < gogen_dm_older_12.txt | ../solver_m5 |  sed 's/Vq 5/Vq 8/' | ./gogen_smt2.pl -f gogen_dm_older_12.txt | grep OK

Words: bijoux, dare, form, germy, oaken, quartile, ribs, romp, strove, witchbijoux OK, dare OK, form OK, germy OK, oaken OK, quartile **Fail**, ribs OK, romp OK, strove OK, witch OK, 

By turning the encoding on it's head solving for an integer letter position rather than fixing the positions and trying to solve which letter is at that position the overall logic is simplified. A valuable key difference can be seen between SMT solvers that use logic to derive integer values compared to SAT solvers that purely resolve set of boolean logic constraints.

2 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Here is another gogen puzzle.
    Hopefully I won't move this one:
    https://www.patreon.com/posts/42408618

    ReplyDelete

SMT Solvers, introduction and links (Start here with the readme)

Picat - First steps

 Getting started with a new software system can always be a little daunting. Just a little bit of help to get over the curb can make that jo...